HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

abundance.txt when simulating from a single input genome #196

Closed: abhijna closed this issue 3 years ago

abhijna commented 3 years ago

This is a great tool with a lot of good documentation. Thanks for developing it!

From the documentation, it sounds like the abundance.txt file is only relevant when there are multiple input genomes (metagenome simulation). I'm simulating FASTQ files from a single input human genome. This is my command:

iss generate --genomes GRCh38_latest_genomic.fna --model hiseq --n_reads 5M --cpus 4 --output my_sim
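The command above passes a single FASTA file, but a reference assembly such as GRCh38 stores each chromosome and scaffold as a separate record. A minimal sketch for listing those records and their lengths, assuming a plain (uncompressed, optionally line-wrapped) FASTA file; `fasta_records` is a hypothetical helper, not part of iss:

```python
import io
from typing import Dict, Optional, TextIO


def fasta_records(handle: TextIO) -> Dict[str, int]:
    """Return {record_id: sequence_length} for every record in a FASTA stream.

    The record ID is the first whitespace-delimited word after '>'.
    Sequence lines may be wrapped; their lengths are summed per record.
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # add this wrapped sequence line
    return lengths


# Hypothetical two-record example standing in for a multi-record genome file.
example = io.StringIO(
    ">chr_example description\nACGT\nACG\n"
    ">scaffold_example\nNNNNAC\n"
)
records = fasta_records(example)
print(len(records), records)  # 2 {'chr_example': 7, 'scaffold_example': 6}
```

To inspect the real input, open it with `with open("GRCh38_latest_genomic.fna") as fh:` and pass `fh` to the function; a count greater than one means the file holds several sequences rather than one.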

It works, and I get an abundance.txt file that looks like this:

NC_000001.11	0.0012148574648829446
NT_187361.1	0.00039399817122000503
NT_187362.1	0.000728265636780273
NT_187363.1	0.0005431097202540986
NT_187364.1	0.0013880611439478039
NT_187365.1	0.0029288720312758815
NT_187366.1	0.0014362202821329648
NT_187367.1	0.010061731406034249
NT_187368.1	0.00034613437186046
NT_187369.1	0.002001274039284856
NC_000002.12	0.0007292370931194184
NT_187370.1	0.0005869550237639334
NT_187371.1	0.0007737203866125622
NC_000003.12	7.577908977177823e-05
NT_167215.1	0.0033212828934988314
NC_000004.12	0.0005150685239650543

What do NC_000001.11, NT_187361.1, etc. mean? And what are these proportions? The same values are also printed to my terminal when I run the command.
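The first column holds the FASTA record IDs, which for this NCBI file are RefSeq accessions: `NC_` marks a complete genomic molecule such as a chromosome, and `NT_` a contig or scaffold. The second column is the relative abundance iss assigned to each record; because the file was passed with --genomes, each record appears to be treated as a separate genome of a simulated metagenome. A small sketch for reading the file and labeling the prefixes, assuming the two-column layout shown above; the helper names and the prefix table are illustrative, not part of iss. If, as we understand it, iss normalizes the abundances, the full column should sum to roughly 1.

```python
import io
from typing import Dict, TextIO

# Accession prefixes seen in NCBI RefSeq genome FASTA files, following
# NCBI's RefSeq accession-prefix table; other prefixes map to "other".
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (e.g. a chromosome)",
    "NT_": "contig or scaffold (clone-based or WGS)",
    "NW_": "contig or scaffold (primarily WGS)",
}


def read_abundance(handle: TextIO) -> Dict[str, float]:
    """Parse '<record_id> <abundance>' lines separated by tabs or spaces."""
    abundances: Dict[str, float] = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        record_id, value = fields
        abundances[record_id] = float(value)
    return abundances


def describe(record_id: str) -> str:
    """Map a RefSeq accession to a short description of its molecule type."""
    return REFSEQ_PREFIXES.get(record_id[:3], "other")


# Two lines copied from the abundance file shown above.
sample = io.StringIO(
    "NC_000001.11\t0.0012148574648829446\n"
    "NT_187361.1\t0.00039399817122000503\n"
)
abundances = read_abundance(sample)
for record_id, value in abundances.items():
    print(record_id, describe(record_id), value)
print("column sum:", sum(abundances.values()))
```

On the real file, pass an open handle to `read_abundance` and compare the column sum against 1 as a sanity check.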

HadrienG commented 3 years ago

Hi!

I'm assuming your input genome consists of several contigs? If so, you need to use the --draft option instead of --genomes.

Best, Hadrien
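As we read the reply, --draft tells iss to treat all records in the file as contigs of a single genome rather than as separate genomes with their own abundances. A minimal sketch of what that implies, assuming reads are spread across contigs in proportion to contig length; this models the idea only and is not iss's implementation, and `allocate_reads` is a hypothetical helper:

```python
from typing import Dict


def allocate_reads(contig_lengths: Dict[str, int], n_reads: int) -> Dict[str, int]:
    """Split n_reads across the contigs of ONE genome in proportion to length.

    Illustration only (not iss's code): integer counts are produced with the
    largest-remainder method so they add up exactly to n_reads.
    """
    total_length = sum(contig_lengths.values())
    if total_length <= 0:
        raise ValueError("contig lengths must sum to a positive number")
    # Exact (fractional) share of reads for each contig.
    exact = {cid: n_reads * length / total_length for cid, length in contig_lengths.items()}
    counts = {cid: int(share) for cid, share in exact.items()}
    # Give the reads lost to rounding down to the largest fractional parts.
    leftover = n_reads - sum(counts.values())
    by_remainder = sorted(exact, key=lambda cid: exact[cid] - counts[cid], reverse=True)
    for cid in by_remainder[:leftover]:
        counts[cid] += 1
    return counts


# Hypothetical contig lengths; real ones could come from the FASTA file.
print(allocate_reads({"chr_a": 300, "chr_b": 100, "scaffold_c": 100}, 10))
# {'chr_a': 6, 'chr_b': 2, 'scaffold_c': 2}
```

Under this reading, the fix is the command from the issue with --draft in place of --genomes: a long chromosome then receives proportionally more reads than a short scaffold, instead of each record getting its own independently assigned abundance (compare NT_187367.1 and NC_000003.12 in the file above).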