bcgsc / NanoSim

Nanopore sequence read simulator

min_len and max_len options. #160

Open XavierGrand opened 2 years ago

XavierGrand commented 2 years ago

Hi NanoSim Team,

I'm simulating reads in transcriptome mode, and I want to restrict the minimum and maximum length of the simulated reads with the "--min_len 3400 --max_len 3450" options. However, they do not seem to work properly...

My command line:

simulator.py transcriptome -rt transcripts.fasta -rg genome.fasta -e expression_profile.tsv -c Profile -o Training/Simulated -n 10000 --min_len 3400 --max_len 3450 --fastq -b guppy -r dRNA --no_model_ir -k 0 -t 4

The statistics of the simulated reads in the FASTQ file are:

file                            format  type  num_seqs  sum_len   min_len  avg_len  max_len
Simulated__aligned_reads.fastq  FASTQ   DNA   6096      13094394  41       2148.0   5293

Any idea? Thanks!

conda-list.log

kmnip commented 2 years ago

As far as I can tell from the code, the values for options --min_len and --max_len are not used at all in the transcriptome mode (unlike genome and metagenome modes). https://github.com/bcgsc/NanoSim/blob/fc5a67b46c1fb4ba055b1f33514a7818e790f585/src/simulator.py#L1529-L1531 Not sure whether this is the intended behavior. @SaberHQ can confirm. If so, then these options should be removed.

SaberHQ commented 2 years ago

Hi @XavierGrand,

As @kmnip correctly noted, the --min_len and --max_len arguments are not used when simulating aligned transcriptome reads. They are, however, used when simulating unaligned read sets.

Because of the way NanoSim works in transcriptome mode, we decided not to use the minimum and maximum length arguments when simulating aligned transcriptome reads. Please note that in transcriptome mode, NanoSim first relies on the expression profile to select a reference transcript to simulate a read from. It then selects a read length for that reference transcript based on the read length distribution, and finally applies the error models to produce the synthetic read.
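To make the three steps above concrete, here is a minimal toy sketch in Python. It is not NanoSim's code: the function name `simulate_read`, the uniform length fraction, and the substitution-only error model are all assumptions made for illustration, and the real tool uses learned length distributions and error profiles.

```python
import random


def simulate_read(transcripts, expression, rng, error_rate=0.1):
    """Toy version of the three steps described above (hypothetical, not NanoSim's code)."""
    # Step 1: pick a reference transcript, weighted by its expression level.
    names = list(transcripts)
    weights = [expression[name] for name in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    ref = transcripts[name]

    # Step 2: pick a read length conditioned on the chosen transcript.
    # Assumption: a uniform fraction of the transcript length stands in
    # for the learned read length distribution.
    length = max(1, int(len(ref) * rng.uniform(0.5, 1.0)))
    start = rng.randint(0, len(ref) - length)
    read = ref[start:start + length]

    # Step 3: apply an error model.
    # Assumption: random substitutions only, no insertions or deletions.
    bases = "ACGT"
    noisy = [
        rng.choice(bases.replace(base, "")) if rng.random() < error_rate else base
        for base in read
    ]
    return name, "".join(noisy)


if __name__ == "__main__":
    rng = random.Random(42)
    transcripts = {"tx1": "ACGT" * 50, "tx2": "GGCA" * 100}  # made-up sequences
    expression = {"tx1": 1.0, "tx2": 3.0}                    # made-up expression levels
    for _ in range(3):
        name, read = simulate_read(transcripts, expression, rng)
        print(name, len(read))
```

In a toy model like this one, the read length is only drawn after the transcript has been chosen, so a narrow length window would have to be enforced by re-drawing or discarding reads. Whether and how that would be done in NanoSim itself is not something this sketch can answer.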

That being said, I am going to add a label to this issue. We probably need to perform some analysis to determine whether or not to add this feature. I cannot guarantee that it will be part of the next release. I will keep you updated on this.

Best, Saber.