XavierGrand opened 2 years ago
As far as I can tell from the code, the values of the --min_len and --max_len options are not used at all in transcriptome mode (unlike in genome and metagenome modes).
https://github.com/bcgsc/NanoSim/blob/fc5a67b46c1fb4ba055b1f33514a7818e790f585/src/simulator.py#L1529-L1531
I'm not sure whether this is the intended behavior; @SaberHQ can confirm. If so, these options should be removed from transcriptome mode.
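To make the genome-mode behavior concrete, here is a minimal Python sketch of how a [min_len, max_len] bound can be enforced when read lengths are drawn from an empirical distribution. It uses simple rejection sampling. The helper name sample_bounded_length and its parameters are hypothetical and are not taken from simulator.py.

```python
import random


def sample_bounded_length(lengths, min_len, max_len, rng, max_tries=10_000):
    """Draw a read length from an empirical list of lengths, rejecting draws
    outside [min_len, max_len]. Hypothetical helper, not NanoSim's code."""
    for _ in range(max_tries):
        length = rng.choice(lengths)  # one draw from the empirical distribution
        if min_len <= length <= max_len:
            return length
    # Give up instead of looping forever when the window is empty or very narrow.
    raise ValueError("no length in [min_len, max_len] after max_tries draws")
```

For example, sample_bounded_length([500, 1200, 3420, 3440, 8000], 3400, 3450, random.Random(0)) can only return 3420 or 3440. The max_tries cap keeps an unsatisfiable window from hanging the simulation.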
Hi @XavierGrand,
As @kmnip correctly noted, the --min_len and --max_len arguments are not used when simulating aligned transcriptome reads. They are, however, used when simulating unaligned reads.
Because of the way NanoSim works in transcriptome mode, we decided not to use the minimum and maximum length arguments when simulating aligned transcriptome reads. Please note that in transcriptome mode, NanoSim first uses the expression profile to select a reference transcript to simulate reads from, then selects a read length for that transcript based on the read length distribution, and finally applies the error models to produce the synthetic read.
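The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not NanoSim's implementation: the function name, the dictionary inputs, capping the read length at the transcript length, and the uniformly random start position are all hypothetical choices, and the error-model step is left out.

```python
import random


def simulate_transcript_read(expression, transcripts, length_dist, rng):
    """Hypothetical sketch of transcriptome-mode read simulation.

    expression:  dict transcript_id -> abundance (from the expression profile)
    transcripts: dict transcript_id -> sequence
    length_dist: list of candidate read lengths (empirical distribution)
    """
    ids = list(expression)
    # Step 1: pick a reference transcript, weighted by its expression level.
    tid = rng.choices(ids, weights=[expression[t] for t in ids], k=1)[0]
    seq = transcripts[tid]
    # Step 2: pick a read length for that transcript. Assumption: the drawn
    # length is simply capped at the transcript length.
    length = min(rng.choice(length_dist), len(seq))
    # Assumption: the fragment starts at a uniformly random position.
    start = rng.randint(0, len(seq) - length)
    # Step 3 (error models) is omitted; this returns the error-free fragment.
    return tid, seq[start:start + length]
```

One thing this sketch makes visible: because the length is chosen after the transcript and cannot exceed it, a fixed window such as 3400-3450 can never be met by transcripts shorter than 3400 nt, so enforcing --min_len here would also mean constraining which transcripts are selected.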
That being said, I am going to add a label to this issue. We probably need to perform some analysis to decide whether to add this feature. I cannot guarantee that it will be part of the next release, but I will keep you updated.
Best, Saber.
Hi NanoSim Team,
I'm simulating reads in transcriptome mode and want to restrict the minimum and maximum length of the simulated reads with the "--min_len 3400 --max_len 3450" options. However, they do not seem to work properly...
My command line: simulator.py transcriptome -rt transcripts.fasta -rg genome.fasta -e expression_profile.tsv -c Profile -o Training/Simulated -n 10000 --min_len 3400 --max_len 3450 --fastq -b guppy -r dRNA --no_model_ir -k 0 -t 4
Any ideas? Thanks!
conda-list.log
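Until the options are honored for aligned transcriptome reads, one possible workaround is to filter the simulated FASTQ by length afterwards. The sketch below assumes plain four-line FASTQ records (no wrapped sequence lines); the function name filter_fastq_by_length is hypothetical.

```python
def filter_fastq_by_length(in_path, out_path, min_len, max_len):
    """Copy FASTQ records whose sequence length lies in [min_len, max_len].
    Assumes unwrapped 4-line records. Returns the number of records kept."""
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # header, seq, '+', quals
            if not record[0]:
                break  # end of file
            seq_len = len(record[1].rstrip("\r\n"))
            if min_len <= seq_len <= max_len:
                fout.writelines(record)
                kept += 1
    return kept
```

For example, filter_fastq_by_length("Training/Simulated_aligned_reads.fastq", "filtered.fastq", 3400, 3450), where the input file name is only an assumption about NanoSim's output naming. Since reads outside the window are discarded, you would need to simulate more than the 10000 reads requested with -n to end up with that many.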