bcgsc / NanoSim

Nanopore sequence read simulator

simulate ONT full length transcriptome reads #154

Closed moold closed 2 years ago

moold commented 2 years ago

Hi, thank you for this great tool. Is there any option or workaround to simulate only full-length ONT transcriptome reads?

SaberHQ commented 2 years ago

Hi @moold. Thanks for using this tool.

NanoSim runs in three modes: genome, metagenome, and transcriptome. To simulate ONT cDNA or direct RNA reads, you should run the tool in transcriptome mode.

Please note that NanoSim learns the length distribution of ONT transcriptome reads in the training stage and then uses those learned profiles in the simulation stage. More specifically, during simulation it first picks a transcript to simulate a read from, based on the expression profile, and then extracts part of that transcript according to the learned read-length distribution. As a result, some simulated reads cover the entire length of their transcript, while others cover it only partially.
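To make the two-step sampling described above concrete, here is a minimal toy sketch in Python. It is not NanoSim's code: the function name `simulate_read`, the dictionary inputs, and the use of a plain list of observed read lengths as the "learned" distribution are all assumptions made for illustration, and error simulation is left out entirely.

```python
import random


def simulate_read(transcripts, expression, read_lengths, rng):
    """Toy model of the two steps: pick a transcript, then cut a fragment.

    transcripts:  dict of transcript name -> sequence (hypothetical input)
    expression:   dict of transcript name -> abundance weight (hypothetical)
    read_lengths: list of observed read lengths standing in for the
                  learned length distribution (an assumption)
    """
    names = list(transcripts)
    weights = [expression[name] for name in names]

    # Step 1: choose a transcript with probability proportional to expression.
    name = rng.choices(names, weights=weights, k=1)[0]
    seq = transcripts[name]

    # Step 2: draw a read length and cap it at the transcript length.
    length = min(rng.choice(read_lengths), len(seq))

    # Extract a fragment of that length from a random start position.
    start = rng.randint(0, len(seq) - length)
    return name, start, seq[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)
    transcripts = {"txA": "ACGT" * 50, "txB": "GGCA" * 20}
    expression = {"txA": 10.0, "txB": 1.0}
    read_lengths = [60, 120, 200, 400]
    for _ in range(5):
        name, start, read = simulate_read(transcripts, expression, read_lengths, rng)
        full = len(read) == len(transcripts[name])
        print(name, start, len(read), "full-length" if full else "partial")
```

In this toy model a read is full-length only when the drawn length reaches or exceeds the transcript length, which is why a mix of full and partial reads is expected.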

If by "full-length" you mean that every simulated read should fully cover the transcript it is derived from, unfortunately there is no such option in NanoSim. If you need this for a specific use case, I would suggest simulating a batch of ONT transcriptome reads, aligning them to the reference transcriptome, and then keeping only the reads that cover the full length of their transcript.
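The filtering step could look roughly like the sketch below. It assumes the simulated reads were aligned to the reference transcriptome with an aligner that writes PAF (minimap2 does so by default) and that a read counts as "full-length" when its alignment spans at least 95% of the target transcript. The function name `full_length_reads` and the 0.95 threshold are illustrative choices, not part of NanoSim.

```python
def full_length_reads(paf_lines, min_target_cov=0.95):
    """Return names of reads whose alignment spans most of the target.

    paf_lines:      iterable of PAF records (tab-separated text lines)
    min_target_cov: fraction of the transcript that must be covered;
                    0.95 is an arbitrary illustrative default
    """
    keep = set()
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        read_name = fields[0]
        # PAF columns 7-9 (1-based): target length, target start, target end.
        target_len = int(fields[6])
        target_start = int(fields[7])
        target_end = int(fields[8])
        if target_len > 0 and (target_end - target_start) / target_len >= min_target_cov:
            keep.add(read_name)
    return keep


if __name__ == "__main__":
    example = [
        "read1\t1000\t0\t1000\t+\ttxA\t1000\t0\t1000\t980\t1000\t60",
        "read2\t400\t0\t400\t+\ttxA\t1000\t300\t700\t390\t400\t60",
    ]
    print(sorted(full_length_reads(example)))
```

The returned read names can then be used to subset the simulated FASTA or FASTQ file. Depending on the analysis, you may also want to require a minimum mapping quality or to consider only primary alignments.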

Hope that answers your question. Please let me know if you have any more questions. Cheers.

moold commented 2 years ago

Hi, thanks for your explanation.