Open joyceFunk123 opened 7 months ago
Hey @joyceFunk123
Thanks for your interest in using NanoSim. NanoSim uses the quantification file to choose which transcripts to simulate reads from. The resulting read counts are an approximation of the expression profile and will not match it exactly.
That being said, in our analysis we showed that there is a very high correlation between the estimated transcript abundance of the empirical dataset and that of the simulated dataset generated by Trans-NanoSim, indicating that the observed raw transcript expression level is well replicated by Trans-NanoSim (Figure 1.C in the Trans-NanoSim paper). I highly recommend you take a look at it.
The feature you asked for would be interesting to implement, and it has been requested before. However, I am not sure whether I will have time to implement it in a future NanoSim release.
Currently, NanoSim takes the -n option as input, which sets the number of reads to be simulated. It first selects a transcript based on the expression profile and then simulates a read from it based on the read profiles.
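To illustrate why the per-transcript counts fluctuate, here is a minimal sketch of expression-weighted transcript selection. It is not NanoSim's actual code; the function name `sample_transcript_counts`, the transcript IDs, and the TPM values are all made up for illustration, and the only assumption is that each read picks a transcript with probability proportional to its TPM.

```python
import random
from collections import Counter


def sample_transcript_counts(tpm, n_reads, seed=None):
    """Pick a source transcript for each of n_reads reads.

    Each pick is an independent weighted draw, with weights given by the
    TPM values. This is an illustrative assumption, not NanoSim's code.
    """
    rng = random.Random(seed)
    names = list(tpm)
    weights = [tpm[name] for name in names]
    picks = rng.choices(names, weights=weights, k=n_reads)
    return Counter(picks)


# Hypothetical expression profile; the IDs and TPM values are invented.
profile = {"tx_A": 700000.0, "tx_B": 250000.0, "tx_C": 50000.0}
counts = sample_transcript_counts(profile, n_reads=1000, seed=0)

for name in profile:
    print(name, counts[name])
```

Because every read is an independent random draw, the observed counts scatter around the proportions in the profile rather than matching them exactly, and the relative scatter is largest for lowly expressed transcripts.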
Since those expression levels are reported in TPM (transcripts per million), you can simulate 1 million reads so that the number of reads generated from each transcript is close to its TPM value. That is the closest approximation available for now. For an exact number of reads per transcript, I would have to implement an option that relies only on the expression profile.
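As a rough sketch of the arithmetic behind that suggestion: if transcripts are picked in proportion to TPM, the expected number of reads for a transcript is the total read count times its share of the summed TPM. The helper `expected_read_counts` and the example profile below are hypothetical and only illustrate the calculation.

```python
def expected_read_counts(tpm, n_reads):
    """Expected reads per transcript under TPM-proportional selection."""
    total = sum(tpm.values())
    return {name: n_reads * value / total for name, value in tpm.items()}


# Hypothetical profile whose TPM values sum to one million.
example_profile = {"tx_A": 700000.0, "tx_B": 250000.0, "tx_C": 50000.0}

# With n_reads equal to 1 million, the expected counts equal the TPM values.
expected_1m = expected_read_counts(example_profile, 1_000_000)

# With fewer reads, the counts scale down proportionally.
expected_10k = expected_read_counts(example_profile, 10_000)
```

These are expectations only; the counts from an actual simulation run will vary around them.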
Hello,
I am simulating reads in transcriptome mode. For this purpose, I have created an expression profile in the specified format. The read counts generated per transcript differ from what I would expect based on the expression profile. Is there a way to get the read counts per transcript in the same proportion as the specified TPM values? Or is there a way to give the exact number of reads per transcript? Also, I'm wondering how the given expression profile values are processed such that these differences occur.
Thanks a lot! Joyce