Questions about the usage and processing of the expression profile

Hey @joyceFunk123

Thanks for your interest in using NanoSim. NanoSim uses the quantification file to choose which transcripts to simulate reads from. This is an approximation, so the number of reads simulated from each transcript will not exactly match the values in the quantification file.

That being said, our analysis showed a very high correlation between the estimated transcript abundances of the empirical dataset and of the simulated dataset generated by Trans-NanoSim, indicating that the observed raw transcript expression levels are well replicated by Trans-NanoSim (Figure 1.C in the Trans-NanoSim paper). I highly recommend taking a look at it.

The feature you asked for would also be interesting to implement, and it has been requested before. However, I am not sure I will have time to add it to NanoSim in a future release.

Currently, NanoSim takes the -n option as input, which sets the number of reads to be simulated. It first selects a transcript based on the expression profile and then simulates a read from that transcript based on the read profiles.
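To illustrate why the per-transcript counts are only approximate, here is a minimal sketch of expression-weighted transcript selection. It is not NanoSim's actual code; the function `simulate_transcript_choices` and the toy profile are hypothetical, and it only assumes that each read's source transcript is drawn independently in proportion to its abundance.

```python
import random
from collections import Counter


def simulate_transcript_choices(expression, n_reads, seed=None):
    """Pick a source transcript for each of n_reads, weighted by expression.

    Hypothetical helper for illustration only (not part of NanoSim).
    expression: dict mapping transcript ID -> abundance (e.g. TPM).
    Returns a Counter with the number of reads assigned to each transcript.
    """
    rng = random.Random(seed)
    ids = list(expression)
    weights = [expression[t] for t in ids]
    # random.choices samples with replacement, proportionally to the weights
    picks = rng.choices(ids, weights=weights, k=n_reads)
    return Counter(picks)


# Hypothetical profile: two transcripts with a 3:1 abundance ratio
profile = {"tx1": 750_000.0, "tx2": 250_000.0}
counts = simulate_transcript_choices(profile, n_reads=10_000, seed=1)
print(counts)  # close to a 3:1 split, but generally not exactly 7500 / 2500
```

Because every read is an independent random draw, the counts fluctuate around the expected proportions instead of reproducing them exactly.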

Since those expression levels are reported in TPM (transcripts per million), you can simulate 1 million reads (-n 1000000) so that the number of reads generated from each transcript is close to its TPM value. This should be the closest approximation; getting the exact number of reads would require me to implement an option that relies only on the expression profile.
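The arithmetic behind the 1 million reads suggestion can be sketched as follows, again assuming (as a simplification, not a description of NanoSim internals) that each read's transcript is drawn independently with probability TPM / sum(TPM). The expected count for a transcript is then n × TPM / 10^6, which equals its TPM when n = 1,000,000, and the binomial standard deviation shows how far the actual count may drift. The function name and the TPM values are hypothetical.

```python
import math


def expected_reads(tpm, n_reads):
    """Return {transcript: (expected reads, binomial standard deviation)}.

    Hypothetical helper: assumes each read's transcript is drawn
    independently with probability TPM / sum(TPM).
    """
    total = sum(tpm.values())  # about 1,000,000 for a complete TPM table
    result = {}
    for tx, value in tpm.items():
        p = value / total
        mean = n_reads * p
        sd = math.sqrt(n_reads * p * (1 - p))
        result[tx] = (mean, sd)
    return result


# Hypothetical TPM table that sums to one million
tpm = {"tx1": 500_000.0, "tx2": 300_000.0, "tx3": 200_000.0}
for tx, (mean, sd) in expected_reads(tpm, n_reads=1_000_000).items():
    print(f"{tx}: expected {mean:.0f} reads (+/- {sd:.0f})")
```

With n_reads set to 1,000,000, each expected count equals the transcript's TPM, while the standard deviation (a few hundred reads here) is why the simulated counts still only approximate the profile.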
