Open mhuang00 opened 3 years ago
Hi @mhuang00,
Q1: First of all, the numbers in the middle (for example 14611, 1127, and 7966) are the numbers of aligned bases on the reference genome, not on the simulated sequence, so the sum is a bit off. Second, there are gaps between the segments of a chimeric read, and the gap lengths are not reflected in the header, so your calculated length for a chimeric read deviates even further.
Q2: 5 is the length of the unaligned head region of the chimeric read, so the aligned part of the first segment starts at position 6 (1-indexed).
It seems you are not using the latest version of NanoSim. The output of the latest version provides information about the gap sizes as well.
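To make the arithmetic above concrete, here is a rough Python sketch. It is not NanoSim code: the function names are hypothetical, the numbers in the usage lines are made up, and it assumes you have already pulled the head, middle, and tail values out of the header yourself. It only shows why summing the header fields does not reproduce the read length, and where the first aligned base sits.

```python
def naive_read_length(head, ref_aligned_lengths, tail):
    """Sum the header fields as if they were all read coordinates.

    ref_aligned_lengths are aligned bases on the REFERENCE, so this sum
    ignores indels in the simulated read and any gaps between segments.
    """
    return head + sum(ref_aligned_lengths) + tail


def first_aligned_position(head):
    """1-indexed read position where the first aligned segment starts."""
    return head + 1


def unexplained_bases(actual_length, head, ref_aligned_lengths, tail):
    """Bases not accounted for by the header (indels plus inter-segment gaps)."""
    return actual_length - naive_read_length(head, ref_aligned_lengths, tail)


# Made-up values for illustration only, not taken from a real NanoSim read.
head, segments, tail = 5, [1000, 500], 10
actual = 1540  # length of the simulated sequence, e.g. len(seq)

print(first_aligned_position(head))                    # start of first segment
print(naive_read_length(head, segments, tail))         # header-based estimate
print(unexplained_bases(actual, head, segments, tail)) # indels + gaps
```

A non-zero difference is expected for chimeric reads unless the gap sizes are available and added in as well.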
Feel free to contact me if you find anything unclear.
Cheers, Chen
Hello,
I am using the chimeric read simulation function, and would like to use the positions of the chimeric regions introduced by the simulator. However, I can't make sense of the header, specifically the number of bases in the different regions. They don't seem to match the calculated sequence length either.
For example, in the example given, the sequence length is 3236.
In the examples I've picked out, the calculated sequence lengths are 14157 and 11244 respectively. I can't get the numbers in the headers to sum to these sequence lengths.
Thanks!