Closed AlphaSquad closed 3 years ago
Hi @AlphaSquad,
In NanoSim, we first decide an original middle length on the reference, however the length is changed a bit after introducing errors to ensure the beginning and end of an aligned sequence can be matched. In previous versions of NanoSim we reported the adjusted length, but since v3.0.0, we reported the original middle_ref length instead. This is not an expected behaviour, so I just fixed it and made a new commit. Thanks for reporting it!
Chen
Thank you very much, the lengths are now as expected!
Hi, when trying to create mapping files/CIGARs from NanoSim output I encountered a problem with the sequence lengths, somewhat similar to #119, but I think it is a separate problem, that is why I am opening this separate Issue.
Using the command
simulator.py genome -n 4050 -rg GCA_001900715.1_ASM190071v1.fa -o test/Genome2.0 -c training --seed 1246114144 -dna_type linear
using this error model, I get minor discrepancies in the expected vs simulated read lengths:The first read of this simulation has a total length of insertions of 202 and a total length of deletions of 328, which I calculated using the
_error_profile
file (length of the insertions/deletions in the fourth column):head -n 528 test/Genome2.0_aligned_error_profile | grep "ins" | cut -f 4 | paste -sd+ | bc
andhead -n 528 test/Genome2.0_aligned_error_profile | grep "del" | cut -f 4 | paste -sd+ | bc
This read has no unaligned bases in the end or start (read name is>CP010152_2172479;aligned_0_R_0_6839_0
) and the aligned bases of the reference is6839
. Putting things together, I would expect the read to be6839 (aligned length of reference) + 202 (insertions length) - 328 (deletions length) = 6713
bases long, but the read has length6710
. Do I have a a faulty reasoning for this length calculation? If yes, could you point out how I would end up with the correct length of the read using the_error_profile
?I attached the reference genome (so you can re-run the exact command) and also the output of NanoSim: NanoSim_test.zip
Thank you very much in advance