Open Benni96 opened 7 years ago
Were your input reads all of uniform read length? I'm surprised by this behavior; would you be willing to provide some data with which I can reproduce the issue?
Thank you!
Hi, the read length varies a bit (+/- 2 bp), as you can also see in the previous post. I have also observed this phenomenon at a low level in other datasets. By accident, I tried the UMI generation with the option "-n 5", and then the consensus reads were correct. However, I have no clue why this option changes the output. Do you?
BMFtools assumes uniform read length, which is why adapter masking, not trimming, is suggested.
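To illustrate the difference between masking and trimming mentioned above, here is a minimal Python sketch. It is not how BMFtools or any particular adapter tool works internally; the function names `mask_adapter` and `trim_adapter`, the exact-match search, and the example adapter string are all assumptions made for illustration.

```python
def mask_adapter(read: str, adapter: str) -> str:
    """Replace the adapter and everything after it with 'N'.

    The read keeps its original length, which is the property
    a uniform-length assumption relies on.
    """
    start = read.find(adapter)  # exact match only; real tools allow mismatches
    if start == -1:
        return read
    return read[:start] + "N" * (len(read) - start)


def trim_adapter(read: str, adapter: str) -> str:
    """Cut the read at the adapter, which shortens it."""
    start = read.find(adapter)
    return read if start == -1 else read[:start]


read = "ACGTAGATCGGAAG"
adapter = "AGATCGG"  # hypothetical adapter prefix, for illustration only
masked = mask_adapter(read, adapter)
trimmed = trim_adapter(read, adapter)
print(len(read), len(masked), len(trimmed))
```

Masking leaves every read at the same length, while trimming produces reads of different lengths within the same family.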
Are you using Illumina data?
-n only changes memory requirements.
How many reads passed the homing sequence check?
Hi, I collapsed amplicon data and got some truncated reads during the collapse step.
The data were paired-end reads which were stitched into single reads. The stitched reads were collapsed with the UMI inline.
bmftools collapse inline -S -l 10 -s <homing> -f <prefix> -z <stitched reads>
After mapping, I observed some reads that did not span the entire amplicon region. I traced one such read back in the UMI file and in the stitched reads file. The "original" stitched read file contained 12900 reads with the UMI; 99.9% were full length and only 10 were shorter. However, the shortest read was still longer than the read in the UMI read file.
UMI: GCATCCACAAAT
Stitched reads with this UMI: 12963
Read length distribution (count / length):
1 96
1 129
1 130
10 131
161 132
12787 133
2 134
Length of the consensus of the UMI family: 69 bp
The homing sequence is 3 nt and the barcode is 10 nt. Therefore, even the shortest read (96 nt) should still result in a consensus read longer than 69 bp.
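The arithmetic behind this expectation can be checked with a short Python sketch. It only re-tallies the numbers reported above and follows the assumption made in this post that the consensus should be no shorter than the shortest read minus barcode and homing sequence. The helper names are hypothetical and this is not BMFtools code.

```python
from collections import Counter


def length_histogram(seqs):
    """Count how many sequences have each length."""
    return Counter(len(s) for s in seqs)


def expected_length_after_stripping(read_len, barcode_len, homing_len):
    """Read length left once the inline barcode and homing sequence are removed."""
    return read_len - barcode_len - homing_len


# Length -> count, copied from the distribution reported in this post.
reported = {96: 1, 129: 1, 130: 1, 131: 10, 132: 161, 133: 12787, 134: 2}

# Placeholder sequences with the reported lengths (the bases are irrelevant here).
family = ["A" * length for length, count in reported.items() for _ in range(count)]

hist = length_histogram(family)
shortest = min(hist)
remaining = expected_length_after_stripping(shortest, barcode_len=10, homing_len=3)
print(sum(hist.values()), shortest, remaining)
```

Under these assumptions the shortest read still leaves 83 nt after stripping, which is more than the 69 bp consensus that was observed.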
Do you have any suggestions? Or was this observed before?