Closed BCArg closed 4 years ago
Thank you for your question. From your description it sounds like reads further down the file are causing this behaviour. The best strategy would be to map all your reads to a close reference, then look for purA and eyeball the pileup (particularly at the start of the gene) to see if you have a minority variant in there, or some other structural variant.
Hi Andrew,
from what I understood in the methods section of your paper, krocus uses an iterative process to identify the alleles from k-mers from the raw reads. I am now struggling a bit to understand what each line in the output file means.
While following the generation of the output file with
tail -f <out.txt>
I noticed that all of the alleles had a partial k-mer coverage (marked with a star*) by the beggining of the analysis and also that the percentage of k-mers covered by reads was always going up. Seeing this as an iterative process, where more and more k-mers from the reads are matched against the allele k-mers this makes sense. I noticed this for my first analyses, so that I was always looking at thetail
of the output file in order to check which allele was identified.Then today I noticed that for a sample that we sequenced
678 100.00 recA(7) fumC(6) gyrB(5) mdh(9) purA(7) adk(6) icd(136)
However, by the end of the file, I noticed that another allele of purA was not identified with 100% k-mer coverage, shown below;
ND 100.00 recA(7) fumC(6) gyrB(5) mdh(9) purA(489) adk(6) icd(136)
I then changed the
kmer
parameter, varying from 7 to 17, but the pattern described above (no match/ match/ no match) was observed for allkmer
tried.I was now wondering what would be the correct way of interpreting the output files.
Thanks in advance for reading this long question