Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

resistome analyzer, coverage threshold, how are mismatches handled? #20

Closed jessmewald closed 1 year ago

jessmewald commented 1 year ago

I'm looking for clarification on the resistome analyzer portion of the pipeline. I'd like to know how mismatches in the alignment are handled in relation to the coverage threshold set by the user and gene fraction reported by the tool.

It seems that the gene fraction, when reported as 100 percent, can contain mismatched bases. Is there a limit to the number of mismatches that an alignment can contain and still meet the 100 percent gene fraction? Any further explanation would be useful. Thanks!

EnriqueDoster commented 1 year ago

Hello @jessmewald,

The alignment calls are made by bwa using its default alignment scores. The resistome analyzer portion of the pipeline then takes that alignment file and applies the coverage threshold to each gene accession that has alignments.
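To illustrate why a gene can reach 100 percent gene fraction while still containing mismatches, here is a minimal Python sketch. It assumes that gene fraction is the percentage of reference positions covered by at least one aligned read, whether or not the aligned base matches the reference. The function names `gene_fraction` and `passes_threshold` and the interval representation are hypothetical and are not taken from the resistome analyzer source.

```python
def gene_fraction(gene_length, alignments):
    """Percentage of reference positions covered by at least one read.

    alignments: iterable of (start, end) pairs, 0-based and half-open,
    giving the reference span of each aligned read. Mismatches inside a
    span are NOT inspected (assumption): a mismatched base still counts
    as covered.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = [False] * gene_length
    for start, end in alignments:
        # Clip each span to the gene boundaries before marking it.
        for pos in range(max(start, 0), min(end, gene_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / gene_length


def passes_threshold(gene_length, alignments, threshold=80.0):
    """True if the gene fraction meets the user-set coverage threshold."""
    return gene_fraction(gene_length, alignments) >= threshold
```

Under this assumption, two overlapping reads spanning a 100 bp gene give a gene fraction of 100 percent no matter how many mismatches bwa tolerated within them; any limit on mismatches would come from the aligner's scoring, not from the coverage step.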

Your question is interesting and I don't have an accurate answer. It could be tested by modifying the bwa alignment score thresholds, but you'd also have to consider a few other things. One major consideration is that bwa handles alignments with identical scores by "flipping a coin" and randomly selecting the alignment call. This could potentially be an issue with gene accessions like blaTEM, which can have variants with only a few SNP differences. We balance this by using the 80% gene fraction and performing some of our statistical analyses on counts aggregated to the group level instead of counts for individual gene accessions. We're looking for ways to improve this too, so definitely let us know if you have any suggestions.
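As a rough sketch of the group-level aggregation mentioned above, the Python snippet below sums per-accession counts into their parent group, so that reads split at random between near-identical variants end up in the same total. The accession names, the mapping, and the function `aggregate_to_group` are made up for illustration and do not reflect the pipeline's actual annotation format.

```python
from collections import defaultdict


def aggregate_to_group(counts, accession_to_group):
    """Sum per-accession read counts into group-level counts.

    counts: dict mapping gene accession -> aligned read count
    accession_to_group: dict mapping gene accession -> group label
    Accessions missing from the mapping are pooled under "unannotated"
    (an assumption made for this sketch).
    """
    group_counts = defaultdict(int)
    for accession, n in counts.items():
        group = accession_to_group.get(accession, "unannotated")
        group_counts[group] += n
    return dict(group_counts)


# Hypothetical example: two variants that differ by a few SNPs share reads.
counts = {"tem_variant_a": 7, "tem_variant_b": 5, "other_gene": 3}
mapping = {"tem_variant_a": "TEM", "tem_variant_b": "TEM", "other_gene": "OTHER"}
group_totals = aggregate_to_group(counts, mapping)
```

In this example the group total for `TEM` is 12 however the 12 reads happened to be divided between the two variants, which is why random tie-breaking matters less at the group level.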

I'll close this issue for now, but feel free to open another issue if you have more questions or suggestions. Thanks!