DiltheyLab / MetaMaps

Long-read metagenomic analysis

Percentage identity and mapping quality question #57

Closed maguileraf closed 12 months ago

maguileraf commented 3 years ago

Hi, I ran MetaMaps with a percentage identity of 98, but in classification_results.EM I see entries below this identity, and I am not sure I am interpreting the results correctly. The file also contains a mapping quality value: some are 1 and some are very low. Could you go into more depth about what this mapping quality represents?

molbio7 commented 2 years ago

Bumping this thread to get some advice on what has been asked here.

AlexanderDilthey commented 12 months ago

Hey @molbio7,

Thank you for your question!

The --perc_identity parameter controls the density at which minimizers are selected from the reference genomes. It is not the case, however, that individual mapping locations with an estimated mapping identity lower than the specified --perc_identity are filtered out. You can think of --perc_identity as a lower threshold on the identities of alignments you want to capture with high probability.

The mapping qualities, by contrast, represent a probability distribution over individual mapping locations for each read. Say you have four potential read mapping locations which are identical with respect to all quality metrics - the mapping quality will then be 0.25 for each location, independent of whether the estimated identities of the four potential locations are 0.99 or 0.85.
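As a rough illustration of the four-location example (a minimal sketch, not MetaMaps code): the function `mapping_qualities` and the per-location scores are hypothetical, and the sketch only assumes that each candidate location has some non-negative score that is normalized across all candidates for the read.

```python
def mapping_qualities(location_scores):
    """Normalize hypothetical per-location scores into probabilities summing to 1.

    This is an illustrative assumption about the normalization step only;
    it does not reproduce how MetaMaps computes the underlying scores.
    """
    total = sum(location_scores)
    if total <= 0:
        raise ValueError("at least one location must have a positive score")
    return [score / total for score in location_scores]


# Four candidate locations that are identical on all quality metrics.
# The estimated identity (0.99 or 0.85) does not enter the calculation;
# only the scores relative to each other matter.
print(mapping_qualities([1.0, 1.0, 1.0, 1.0]))  # [0.25, 0.25, 0.25, 0.25]

# A read with a single candidate location.
print(mapping_qualities([1.0]))  # [1.0]
```

Under this reading, a mapping quality of 1 would mean the read has only one plausible location, and low values would mean the probability is spread over many similar candidates.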

Best wishes

Alex