Mykrobe-tools / mykrobe

Antibiotic resistance prediction in minutes
MIT License

Update dynamic genotype conf threshold calculation #60

Closed iqbal-lab closed 2 years ago

iqbal-lab commented 5 years ago

Looking at a very high coverage nanopore sample, we have found a good example where there is a very high confidence call, in agreement with phenotype, but which is below the dynamically calculated threshold.

INFO:mykrobe.cmds.amr:Confidence cutoff (using percent cutoff 90%): 12033

and the variant had confidence 11957.

The relevant bit of the JSON is

"rpoB_H445X-CAC761139GAC":{  
   "variant":"ref-H445X?var_name=CAC761139GAC&num_alts=3&ref=NC_000962.3&enum=0&gene=rpoB&mut=H445X",
   "genotype":[  
      1,
      1
   ],
   "genotype_likelihoods":[  
      -12045.524099907143,
      -88.976184884562
   ],
   "info":{  
      "coverage":{  
         "reference":{  
            "percent_coverage":5.0,
            "median_depth":0,
            "min_non_zero_depth":1,
            "kmer_count":1,
            "klen":21
         },
         "alternate":{  
            "percent_coverage":100.0,
            "median_depth":63,
            "min_non_zero_depth":53,
            "kmer_count":1314,
            "klen":21
         }
      },
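
As a quick check on the numbers above, the reported confidence can be reproduced from the two `genotype_likelihoods` values. This is a minimal sketch that assumes the confidence is the gap between the best and second-best log-likelihood; that assumption matches the 11957 quoted here, but it is not taken from the mykrobe source. The function name `genotype_confidence` is hypothetical.

```python
# Values copied from the log line and JSON excerpt in this issue.
genotype_likelihoods = [-12045.524099907143, -88.976184884562]
cutoff = 12033  # "Confidence cutoff (using percent cutoff 90%)"


def genotype_confidence(log_likelihoods):
    """Gap between the two best log-likelihoods (assumed definition)."""
    best, second = sorted(log_likelihoods, reverse=True)[:2]
    return best - second


confidence = genotype_confidence(genotype_likelihoods)
passes = confidence >= cutoff

print(round(confidence), passes)  # 11957 False
```

The call misses the cutoff by about 76 units out of roughly 12000, despite 100% k-mer coverage on the alternate allele and 5% on the reference.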

So what is going on? At this high depth, the whole genotype confidence distribution is shifted far to the right, and the default threshold, which is set to keep 90% of samples, is too strict. At this depth, almost everything is going to be good.

What's the fix? Well, I guess the bloody percentile threshold we choose (i.e. the 90%) should go up with depth.

mbhall88 commented 5 years ago

If the percentile threshold goes up with depth, wouldn't that mean that this variant would be further away from the confidence cut-off?

iqbal-lab commented 5 years ago

No, the number going up is the area under the curve. We're saying: accept any genotype confidence in the top x% of the area under the curve. Currently we accept any genotype confidence in the top 90%, and I'm saying that for high coverage we could accept anything in the top 99% or something.
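
To make the direction of the change concrete, here is a minimal sketch of a "keep the top x%" cutoff. Everything in it is illustrative: the confidence values are random stand-ins rather than mykrobe's simulated distribution, and `confidence_cutoff`, `keep_percent_for_depth` and the `high_depth=50` switch point are hypothetical names and values, not part of mykrobe.

```python
import random


def confidence_cutoff(simulated_confidences, keep_percent):
    """Lowest confidence that still falls inside the top `keep_percent`% of values."""
    ordered = sorted(simulated_confidences)
    # Number of values discarded from the low end of the distribution.
    drop = int(len(ordered) * (100 - keep_percent) / 100)
    return ordered[min(drop, len(ordered) - 1)]


def keep_percent_for_depth(depth, base=90.0, high=99.0, high_depth=50):
    """Hypothetical rule: keep more of the distribution once depth is high."""
    return high if depth >= high_depth else base


random.seed(0)
# Made-up stand-in for a genotype confidence distribution.
simulated = [random.gauss(1000, 100) for _ in range(10_000)]

cutoff_90 = confidence_cutoff(simulated, 90)
# Depth 63 is the alternate-allele median_depth from the JSON in this issue.
cutoff_99 = confidence_cutoff(simulated, keep_percent_for_depth(63))

print(cutoff_99 < cutoff_90)  # True: keeping more area lowers the cutoff
```

Raising the kept percentage from 90 to 99 moves the cutoff down the left tail, so a borderline call like the one in this issue ends up closer to passing, not further away.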

mbhall88 commented 3 years ago

This is likely related to #121

mbhall88 commented 2 years ago

Closing in favour of #121