Illumina / hap.py

Haplotype VCF comparison tools

Calculation of METRIC.Precision #157

Open b-math opened 2 years ago

b-math commented 2 years ago

Dear hap.py development team,

Could you please tell me how METRIC.Precision is calculated for the hap.py summary statistics?

According to the docs, the following formula is used: Precision = TP/(TP+FP)

So I tried to calculate the precision as TRUTH.TP/(TRUTH.TP+QUERY.FP), but I get results that differ from METRIC.Precision. Is this behaviour expected? If so, could you please provide the formula (or the correct column names) to reproduce METRIC.Precision?
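For clarity, this is the exact computation I am doing, as a minimal plain-Python sketch using the numbers from the vcfeval INDEL/ALL row below:

```python
# Recompute precision as TRUTH.TP / (TRUTH.TP + QUERY.FP)
# using the vcfeval INDEL/ALL row from the tables below.
truth_tp = 7968   # TRUTH.TP
query_fp = 227    # QUERY.FP

precision = truth_tp / (truth_tp + query_fp)
print(precision)  # ~0.97230, not the reported METRIC.Precision of 0.972637
```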

See below for examples based on data from your git repo. The discrepancy is observed for

vcfeval...

| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | METRIC.Recall | **METRIC.Precision** | METRIC.F1_Score | **TRUTH.TP/(TRUTH.TP+QUERY.FP)** |
|------|--------|-------------|----------|----------|-------------|----------|---------------|----------------------|-----------------|----------------------------------|
| INDEL | ALL | 8929 | 7968 | 961 | 11812 | 227 | 0.892373 | **0.972637** | 0.930778 | **0.9723001830384381** |
| INDEL | PASS | 8929 | 7660 | 1269 | 9971 | 175 | 0.857879 | **0.978155** | 0.914077 | **0.9776643267389917** |
| SNP | ALL | 52494 | 52174 | 320 | 90092 | 504 | 0.993904 | **0.990444** | 0.992171 | **0.9904324385891644** |
| SNP | PASS | 52494 | 46955 | 5539 | 48078 | 90 | 0.894483 | **0.998089** | 0.94345 | **0.9980869380380487** |

... and happy

| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | METRIC.Recall | **METRIC.Precision** | METRIC.F1_Score | **TRUTH.TP/(TRUTH.TP+QUERY.FP)** |
|------|--------|-------------|----------|----------|-------------|----------|---------------|----------------------|-----------------|----------------------------------|
| INDEL | ALL | 8937 | 7839 | 1098 | 11812 | 343 | 0.87714 | **0.958635** | 0.916079 | **0.9580787093620142** |
| INDEL | PASS | 8937 | 7550 | 1387 | 9971 | 283 | 0.844803 | **0.964656** | 0.90076 | **0.9638708030128942** |
| SNP | ALL | 52494 | 52125 | 369 | 90092 | 582 | 0.992971 | **0.988966** | 0.990964 | **0.9889578234390118** |
| SNP | PASS | 52494 | 46920 | 5574 | 48078 | 143 | 0.893816 | **0.996963** | 0.942576 | **0.9969615196651297** |

... but not for unhappy

| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | METRIC.Recall | **METRIC.Precision** | METRIC.F1_Score | **TRUTH.TP/(TRUTH.TP+QUERY.FP)** |
|------|--------|-------------|----------|----------|-------------|----------|---------------|----------------------|-----------------|----------------------------------|
| INDEL | ALL | 8937 | 7060 | 1877 | 11812 | 1232 | 0.789974 | **0.851423** | 0.819548 | **0.8514230583695128** |
| INDEL | PASS | 8937 | 6850 | 2087 | 9971 | 1157 | 0.766476 | **0.855501** | 0.808546 | **0.8555014362432871** |
| SNP | ALL | 52494 | 52105 | 389 | 90092 | 639 | 0.99259 | **0.987885** | 0.990232 | **0.9878848779008039** |
| SNP | PASS | 52494 | 46908 | 5586 | 48078 | 178 | 0.893588 | **0.99622** | 0.942117 | **0.9962196831329907** |
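One way to quantify the gap: assuming METRIC.Precision was computed as TP/(TP+FP) for *some* TP count, that count can be back-solved from the reported (rounded) precision. A quick sketch (which TP column hap.py actually uses is exactly what I'm asking):

```python
def implied_tp(precision, fp):
    # Solve precision = tp / (tp + fp) for tp.
    return precision * fp / (1 - precision)

# vcfeval INDEL/ALL: reported precision 0.972637, QUERY.FP 227
print(round(implied_tp(0.972637, 227)))   # ~8069, not TRUTH.TP = 7968
# unhappy INDEL/ALL: reported precision 0.851423, QUERY.FP 1232
print(round(implied_tp(0.851423, 1232)))  # ~7060, which equals TRUTH.TP
```

So for vcfeval and happy the reported precision seems to be based on a TP count other than TRUTH.TP, while for unhappy it matches TRUTH.TP exactly.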

Thank you very much for your time and I'm looking forward to hearing from you soon.

Best regards Barbara

skDooley commented 10 months ago

I am seeing the same thing. I tried playing with the numbers, but they just don't add up...