ROC and PR curve - Githubissues

ESDeutekom commented 1 year ago

Dear @pkrusche and team,

I ran an analysis on DeepVariant variant calls from a Genome in a Bottle (GIAB) sample. I ran hap.py from a Singularity image pulled from docker://pkrusche/hap.py, comparing the calls against the GIAB benchmark VCF.

I used the following command (in a Snakemake rule):

shell: "export HGREF={input.ref_genome}; /opt/hap.py/bin/hap.py {input.truth_vcf} {input.query_vcf} --false-positives {input.confidence_bed} --target-regions {input.target_bed} -r {input.ref_genome} --roc QUAL --roc-filter RefCall -o {params.prefix} -V --engine=vcfeval --engine-vcfeval-template {input.ref_sdf} --threads {threads} --logfile {log}"

I am, however, confused by the results. I added the --roc option because it was the only relevant option I could find (there does not seem to be a PR curve option). However, the documentation says that precision and recall are calculated, and that is also what I see in the column names of the output, rather than ROC metrics (header and first two rows shown):

Type | Subtype | Subset | Filter | Genotype | QQ.Field | QQ | METRIC.Recall | METRIC.Precision | ...
INDEL | | TS_contained | ALL | | QUAL | 65.300003 | 0.0 | 1.0 | ...
INDEL | | TS_contained | SEL | | QUAL | 65.300003 | 0.0 | 1.0 | ...

How is it possible to have a Recall of 0 and a Precision of 1? Or are these metrics simply mislabelled, so that they are really TPR and FPR and the output is meant for a ROC plot, as the flag name suggests? The plot I made also looks like a ROC curve.

Additionally, if I plot METRIC.Recall and METRIC.Precision from the ROC files, I get a plot with a typical ROC shape. If I instead plot the values calculated with the formulas given in happy.md, I get a different plot, one that looks more like a PR curve:

Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
Precision = QUERY.TP / (QUERY.TP + QUERY.FP)
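For anyone who wants to reproduce the comparison, here is a minimal sketch of the recalculation from the raw counts, using the two happy.md formulas. The column names TRUTH.TP, TRUTH.FN, QUERY.TP and QUERY.FP are the ones used in those formulas. The rows are made-up counts for illustration, not values from my run, and the helper names are hypothetical. It does not say anything about how hap.py fills METRIC.Recall and METRIC.Precision internally.

```python
def recall(truth_tp, truth_fn):
    """Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN); None if the denominator is 0."""
    denom = truth_tp + truth_fn
    return truth_tp / denom if denom else None


def precision(query_tp, query_fp):
    """Precision = QUERY.TP / (QUERY.TP + QUERY.FP); None if the denominator is 0."""
    denom = query_tp + query_fp
    return query_tp / denom if denom else None


# Hypothetical rows, one per QUAL threshold (QQ). Counts are invented.
rows = [
    {"QQ": 10.0, "TRUTH.TP": 90, "TRUTH.FN": 10, "QUERY.TP": 90, "QUERY.FP": 30},
    {"QQ": 30.0, "TRUTH.TP": 60, "TRUTH.FN": 40, "QUERY.TP": 60, "QUERY.FP": 5},
    {"QQ": 60.0, "TRUTH.TP": 1, "TRUTH.FN": 99, "QUERY.TP": 1, "QUERY.FP": 0},
]

# One (threshold, recall, precision) point per row, ready for plotting.
points = [
    (
        r["QQ"],
        recall(r["TRUTH.TP"], r["TRUTH.FN"]),
        precision(r["QUERY.TP"], r["QUERY.FP"]),
    )
    for r in rows
]

for qq, rec, prec in points:
    print(f"QQ={qq:5.1f}  recall={rec:.3f}  precision={prec:.3f}")
```

With these invented counts, the strictest threshold keeps a single true positive and no false positives, so recall is close to 0 while precision is exactly 1. When no calls pass at all, both QUERY counts are 0 and precision is undefined (0/0), which the sketch returns as None.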

Thank you in advance, Eva

ivargr commented 1 year ago

+1 I'm also confused about the same.

ryancey1 commented 11 months ago

+2 Also confused

Illumina / hap.py

ROC and PR curve #179