Low truvari benchmark scores

riyasj327 commented 6 months ago

Hi,

I am trying to run truvari on the hapdiff unphased variants VCF (produced by the haplotype resolved HG002 assemblies) against the HG002 benchmarking VCF. I have given the commands used and the links to the public datasets below:

Hapdiff command: singularity exec --bind $DD_DIR hapdiff_0.9.sif hapdiff.py --reference $DD_DIR/chm13_v2.fa --pat $DD_DIR/hg002v1.0.1.pat.fasta.gz --mat $DD_DIR/hg002v1.0.1.mat.fasta.gz --out-dir $DD_DIR/hapdiff -t 20

Links to the pat and mat assemblies: hg002v1.0.1.pat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.pat.fasta.gz hg002v1.0.1.mat.fasta.gz - https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/hg002v1.0.1.mat.fasta.gz

Truvari command: truvari bench -b CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz -c /projects/rsaju_prj/LongReadAssembly-test/hapdiff/hapdiff/hapdiff_unphased.vcf.gz -o output/

Links to the base dataset: base dataset - https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.015-20240215/CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz comparison dataset - produced by the hapdiff using the command above

Unfortunately, the precision, recall and F1 scores are low(~0.5) when it should be around 0.9? I tried using the latest HG002 benchmarks and good quality HG002 haplotype resolved assemblies available. Please find the summary.json produced by the truvari attached with this issue. summary.json

Any idea what is going on and why are the scores so low? Any insights on this would be really helpful!

Thanks, Riya

mikolmogorov commented 6 months ago

Hello,

Sorry for the late response! I don't see anything wrong with your command lines, so it is hard to tell why the scores are lower. I would try to look into fp / fn calls output by truvari and see if you can see any patterns there. If can look into the hapdup assembly alignments in these regions along with the hapdiff VCF calls.

oneillkza commented 6 months ago

Thanks @fenderglass !

Yep when we looked into the FP/FN calls, it looked like a lot of those were due to tandem repeat arrays just being represented slightly differently. (This may in part be due to actual errors in the gold standard, since it's a newer one based off the HPRC T2T assemblies, which hasn't been as well curated as the old hg19 gold standard).

Anyway, the solution seemed to be to slightly relax some of the parameters in Truvari. @riyasj327, could you please post the full Truvari call you ended up using?

(The numbers we are now getting show F-measures around 0.8 for sniffles2, hapdiff and pav, with slight differences in FP/FN tradeoff between the methods).

mikolmogorov commented 6 months ago

Good to know - 0.8 seems closer to the expected number, if you are including SDs / pericentromeric reigons.

riyasj327 commented 6 months ago

Thanks @fenderglass and @oneillkza!

Here is the final truvari commands we are using now:

truvari bench -b CHM13v2.0_HG2-T2TQ100-V1.0.vcf.gz -c hapdiff_unphased.vcf.gz -o truvari --pctseq 0.5 --pctsize 0.5 -r 2000 --chunksize 2000 --passonly --includebed CHM13v2.0_HG2-T2TQ100-V1.0_stvar.benchmark.bed

truvari refine --regions truvari/candidate.refine.bed --reference chm13_v2.fa --recount --use-region-coords truvari/

The new Truvari scores: "TP-base": 18421, "TP-comp": 18421, "FP": 726, "FN": 7969, "precision": 0.9620828328197629, "recall": 0.6980295566502464, "f1": 0.8090563717416607, "base cnt": 26390, "comp cnt": 19147

mikolmogorov commented 5 months ago

Thanks for the info!

KolmogorovLab / hapdiff

Low truvari benchmark scores #3