PacificBiosciences / pb-human-wgs-workflow-snakemake

DEPRECATED - Workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads
BSD 3-Clause Clear License
38 stars, 19 forks

Structural variants benchmark dataset #158

Closed: priyambial123 closed this issue 1 year ago

priyambial123 commented 1 year ago

Hello

Could you provide a link to an HG002 structural variant (SV) benchmark VCF for the Sequel II system, called against the GRCh38 reference build? I have HG002 data from a Sequel II and have called structural variants with pbsv. I want to check whether the tool discovers the same number of structural variants as the benchmark dataset. It would be helpful if you could share the link.
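As a rough first check of "the same number of structural variants", the calls in a VCF can be tallied per SV type. The sketch below is a minimal, dependency-free example that assumes each record carries an `SVTYPE` INFO key (conventional for SV callers) and counts only `PASS` or unfiltered (`.`) records by default; the function names are hypothetical. Raw counts are only an approximation: a proper benchmark comparison matches individual calls against the truth set and restricts both to the benchmark's high-confidence regions.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def count_sv_types(lines, pass_only=True):
    """Count VCF records per SVTYPE INFO value.

    Assumes SVTYPE is present in INFO; records without it are counted as UNKNOWN.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed records
        if pass_only and fields[6] not in ("PASS", "."):
            continue  # FILTER column: keep only passing/unfiltered calls
        svtype = "UNKNOWN"
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                svtype = value
                break
        counts[svtype] += 1
    return counts


# Small in-memory example; for a real file use: count_sv_types(open_vcf("calls.vcf.gz"))
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=10500;SVLEN=-500",
    "chr1\t20000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;END=20000;SVLEN=300",
    "chr2\t5000\tsv3\tN\t<DEL>\t.\tLowQual\tSVTYPE=DEL;END=5200;SVLEN=-200",
]
print(count_sv_types(example))
```

Running the same function on both the pbsv calls and the benchmark VCF gives a side-by-side tally per SV type.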

Thank you

juniper-lake commented 1 year ago

The HG002/NA24385 structural variant benchmark most frequently used for this type of work was produced by Genome in a Bottle (GIAB) and called against the GRCh37 reference genome. For GRCh38 work, it would need to be lifted over from GRCh37 to GRCh38. However, as described here, lifting over variants from one reference build to another is not very robust.
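One reason liftover is fragile for SVs: liftover tools map coordinates through chain files, which describe gap-free aligned blocks between two assemblies, and an SV has two breakpoints that must both map consistently. The toy sketch below is a simplified model, not the algorithm of any particular liftover tool; it ignores strand, chromosome changes, and multiple mappings, and the block coordinates are made up for illustration. It shows how a breakpoint can fall into an unaligned gap, or how the two breakpoints can land in different blocks so the lifted span no longer matches the original SV length.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical gap-free aligned block: source [src_start, src_start + length) maps to target."""
    src_start: int
    tgt_start: int
    length: int


def map_pos(pos: int, blocks: list[Block]) -> Optional[int]:
    """Map one source coordinate to the target build, or None if it falls in a gap."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.length:
            return b.tgt_start + (pos - b.src_start)
    return None


def lift_sv(start: int, end: int, blocks: list[Block]):
    """Lift both breakpoints of an SV and report whether its span survived unchanged."""
    new_start = map_pos(start, blocks)
    new_end = map_pos(end, blocks)
    if new_start is None or new_end is None:
        return None, "unmapped breakpoint"
    if new_end - new_start != end - start:
        return (new_start, new_end), "span changed"
    return (new_start, new_end), "ok"


# Made-up alignment: source gap at 1000-1200, target gap at 1100-1350.
blocks = [Block(0, 100, 1000), Block(1200, 1350, 1000)]
for sv in [(100, 500), (900, 1500), (1050, 1600)]:
    print(sv, lift_sv(*sv, blocks))
```

In this model, only the first SV lifts cleanly; the second spans an alignment gap and changes length, and the third has a breakpoint with no target coordinate at all.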

You are welcome to download GIAB's structural variant benchmark for HG002/NA24385 from the paper linked above and lift it over to GRCh38. Alternatively, you can use other, less widely used benchmarks with variants called natively against GRCh38, such as those described by Ebert et al. and Chin et al.

juniper-lake commented 1 year ago

We'll have somebody reach out to provide our internal benchmarking resources. All follow-up questions can be addressed to that person. Hope this helps!

priyambial123 commented 1 year ago

Thank you

equinne5 commented 1 year ago

Hi, I was just wondering whether there has been any update on an SV benchmark dataset for GRCh38. Thanks!

williamrowell commented 1 year ago

Hello, there's no official GRCh38 GIAB SV benchmark set. Justin Zook describes some alternatives in https://github.com/genome-in-a-bottle/giab_latest_release/issues/9.