genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

Bed files information of WES data for AshkenazimTrio [HG002,HG003,HG004] #33

Open poddarharsh15 opened 3 months ago

poddarharsh15 commented 3 months ago

Hi, could you please point me to the BED files for whole-exome data for the AshkenazimTrio [HG002, HG003, HG004]? I would like to run some tests. Thank you so much for your assistance.

Best regards, HP.

nate-d-olson commented 2 months ago

Hi @poddarharsh15, I am not sure what you are looking for. We don't have exome BED files for our GIAB benchmarks. To benchmark WES-based small variant calls with hap.py, use the benchmark BED as the truth (confident) regions and the WES regions BED as the target regions. The WES regions BED is assay-specific (it depends on the capture kit used). To benchmark SVs with Truvari bench, use the intersection of the benchmark BED and the WES regions BED as --includebed.
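As a rough illustration of the intersection step mentioned above, here is a minimal pure-Python sketch that intersects two sets of BED intervals. The function names `read_bed`, `intersect`, and `to_bed_lines` are hypothetical, and the coordinates in the usage example are made up. In practice a dedicated tool such as bedtools would normally be used; this only shows the logic.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, sorted and merged."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            # Merge overlapping or directly adjacent intervals.
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def intersect(a, b):
    """Return the regions covered by both a and b (outputs of read_bed)."""
    result = {}
    for chrom in a.keys() & b.keys():
        xs, ys = a[chrom], b[chrom]
        i = j = 0
        out = []
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:  # BED intervals are half-open: [start, end)
                out.append((start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
        if out:
            result[chrom] = out
    return result


def to_bed_lines(regions):
    """Format {chrom: [(start, end), ...]} as tab-separated BED lines."""
    return [
        f"{chrom}\t{start}\t{end}"
        for chrom in sorted(regions)
        for start, end in regions[chrom]
    ]


# Made-up example coordinates, for illustration only.
benchmark = read_bed(["chr1\t100\t200", "chr1\t300\t400", "chr2\t0\t50"])
wes = read_bed(["chr1\t150\t350", "chr3\t0\t10"])
shared = intersect(benchmark, wes)
print("\n".join(to_bed_lines(shared)))
```

The resulting lines could be written to a file and used as the include regions for SV benchmarking. For small variants the two BED files would instead be supplied to hap.py separately, as described above.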

If you are looking for WES data, we only have a few older datasets. The Google Health Team has some more recent WES datasets you should consider using: https://doi.org/10.1101/2020.12.11.422022.

For a BED file with all coding regions, we have stratification BED files with RefSeq CDS, e.g. https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/v3.5/GRCh38@all/Functional/.

Hope this helps.

Best! Nate