genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

raw data between 30X and 40X and their truth VCF #14

Closed Talo07 closed 2 years ago

Talo07 commented 3 years ago

Dear GIAB team,

I just discovered these wonderful datasets (singleton and trio) and would like to use them to evaluate my variant-calling pipeline.

This may be a naive request, but could you send me a link to raw data (Illumina FASTQ) with coverage between 30x and 40x, along with the corresponding truth VCF on hg38? My human WGS samples have coverage between 30x and 40x, which is why I am asking for raw data at this coverage. I would also like to know whether using the hg38.fasta reference from UCSC will affect the comparison with the truth VCF.

I would be very grateful. Thanks in advance!

jzook commented 3 years ago

The easiest place to get 35x fastq files for HG002/HG003/HG004 may be the recent precisionFDA challenge data under https://doi.org/10.18434/mds2-2336. The challenge, and the comparisons made against the benchmark, are described in https://doi.org/10.1101/2020.11.13.380741. The latest v4.2.1 GRCh38 benchmarks ("truth" vcfs and beds) for this trio are under ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/AshkenazimTrio, described in https://doi.org/10.1101/2020.07.24.212712. Using hg38.fasta from UCSC should be fine. Cheers!
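For anyone who wants to double-check reference compatibility on their own files, here is a minimal sketch that compares the contig names used in a VCF against those listed in a FASTA index (`.fai`). It only assumes the standard tab-separated layout of both formats; the function names and the tiny example inputs are made up for illustration and are not part of any GIAB tooling.

```python
def contigs_from_fai(fai_text):
    """Return contig names from the text of a FASTA index (.fai) file."""
    # The first tab-separated column of each .fai line is the sequence name.
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def contigs_from_vcf(vcf_text):
    """Return contig names used in the data lines of an uncompressed VCF."""
    # Header lines start with '#'; the first column of data lines is CHROM.
    return {
        line.split("\t")[0]
        for line in vcf_text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def missing_contigs(fai_text, vcf_text):
    """Contigs that appear in the VCF but not in the reference index."""
    return sorted(contigs_from_vcf(vcf_text) - contigs_from_fai(fai_text))


# Tiny made-up inputs for illustration only (offsets are not real).
example_fai = "chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n"
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10000\t.\tA\tG\t50\tPASS\t.\n"
)
print(missing_contigs(example_fai, example_vcf))
```

An empty list means every contig named in the VCF also exists in the reference index. A non-empty list (for example `1` instead of `chr1`) would point to a naming mismatch worth resolving before benchmarking.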

Talo07 commented 3 years ago

Thank you for your answer, @jzook; it is very useful. I have singleton samples, so I would like to know whether there is a way to get HG001 fastq files at 30x-40x.

Otherwise, can I use HG002 on its own as a singleton, even though it is part of the trio?

jzook commented 2 years ago

Hi @Talo07 - we have now released v4.2.1 benchmarks for HG001-HG007 in https://doi.org/10.1101/2020.07.24.212712. If you want more singletons, you could use the unrelated parents (HG003/4/6/7) and HG001. You may want to look at the datasets recently released as part of this manuscript for 30-40x Illumina fastqs: https://doi.org/10.1101/2020.12.11.422022
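To check whether a downloaded fastq set falls in the 30-40x range, mean coverage can be roughly estimated as total sequenced bases divided by genome size. The sketch below assumes a genome size of about 3.1 Gb for the human reference and a uniform read length; both are simplifying assumptions, and the function name is hypothetical.

```python
# Approximate human genome size in bases (an assumption for this sketch).
APPROX_HUMAN_GENOME_SIZE = 3_100_000_000


def estimate_coverage(num_reads, read_length, genome_size=APPROX_HUMAN_GENOME_SIZE):
    """Estimate mean coverage as (number of reads * read length) / genome size.

    For paired-end data, num_reads should count both mates.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def in_target_range(coverage, low=30.0, high=40.0):
    """Return True if the coverage lies within [low, high]."""
    return low <= coverage <= high


# Hypothetical example: 700 million reads of 150 bp each.
coverage = estimate_coverage(700_000_000, 150)
print(round(coverage, 1), in_target_range(coverage))
```

This ignores unmapped reads, duplicates, and adapter trimming, so the mapped coverage reported by an aligner-based tool will usually be somewhat lower.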