genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

Request for Information on Original VCF for Lifted Over VCF #17

Closed yueyaog closed 1 year ago

yueyaog commented 1 year ago

Hi GIAB folks!

I am currently working on a project and I have been using the NIST_SVs_Integration_v0.6 truth set (https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/).

To make sure I am using the correct HG38 version of this truth set, I have been looking for information on the original VCF that was lifted over to produce the file available at https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/vcf/nstd175.GRCh38. Unfortunately, I was not able to find enough information on this in the README.md.

I would greatly appreciate any information you may have on the original VCF behind the lifted-over VCF file.

Thanks, Gao

jzook commented 1 year ago

Hi Gao,

The VCF that was lifted over consisted of the PASS calls in https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NIST_SV_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz. Note that the GRCh38 lifted variants are not as reliable as a benchmark. For GRCh38, we are developing a new assembly-based benchmark, but our best benchmark at the moment covers a subset of challenging genes: see https://rdcu.be/cGwVA and https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/
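
For anyone who wants to reproduce the PASS subset locally before comparing against the lifted-over file, here is a minimal sketch in Python. It is not the pipeline GIAB used for the liftover; the function names `pass_records` and `filter_vcf` and the local file paths are hypothetical, and it assumes the Tier1 VCF has already been downloaded. It only relies on the VCF convention that FILTER is the seventh tab-separated column.

```python
import gzip
from typing import Iterable, Iterator


def pass_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged, plus records whose FILTER column is PASS."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines are kept as-is.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th column (index 6) of a VCF record.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


def filter_vcf(src: str, dst: str) -> int:
    """Write the PASS-only subset of a gzipped VCF; return the record count."""
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in pass_records(fin):
            fout.write(line)
            if not line.startswith("#"):
                kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the file was downloaded.
    n = filter_vcf("HG002_SVs_Tier1_v0.6.vcf.gz", "HG002_SVs_Tier1_v0.6.PASS.vcf.gz")
    print(f"kept {n} PASS records")
```

The output here is plain gzip rather than bgzip, so it would need to be recompressed with bgzip before it can be indexed with tabix.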

Cheers, Justin

yueyaog commented 1 year ago

Hi Justin, Thank you so much! Those are super helpful for our project!!!