genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

HG001_bam files information #29

Open: poddarharsh15 opened this issue 9 months ago

poddarharsh15 commented 9 months ago

Hello, I'm attempting to convert the HG001 BAM files to FASTQ format for benchmarking purposes. However, I'm having trouble aligning the resulting FASTQ files to my reference genome. Could you tell me which reference genome was used to create these BAM files?

[Screenshot from 2023-12-13 10-01-13]

ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/RMNISTHS_30xdownsample.bam
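For context, here is a minimal Python sketch of what a BAM-to-FASTQ conversion has to get right. It is not GIAB's or samtools' implementation, and the function name `sam_to_fastq` is hypothetical. It assumes the records arrive as SAM text (for example from `samtools view`) and have already been grouped by read name (e.g. with `samtools collate` or `samtools sort -n`), because in a coordinate-sorted BAM the two mates of a pair are usually far apart. The sketch skips secondary and supplementary records, reverse-complements reads flagged as aligned to the reverse strand, and splits paired reads into read 1 and read 2 using the SAM FLAG bits. Skipping any of these steps (for instance, realigning paired data as single-end) is one plausible source of alignment trouble, though not necessarily the cause here.

```python
"""Hypothetical sketch: turn SAM text records into FASTQ records.

Assumes records are grouped by read name so mates line up in read1/read2.
"""
from typing import Dict, Iterable, List

# SAM FLAG bits (from the SAM specification).
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split primary SAM records into read1 / read2 / single FASTQ records."""
    out: Dict[str, List[str]] = {"read1": [], "read2": [], "single": []}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        # Secondary/supplementary records repeat a read already seen as primary.
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        # Reverse-strand alignments store the reverse complement of the read;
        # undo it so the FASTQ matches what came off the sequencer.
        if flag & FLAG_REVERSE:
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        record = f"@{qname}\n{seq}\n+\n{qual}\n"
        if flag & FLAG_PAIRED and flag & FLAG_READ1:
            out["read1"].append(record)
        elif flag & FLAG_PAIRED and flag & FLAG_READ2:
            out["read2"].append(record)
        else:
            out["single"].append(record)
    return out
```

On a real BAM, running `samtools collate` followed by `samtools fastq` with its `-1`/`-2` output options performs these same steps and is the more robust choice. The sketch is only meant to show why name grouping, strand handling and mate splitting matter.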

nate-d-olson commented 9 months ago

Thanks for using our GIAB data. The alignment was generated using GRCh37, but I am not sure of the specific version of GRCh37. It is probably not hg19, though, since the chromosome names in the header do not include "chr".

You can also find FASTQ files in the subdirectories https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/{131219_D00360_006_AH81VLADXX,131223_D00360_007_BH88WKADXX,131223_D00360_008_AH88U0ADXX,140115_D00360_0009_AH8962ADXX,140115_D00360_0010_BH894YADXX,140127_D00360_0011_AHGV6ADXX,140127_D00360_0012_BH8GVUADXX,140207_D00360_0013_AH8G92ADXX,140313_D00360_0014_AH8GGVADXX,140313_D00360_0015_BH9258ADXX,140407_D00360_0016_AH948VADXX,140407_D00360_0017_BH947YADXX}/ . The full list of FASTQ files is here: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/NA12878/sequence.index.NA12878_Illumina300X_wgs_09252015_updated. Each folder contains roughly 20-30X of sequencing data.

Hope this helps. Let us know if you run into any additional issues or have other questions.
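To narrow down which variant of GRCh37 a BAM was built against, the `@SQ` lines of its header (as printed by `samtools view -H`) are usually enough. The helper below, `describe_reference`, is a hypothetical sketch rather than a GIAB tool, and it reports signals instead of a verdict. It checks the length of chromosome 1 (249,250,621 bp in GRCh37 versus 248,956,422 bp in GRCh38). It also checks whether contig names carry the UCSC-style "chr" prefix used by hg19 and whether the `hs37d5` decoy contig from the 1000 Genomes hs37d5 reference is present. Finally, it reports the mitochondrial contig length (16,569 bp for the rCRS "MT" used by GRCh37-based references such as b37 and hs37d5, versus 16,571 bp for hg19's "chrM").

```python
"""Hypothetical helper: summarise which reference a SAM/BAM header points to.

Input is header text such as the output of `samtools view -H file.bam`.
The signals below are heuristics, not a definitive identification.
"""
from typing import Dict, Optional

# Length of chromosome 1 in the two major human assemblies.
CHR1_LENGTHS = {249250621: "GRCh37", 248956422: "GRCh38"}


def parse_sq_lengths(header: str) -> Dict[str, int]:
    """Map contig name (SN tag) to length (LN tag) for every @SQ line."""
    contigs: Dict[str, int] = {}
    for line in header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAG:VALUE pairs separated by tabs.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def describe_reference(header: str) -> Dict[str, object]:
    """Report build, naming style, decoy presence and mitochondrial length."""
    contigs = parse_sq_lengths(header)
    chr1_len: Optional[int] = contigs.get("1", contigs.get("chr1"))
    return {
        "build": CHR1_LENGTHS.get(chr1_len, "unknown"),
        # UCSC-style names (hg19) use a "chr" prefix; b37/hs37d5 do not.
        "chr_prefix": any(name.startswith("chr") for name in contigs),
        # The 1000 Genomes hs37d5 reference adds a decoy contig named "hs37d5".
        "has_hs37d5_decoy": "hs37d5" in contigs,
        # rCRS "MT" is 16,569 bp; hg19's older "chrM" is 16,571 bp.
        "mito": {n: contigs[n] for n in ("MT", "chrM") if n in contigs},
    }
```

Passing it the header of `RMNISTHS_30xdownsample.bam` would show which of these signals apply. Mapping those signals to a named reference remains a judgment call, for example when a header lacks both chromosome 1 and a mitochondrial contig.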