giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle (GIAB) project. The sequence and alignment indexes are also available at: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes


AshkenazimTrio

Son: HG002 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/
Father: HG003 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG003_NA24149_father/
Mother: HG004 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG004_NA24143_mother/

| Sequencing Platform | Sequence | Alignment |
| --- | --- | --- |
| Illumina WGS 2x150bp 300X per individual | All HG002 HG003 HG004 | novoalign: All HG002 HG003 HG004 |
| Illumina 6KB Matepair | All HG002 HG003 HG004 | bwamem:hg19 All HG002 HG003 HG004 |
| Illumina WGS 2X250bp | All HG002 HG003 HG004 | isaac:hg19 All HG002 HG003 HG004; novoalign: All HG002 HG003 HG004 |
| Moleculo | All HG002 HG003 HG004 | |
| Illumina Whole Exome | - | bwamem:hg19 All HG002 HG003 HG004 |
| SOLiD 60x for son | All HG002 | LifeScope:hg19 All HG002 |
| CompleteGenomics | - | CGAtools:hg19 All HG002 HG003 HG004 |
| Ion Proton 1000x Exome | - | TMAP:hg19 All HG002 HG003 HG004 |
| 10X Genomics | - | bwamem:hg19 All HG002 HG003 HG004 |
| 10X Genomics ChromiumGenome | All HG002 | LongRanger2.0:hg19 All HG002 HG003 HG004 |
| BioNano | All:bnx HG002:bnx HG003:bnx HG004:bnx | All:cmap HG002 HG003 HG004 |
| PacBio 70x/30x/30x | All HG002 HG003 HG004; All:hdf5 HG002 HG003 HG004 | NGMLR:hg19 All HG002 HG003 HG004; minimap2: All HG002 HG003 HG004 |
| PacBio CCS 10kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 11kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb_20kb chemistry2 | All HG002 | pbmm2: All HG002 HG003 HG004 |
| Oxford Nanopore 2D | All HG002 | - |
| Oxford Nanopore ultralong (guppy-V3.2.4_2020-01-22) | All HG002 | minimap2:whatshap:hg19 All HG002 |
| Oxford Nanopore ultralong Promethion | All HG002 HG003 HG004 | - |
| BGI BGISEQ500 | All HG002 | - |
| BGI MGISEQ PCR-free | All HG002 | - |
| BGI stLFR | All HG002 HG003 HG004 | All:bwamem:hg19 HG002 HG003 HG004 |
| Strand-Seq HG002 by BCCRC | All HG002 | - |

* CompleteGenomics LFR: raw and alignment data are not available, but analysis results are available under: https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/CompleteGenomics_newLFR_CGAtools_06122015/


ChineseTrio

Son: HG005 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG005_NA24631_son/
Father: HG006 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG006_NA24694-huCA017E_father/
Mother: HG007 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG007_NA24695-hu38168_mother/

| Sequencing Platform | Sequence | Alignment |
| --- | --- | --- |
| Illumina WGS 2x250bp 300X for son; 2x150bp 100x for parents | All HG005 HG006 HG007 | novoalign: All:hg19-hg38 HG005:hg19-hg38 HG006:hg19-hg38 HG007:hg19-hg38 |
| Illumina 6KB Matepair | All HG005 HG006 HG007 | |
| Moleculo | All HG005 HG006 HG007 | |
| SOLiD 60x for son | All:xsq HG005:xsq | LifeScope: All:hg19 HG005:hg19 |
| CompleteGenomics | | CGAtools: All:hg19 (RMDNA) HG005:hg19 HG006:hg19 HG007:hg19; CGAtools: All:hg19 (cellsDNA) HG005:hg19 |
| Illumina Whole Exome | | bwamem: All:hg19 HG005:hg19 |
| Ion Proton 1000x Exome | | TMAP: All:hg19 HG005:hg19 |
| BioNano for son | All:bnx HG005:bnx | All:hg19 (cmap) HG005:hg19 (cmap) |
| PacBio Sequel for the trio | All HG005 HG006 HG007 | |
| PacBio SequelII CCS 11kb | | |
| BGI BGISEQ500, MGISEQ, stLFR | | |



NA12878

NA12878: HG001 https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/

| Sequencing Platform | Sequence | Alignment |
| --- | --- | --- |
| Illumina WGS 2x150bp 300X | HG001 | bwamem: HG001:hg19 (downsampled30x); novoalign: HG001 |
| Illumina HiSeq Exome | HG001; HG001:trimmed_fastq | bwamem: HG001:hg19 |
| Illumina TruSeq Exome | | bwamem: HG001:hg19 |
| 10X Genomics | | bwamem: HG001:hg19; bwamem: HG001:hg19 (size_selected) |
| 10X Genomics ChromiumGenome | | LongRanger2.0: HG001:hg19-hg38; LongRanger2.1: HG001:hg19-hg38 |
| CompleteGenomics | | CGAtools: HG001:hg19 |
| Ion Proton 1000x Exome | | TMAP: HG001:hg19 |
| NA12878 SOLiD5500W | | LifeScope: HG001:hg19 |
| BGI BGISEQ500, MGISEQ, stLFR | | |
| PacBio 40x | HG001:hdf5 | |
| PacBio SequelII CCS 11kb | | |
| Ultralong_OxfordNanopore | - | minimap2: HG001 |




Please Note:
1. To use raw sequencing data (fastq, fasta, hdf5, xsq, bnx, etc.) in your analysis, use the sequence.index.* files to download the data.
2. To use aligned data (bam, xmap/cmap, etc.) in your analysis, use the alignment.index.* files to download the data.
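
As a rough illustration of how an index file might be used for downloading, the sketch below collects every URL-like cell from an index and offers a small download helper. It assumes the index is tab-separated text with file locations stored as `ftp://` or `https://` URLs. The exact columns differ between index files, so inspect the one you need first. The sample content, its column names, and the helper names `extract_urls` and `download` are hypothetical and are not part of this repository.

```python
import csv
import io
import os
import urllib.request
from urllib.parse import urlparse

URL_SCHEMES = {"ftp", "http", "https"}


def extract_urls(index_text):
    """Return every URL-like cell of a tab-separated index, in file order."""
    urls = []
    reader = csv.reader(io.StringIO(index_text), delimiter="\t")
    for row in reader:
        for cell in row:
            cell = cell.strip()
            # Header names, checksums, and sample IDs have no URL scheme.
            if urlparse(cell).scheme in URL_SCHEMES:
                urls.append(cell)
    return urls


def download(url, dest_dir="."):
    """Download one URL into dest_dir (needs network access)."""
    filename = os.path.join(dest_dir, os.path.basename(urlparse(url).path))
    urllib.request.urlretrieve(url, filename)
    return filename


# Hypothetical index content for illustration only; real index files
# may use different columns.
SAMPLE_INDEX = (
    "FILE\tMD5\tSAMPLE\n"
    "ftp://example.org/reads_R1.fastq.gz\tabc123\tHG002\n"
)

if __name__ == "__main__":
    for url in extract_urls(SAMPLE_INDEX):
        print(url)  # call download(url) here to fetch the file
```

Scanning every cell for a URL scheme, rather than relying on a fixed column name, keeps the sketch usable across index files with different layouts. If an index stores relative paths instead of full URLs, the paths would need to be joined to the FTP base address first.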