Open shukwong opened 1 month ago
Some suggestions for germline variant calling and GWAS genotype sample data:
GIAB AshkenazimTrio BAMs:
https://42basepairs.com/browse/s3/giab/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams
https://42basepairs.com/browse/s3/giab/data/AshkenazimTrio/HG003_NA24149_father/NIST_Illumina_2x250bps/novoalign_bams
https://42basepairs.com/browse/s3/giab/data/AshkenazimTrio/HG004_NA24143_mother/NIST_Illumina_2x250bps/novoalign_bams
Truth set for HG002 (son) small variants: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/chrXY_v1.0/GRCh38/SmallVariant/
Others at: https://www.nist.gov/programs-projects/genome-bottle
GWAS genotype data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/COXHAP&version=4.1 (see https://github.com/andrewhaoyu/multi_ethnic for more details)
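If it helps with scripting, the 42basepairs browse links above look like they encode the bucket and key directly after `/browse/s3/`. Here is a rough sketch that turns such a link into an `s3://` URI. The URL layout is an assumption inferred from the links in this thread, and `browse_url_to_s3_uri` is a hypothetical helper name, not something from an existing tool.

```python
from urllib.parse import urlparse

# Assumption: 42basepairs browse URLs have the form
# https://42basepairs.com/browse/s3/<bucket>/<key prefix>
BROWSE_PREFIX = "/browse/s3/"


def browse_url_to_s3_uri(url: str) -> str:
    """Convert a 42basepairs browse URL into an s3:// prefix URI (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.netloc != "42basepairs.com" or not parsed.path.startswith(BROWSE_PREFIX):
        raise ValueError(f"not a 42basepairs S3 browse URL: {url}")
    # Everything after the prefix is treated as "<bucket>/<key prefix>".
    bucket_and_key = parsed.path[len(BROWSE_PREFIX):].strip("/")
    return f"s3://{bucket_and_key}/"


if __name__ == "__main__":
    print(browse_url_to_s3_uri(
        "https://42basepairs.com/browse/s3/giab/data/AshkenazimTrio/"
        "HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams"
    ))
```

If the bucket is publicly readable, the resulting prefix can probably be listed with `aws s3 ls --no-sign-request <uri>`, but I have not verified that for every path here.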
WIP
Here is the list of tools currently included in our flowiq_mapping.json file, along with the data types that each tool can process.
May be useful: https://registry.opendata.aws/
Also see https://github.com/nf-core/test-datasets. These are small test datasets; for some tools, whole-genome data may be more useful.
Curate test data sets on AWS, preferably on S3 open data. This may include germline, tumor/normal, genotype, and RNASeq data.
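One possible way to track the curation is a small catalog keyed by data type, so it is easy to see which types still have no test data. This is only a sketch: `DatasetEntry`, `CATALOG`, and `missing_data_types` are hypothetical names, the data-type labels are my own spelling of the four types listed above, and the two entries are just the suggestions already posted in this thread.

```python
from dataclasses import dataclass

# Data types named in this issue (labels are illustrative).
DATA_TYPES = ("germline", "tumor/normal", "genotype", "rnaseq")


@dataclass
class DatasetEntry:
    """One curated test data set (hypothetical structure)."""
    name: str
    data_type: str
    urls: list[str]


# Entries taken from the suggestions earlier in this thread.
CATALOG = [
    DatasetEntry(
        name="GIAB AshkenazimTrio HG002 (son) BAMs",
        data_type="germline",
        urls=[
            "https://42basepairs.com/browse/s3/giab/data/AshkenazimTrio/"
            "HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams"
        ],
    ),
    DatasetEntry(
        name="multi_ethnic GWAS genotype data",
        data_type="genotype",
        urls=[
            "https://dataverse.harvard.edu/dataset.xhtml"
            "?persistentId=doi:10.7910/DVN/COXHAP&version=4.1"
        ],
    ),
]


def missing_data_types(catalog: list[DatasetEntry]) -> list[str]:
    """Return the data types that have no catalog entry yet, in DATA_TYPES order."""
    covered = {entry.data_type for entry in catalog}
    return [t for t in DATA_TYPES if t not in covered]


if __name__ == "__main__":
    print(missing_data_types(CATALOG))
```

With only the two entries above, the tumor/normal and RNASeq types would still be reported as uncovered.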