bigdatagenomics / eggo

Ready-to-go Parquet-formatted public 'omics datasets
Apache License 2.0

1000 Genomes Phase 3 VCF data set to be hosted in S3 in parquet format #121

Open · laserson opened 9 years ago

ryan-williams commented 9 years ago

OOC, is the ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz file here an example of the VCF you're referring to?

laserson commented 9 years ago

I believe so, but I'd like to take the genotypes as well, not just the variants.

fnothaft commented 9 years ago

You already have these at s3://bdg-eggo/1kg/genotypes/, no?

laserson commented 9 years ago

I think those were phase 1. And outdated anyway...wanna redo it.