Closed by jborden 2 years ago
I think it makes the most sense for us to save the SNV/indel VCF files from here (the processed variant calls), so I'll plan on doing that for this brick. I don't think it makes sense to download the alignment files (e.g., BAM/CRAM files) for each individual who participated in this study. Those would likely take at least a few hundred GB, and I imagine anyone wanting to work with that data would either download it directly or compute in the cloud to avoid downloading it at all.
Since VCF files are tab-separated text files, I'll set up the brick to have both the VCF and the Parquet versions of the files available for download. Sometimes the VCF file is the end product that people want to use for analysis, whereas other times it is an input to another program, so I think it would be helpful to have both file types available. VCF files generally aren't too large, so I don't expect storage space to be an issue in this case.
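For reference, here is a rough sketch of what the VCF-to-Parquet step could look like. It is not the brick's actual code: the function names (`parse_vcf`, `open_vcf`, `vcf_to_parquet`) are made up for illustration, and the Parquet step assumes pandas is installed with a Parquet engine such as pyarrow. It also reads the whole file into memory, which may not be appropriate for the larger multi-sample files.

```python
import gzip
from typing import Iterable, List, Tuple


def parse_vcf(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Split VCF text into column names and tab-separated data rows."""
    columns: List[str] = []
    rows: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" meta-information lines
        if line.startswith("#"):
            # The single "#CHROM ..." line holds the column names.
            columns = line[1:].split("\t")
            continue
        rows.append(line.split("\t"))
    return columns, rows


def open_vcf(path: str):
    """Open a plain-text or gzip-compressed (.gz) VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_to_parquet(vcf_path: str, parquet_path: str) -> None:
    """Hypothetical helper: write the VCF records to a Parquet file."""
    # Assumption: pandas plus a Parquet engine (e.g. pyarrow) is available.
    import pandas as pd

    with open_vcf(vcf_path) as handle:
        columns, rows = parse_vcf(handle)
    # All values are kept as strings; no type conversion is attempted here.
    pd.DataFrame(rows, columns=columns).to_parquet(parquet_path, index=False)
```

Keeping every field as a string avoids guessing at types (for example, `QUAL` can be `.`), at the cost of leaving any numeric conversion to whoever reads the Parquet file.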
Focusing on VCF makes sense to me.
This brick is done.
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/ a LOT of data