Convert simulated admixed data from hap files to VCF

The haps/sample data provided by Marcos and Jessica is located here: gs://gnomad-batch/mwilson/simulated_data/

Jessica provided this description of the files:

Files named "Mix" refer to 33% EUR / 33% AFR / 33% NAT simulations, and files named "Brasa" refer to 60% EUR / 25% AFR / 15% NAT ones. Both were simulated considering one pulse of admixture 17 generations ago. We have simulated only chr1 for now, but can simulate and send files for the other chromosomes if you need them.

Here is a brief description of the files:

Files for simulating admixed individuals. .phgeno contains the genetic information; .sample files contain the 1kg (AFR and EUR) and HGDP (NAT) IDs of the reference samples used for the simulations; .dat contains the proportions for simulating each admixture pattern.

AFR1.phgeno EUR1.phgeno NAT1.phgeno
hgdp_EUR_chr1_AdmixSimu.sample hgdp_AFR_chr1_AdmixSimu.sample hgdp_NAT_chr1_AdmixSimu.sample
Mix.dat BRASA.dat

Simulated haplotypes (admix-simu output):

Brasa1.hanc2 Brasa.chr1.haps Brasa1.hanc Brasa1.phgeno Brasa1.bp Brasa1.log Brasa.chr1.sample

Mix1.hanc2 Mix.chr1.haps Mix1.hanc Mix1.phgeno Mix1.bp Mix1.log Mix.chr1.sample

The Brasa.chr1 and Mix.chr1 haps/sample files were converted to VCF format using SHAPEIT2 on the gnomad_lai virtual machine, and the resulting VCFs were copied to the same bucket.
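
For reference, here is a minimal Python sketch of what a haps/sample to VCF conversion involves. It is not the SHAPEIT2 code path used above, only an illustration of how the two formats map onto each other. It assumes the usual SHAPEIT-style layout: each .haps line is `chrom id pos allele0 allele1` followed by two 0/1 haplotype columns per individual, and the .sample file has two header lines followed by one line per individual, with the individual ID in the second column. The function name `haps_to_vcf` is hypothetical, and treating allele0 as REF is an assumption.

```python
def haps_to_vcf(haps_lines, sample_lines):
    """Convert SHAPEIT-style haps/sample text lines into a list of VCF lines."""
    # Skip the two .sample header lines; assume column 2 (ID_2) is the sample ID.
    sample_ids = [ln.split()[1] for ln in list(sample_lines)[2:] if ln.strip()]

    header_cols = ["#CHROM", "POS", "ID", "REF", "ALT",
                   "QUAL", "FILTER", "INFO", "FORMAT"]
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotype">',
        "\t".join(header_cols + sample_ids),
    ]

    for ln in haps_lines:
        fields = ln.split()
        if not fields:
            continue  # ignore blank lines
        chrom, snp_id, pos, allele0, allele1 = fields[:5]
        alleles = fields[5:]
        if len(alleles) != 2 * len(sample_ids):
            raise ValueError(f"{snp_id}: expected {2 * len(sample_ids)} "
                             f"haplotype columns, found {len(alleles)}")
        # Consecutive haplotype columns belong to the same individual,
        # so pair them into phased genotypes such as 0|1.
        genotypes = [f"{alleles[i]}|{alleles[i + 1]}"
                     for i in range(0, len(alleles), 2)]
        # Assumption: allele0 is written as REF and allele1 as ALT.
        out.append("\t".join([chrom, pos, snp_id, allele0, allele1,
                              ".", ".", ".", "GT"] + genotypes))
    return out
```

The function takes any iterables of text lines, so it can be fed open file handles for a .haps file and its matching .sample file. For files of real size, a dedicated tool such as SHAPEIT2 is the better choice; the exact command used on the VM is not recorded in this issue.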

broadinstitute / gnomad_local_ancestry

Convert simulated admixed data from hap files to VCF #78