Closed mike-w-wilson closed 3 years ago
The haps/sample data provided by Marcos and Jessica is located here: gs://gnomad-batch/mwilson/simulated_data/
Jessica provided this description of the files: Files named "Mix" refer to 33% EUR / 33% AFR / 33% NAT simulations, and files named "Brasa" refer to 60% EUR / 25% AFR / 15% NAT ones. Both were simulated assuming one pulse of admixture 17 generations ago. We have simulated only chr1 for now, but can simulate and send files for the other chromosomes if you need them.
Here is a brief description of the files:
Files for simulating admixed individuals. .phgeno files contain the genetic information; .sample files contain the 1kg (AFR and EUR) and HGDP (NAT) IDs of the reference samples used for the simulations; .dat files contain the proportions for simulating each admixture pattern.
AFR1.phgeno
EUR1.phgeno
NAT1.phgeno
hgdp_EUR_chr1_AdmixSimu.sample
hgdp_AFR_chr1_AdmixSimu.sample
hgdp_NAT_chr1_AdmixSimu.sample
Mix.dat
BRASA.dat
Simulated haplotypes (admix-simu output):
Brasa1.hanc2
Brasa.chr1.haps
Brasa1.bp
Brasa1.log
Brasa.chr1.sample
Brasa1.hanc
Brasa1.phgeno
Mix1.hanc2
Mix.chr1.haps
Mix1.hanc
Mix1.phgeno
Mix1.bp
Mix1.log
Mix.chr1.sample
The Brasa.chr1 and Mix.chr1 haps/sample files were converted to the VCF format using shapeit2 on the gnomad_lai virtual machine and copied to the same bucket.
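To illustrate what the haps-to-VCF conversion does to a single record, here is a minimal Python sketch. It is not the shapeit2 code and was not used for these files; it only assumes the standard SHAPEIT haps layout (chromosome, SNP ID, position, allele coded 0, allele coded 1, then two haplotype columns per sample). The function name and the example record are hypothetical.

```python
def haps_line_to_vcf_record(haps_line):
    """Turn one SHAPEIT-style haps line into a tab-separated VCF data line.

    Assumed haps layout: chrom, SNP ID, position, allele0, allele1,
    followed by two 0/1 haplotype columns per sample.
    """
    fields = haps_line.split()
    if len(fields) < 7:
        raise ValueError("expected 5 site columns plus at least one sample")
    chrom, snp_id, pos, allele0, allele1 = fields[:5]
    haplotypes = fields[5:]
    if len(haplotypes) % 2 != 0:
        raise ValueError("haplotype columns must come in pairs")

    # Pair consecutive haplotype columns into phased genotypes such as "0|1".
    genotypes = [
        f"{haplotypes[i]}|{haplotypes[i + 1]}"
        for i in range(0, len(haplotypes), 2)
    ]

    # Allele coded 0 is written as REF and allele coded 1 as ALT.
    # QUAL, FILTER and INFO are left as missing (".") in this sketch.
    return "\t".join(
        [chrom, pos, snp_id, allele0, allele1, ".", ".", ".", "GT"] + genotypes
    )


# Hypothetical record with two samples (four haplotype columns).
example = "1 rs0001 10583 G A 0 1 1 1"
print(haps_line_to_vcf_record(example))
```

A real VCF also needs the header lines and sample IDs taken from the matching .sample file, which shapeit2 handles during conversion.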
Marcos and Jessica use the shapeit default output (haps/sample files), but the batch LAI pipeline currently only accepts VCFs. Use shapeit to convert the haps/sample files to VCF.
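A minimal sketch of how the conversion command could be assembled for both datasets, assuming the shapeit2 binary is on the PATH as "shapeit" and that the output should be named <prefix>.vcf (both are assumptions; the exact command run on the VM is not recorded here). The helper name is hypothetical. The -convert mode with --input-haps and --output-vcf is the shapeit2 option set for turning haps/sample pairs into a VCF.

```python
import shlex


def build_shapeit_convert_command(prefix, shapeit_binary="shapeit"):
    """Build the argument list for converting <prefix>.haps/.sample to VCF.

    shapeit_binary and the <prefix>.vcf output name are assumptions.
    """
    return [
        shapeit_binary,
        "-convert",
        "--input-haps", prefix,            # shapeit reads <prefix>.haps and <prefix>.sample
        "--output-vcf", f"{prefix}.vcf",
    ]


# Print the commands for the two simulated datasets instead of running them,
# so they can be reviewed before execution on the VM.
for prefix in ("Brasa.chr1", "Mix.chr1"):
    print(shlex.join(build_shapeit_convert_command(prefix)))
```

To actually run a command, the returned list can be passed to subprocess.run(..., check=True) on a machine where shapeit2 is installed.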