ga4gh / quality-control-wgs

Home for the GA4GH Quality Control of Whole Genome Sequencing metrics and reference implementations
https://www.ga4gh.org/product/wgs-quality-control-standards/
Apache License 2.0

Feature/diversity #42

Open mhebrard opened 1 month ago

mhebrard commented 1 month ago

Extending the QC metrics to VCF highlighted that some metrics are highly correlated with the genetic ancestry of a sample. The group agreed that we should diversify the benchmark sample set to include diverse ancestries.

Updated set that maximizes diversity in terms of population and gender:

| Population name (26) | Superpopulation code (5) | No. of samples (M/F) |
| --- | --- | --- |
| African Ancestry SW | AFR | 4 |
| African Caribbean | AFR | 4 |
| Bengali | SAS | 5 |
| British | EUR | 4 |
| CEPH | EUR | 4 |
| Colombian | AMR | 4 |
| Dai Chinese | EAS | 4 |
| Esan | AFR | 4 |
| Finnish | EUR | 4 |
| Gambian Mandinka | AFR | 4 |
| Gujarati | SAS | 4 |
| Han Chinese | EAS | 4 |
| Iberian | EUR | 4 |
| Japanese | EAS | 4 |
| Kinh Vietnamese | EAS | 4 |
| Luhya | AFR | 4 |
| Mende | AFR | 4 |
| Mexican Ancestry | AMR | 4 |
| Peruvian | AMR | 4 |
| Puerto Rican | AMR | 2 |
| Punjabi | SAS | 3 |
| Southern Han Chinese | EAS | 2 |
| Tamil | SAS | 4 |
| Telugu | SAS | 4 |
| Toscani | EUR | 4 |
| Yoruba | AFR | 4 |
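
As a quick sanity check on the table above, the per-superpopulation totals can be tallied with a short script. This is only a sketch: the counts are copied by hand from the table, and the names `SAMPLES` and `per_superpop` are illustrative and not part of this repository.

```python
from collections import Counter

# (population, superpopulation code, number of samples), copied from the table.
SAMPLES = [
    ("African Ancestry SW", "AFR", 4),
    ("African Caribbean", "AFR", 4),
    ("Bengali", "SAS", 5),
    ("British", "EUR", 4),
    ("CEPH", "EUR", 4),
    ("Colombian", "AMR", 4),
    ("Dai Chinese", "EAS", 4),
    ("Esan", "AFR", 4),
    ("Finnish", "EUR", 4),
    ("Gambian Mandinka", "AFR", 4),
    ("Gujarati", "SAS", 4),
    ("Han Chinese", "EAS", 4),
    ("Iberian", "EUR", 4),
    ("Japanese", "EAS", 4),
    ("Kinh Vietnamese", "EAS", 4),
    ("Luhya", "AFR", 4),
    ("Mende", "AFR", 4),
    ("Mexican Ancestry", "AMR", 4),
    ("Peruvian", "AMR", 4),
    ("Puerto Rican", "AMR", 2),
    ("Punjabi", "SAS", 3),
    ("Southern Han Chinese", "EAS", 2),
    ("Tamil", "SAS", 4),
    ("Telugu", "SAS", 4),
    ("Toscani", "EUR", 4),
    ("Yoruba", "AFR", 4),
]

# Sum the sample counts within each superpopulation.
per_superpop = Counter()
for _population, superpop, n_samples in SAMPLES:
    per_superpop[superpop] += n_samples

total = sum(per_superpop.values())

for superpop in sorted(per_superpop):
    print(f"{superpop}: {per_superpop[superpop]}")
print(f"Populations: {len(SAMPLES)}, total samples: {total}")
```

With the counts as listed, this gives AFR 28, AMR 14, EAS 18, EUR 20 and SAS 20, for 100 samples across 26 populations.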

TODO: