broadinstitute / ukbb_qc

QCing the UKBB

Freeze 6 #218

Closed ch-kr closed 3 years ago

ch-kr commented 3 years ago

PR with code to re-export 300K VCFs and prepare them for return to the UKBB.

New scripts:

VCF script updates:

ch-kr commented 3 years ago

Desired VCF schema:

FORMAT fields:

GT, GQ, DP, AD, MIN_DP, PGT, PID, PL, SB

FILTER fields:

AC0, InbreedingCoeff, MonoAllelic, PASS, RF

INFO fields:

Frequency fields:

AC, AN, AF, nhomalt, popmax, faf95, faf99

gnomAD frequency fields:

AC, AN, AF, nhomalt, popmax, faf95, faf99

RF fields:

rf_tp_probability, rf_positive_label, rf_negative_label, rf_label, rf_train

Region type fields:

lcr, segdup, nonpar, fail_interval_qc, in_capture_region

VEP

Allele info fields:

allele_type, has_star, n_alt_alleles, original_alleles, variant_type, was_mixed

VQSR and info HT fields

Hist fields
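
A rough way to sanity-check an exported shard against the FORMAT and FILTER lists above is to scan its header lines for the declared IDs. This is only a sketch in plain Python: `header_ids` and `missing_fields` are hypothetical helper names, not functions from this repo, and it assumes the header lines have already been read into a list of strings. Note that some VCF writers do not declare `PASS` explicitly, so a "missing" `PASS` may be harmless.

```python
import re

# Expected IDs, copied from the schema listed above.
EXPECTED_FORMAT = {"GT", "GQ", "DP", "AD", "MIN_DP", "PGT", "PID", "PL", "SB"}
EXPECTED_FILTER = {"AC0", "InbreedingCoeff", "MonoAllelic", "PASS", "RF"}

# Matches meta-information lines such as: ##FORMAT=<ID=GT,Number=1,...>
_HEADER_ID = re.compile(r"^##(FORMAT|FILTER)=<ID=([^,>]+)")


def header_ids(header_lines):
    """Collect the FORMAT and FILTER IDs declared in VCF header lines."""
    found = {"FORMAT": set(), "FILTER": set()}
    for line in header_lines:
        match = _HEADER_ID.match(line)
        if match:
            found[match.group(1)].add(match.group(2))
    return found


def missing_fields(header_lines):
    """Return the expected IDs that the header does not declare (hypothetical helper)."""
    found = header_ids(header_lines)
    return {
        "FORMAT": sorted(EXPECTED_FORMAT - found["FORMAT"]),
        "FILTER": sorted(EXPECTED_FILTER - found["FILTER"]),
    }
```

An empty list under both keys would mean every expected FORMAT and FILTER ID is declared. The INFO fields are left out here because the lists above do not show their final names in the exported VCF.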

ch-kr commented 3 years ago

I just added a batch script to repackage the VCF shard headers (required for ROR). Note that the script isn't fully tested: I tried testing it on one of the old VCF shards, but forgot that this wouldn't work because bcftools complains about the duplicated sample ID.
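
For reference, a shard can be screened for the duplicated sample ID before it is used as a test input. The snippet below is a minimal sketch in plain Python, not part of the batch script; `duplicated_samples` is a hypothetical name, and it assumes the header lines are available as a list of strings with a standard tab-separated `#CHROM` line.

```python
from collections import Counter


def duplicated_samples(header_lines):
    """Return the sorted sample IDs that appear more than once in the #CHROM line."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The first nine columns are fixed (CHROM..FORMAT); samples follow.
            samples = columns[9:]
            counts = Counter(samples)
            return sorted(s for s, n in counts.items() if n > 1)
    raise ValueError("no #CHROM header line found")
```

A non-empty result would flag a shard that is likely to hit the same duplicated-sample complaint.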