Closed: ch-kr closed this 3 years ago
Desired VCF schema:
GT, GQ, DP, AD, MIN_DP, PGT, PID, PL, SB
AC0, InbreedingCoeff, MonoAllelic, PASS, RF
AC, AN, AF, nhomalt, popmax, faf95, faf99
AC, AN, AF, nhomalt, popmax, faf95, faf99
rf_tp_probability, rf_positive_label, rf_negative_label, rf_label, rf_train
lcr, segdup, nonpar, fail_interval_qc, in_capture_region
allele_type, has_star, n_alt_alleles, original_alleles, variant_type, was_mixed
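A minimal sketch of how an exported shard header might be checked against the schema above. It assumes the first row of the list holds FORMAT fields and the second holds FILTER IDs, and that the header declares them with standard `##FORMAT=<ID=...` / `##FILTER=<ID=...` lines. `declared_ids` and `missing_fields` are hypothetical helper names, not part of this PR.

```python
import re

# Assumption: the first two rows of the schema are FORMAT and FILTER IDs.
# PASS is left out of FILTER because it is often not declared explicitly.
DESIRED = {
    "FORMAT": ["GT", "GQ", "DP", "AD", "MIN_DP", "PGT", "PID", "PL", "SB"],
    "FILTER": ["AC0", "InbreedingCoeff", "MonoAllelic", "RF"],
}

# Matches header lines such as: ##FORMAT=<ID=GT,Number=1,...>
HEADER_ID = re.compile(r"^##(FORMAT|FILTER|INFO)=<ID=([^,>]+)")


def declared_ids(header_lines):
    """Collect the IDs declared in the header, grouped by kind."""
    declared = {"FORMAT": set(), "FILTER": set(), "INFO": set()}
    for line in header_lines:
        match = HEADER_ID.match(line)
        if match:
            declared[match.group(1)].add(match.group(2))
    return declared


def missing_fields(header_lines, desired=DESIRED):
    """Return the desired IDs that the header does not declare, per kind."""
    declared = declared_ids(header_lines)
    return {
        kind: [field for field in ids if field not in declared[kind]]
        for kind, ids in desired.items()
    }
```

Calling `missing_fields` on the header lines of a shard returns a dict keyed by `FORMAT` and `FILTER`; empty lists mean every listed field is declared.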
I just added a batch script to repackage the VCF shard headers (required for ROR). Note that the script isn't fully tested: I tried testing it on one of the old VCF shards but forgot that this wouldn't work (bcftools complains about the duplicated sample ID).
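One way to tell in advance whether a shard will hit the duplicated sample ID problem is to scan its `#CHROM` header line before handing it to bcftools. This is a rough sketch, not the batch script from this PR; `duplicated_sample_ids` is a hypothetical helper, and it assumes the header has already been read into a list of text lines.

```python
from collections import Counter


def duplicated_sample_ids(header_lines):
    """Return sample IDs that appear more than once on the #CHROM line."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM..FORMAT); the rest are samples.
            samples = columns[9:]
            counts = Counter(samples)
            return [sample for sample, n in counts.items() if n > 1]
    # No #CHROM line found, so there are no sample columns to check.
    return []
```

An empty list means the shard has no repeated sample columns and should be usable as a test input.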
PR with code to re-export 300K VCFs and prepare them for return to the UKBB.
New scripts:
ukbb_header_reformat.sh
get_shard_positions.py
VCF script updates:
region_flag
annotation