This PR allows the missense badness models to be generated with variant data from any of the following sets of transcripts: all training transcripts, all test transcripts, or k-folds of the training transcript set. These transcript sets were defined in the notebook gs://regional_missense_constraint/notebooks/train_test_split_transcripts.ipynb
Major changes:
- Added two reference tables to `reference_data.py` for ease of working with transcripts: `transcript_ref` and `transcript_cds`
- Added a utility function to filter a VEP context table to coding sites in a given set of transcripts
- Added resource paths for the training, validation, and test sets of transcripts generated in gs://regional_missense_constraint/notebooks/train_test_split_transcripts.ipynb
- Added a constant for the number of folds in the training set
- Added the capability to create and write out the missense badness models and associated resources/temporary tables on the aforementioned sets of transcripts

Minor changes:
- Updated `CURRENT_FREEZE` to RMC freeze 5
- Updated a couple of resource paths to use existing variables for path prefixes
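
For reviewers unfamiliar with the k-fold option, the sketch below shows one way a training transcript set could be divided into folds, with one fold held out as validation. It is an illustration only and not the method used in the notebook: the names `N_FOLDS`, `assign_folds`, and `split_for_fold`, the value of `N_FOLDS`, and the shuffle-and-deal assignment are all assumptions made for this example.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical value; the PR adds a fold-count constant but does not state its value here.
N_FOLDS = 5


def assign_folds(
    transcripts: Iterable[str], n_folds: int = N_FOLDS, seed: int = 0
) -> Dict[int, List[str]]:
    """Deal transcript IDs into `n_folds` folds (numbered from 1) after a seeded shuffle."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    # Sort the de-duplicated IDs first so the result depends only on the seed.
    shuffled = sorted(set(transcripts))
    random.Random(seed).shuffle(shuffled)
    folds: Dict[int, List[str]] = {i: [] for i in range(1, n_folds + 1)}
    for idx, transcript in enumerate(shuffled):
        folds[idx % n_folds + 1].append(transcript)
    return folds


def split_for_fold(
    folds: Dict[int, List[str]], held_out: int
) -> Tuple[List[str], List[str]]:
    """Return (training, validation) transcript lists with fold `held_out` as validation."""
    training = [
        transcript
        for fold, members in folds.items()
        if fold != held_out
        for transcript in members
    ]
    return training, list(folds[held_out])


# Placeholder transcript IDs, used only to exercise the functions.
example_ids = [f"ENST{i:011d}" for i in range(10)]
example_folds = assign_folds(example_ids)
train_ids, val_ids = split_for_fold(example_folds, held_out=1)
```

Each transcript lands in exactly one fold, so a model built for a given fold never sees that fold's validation transcripts.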
Analogous changes to MPC-related code will be forthcoming in another PR.