This PR restructures the resources used to prepare for breakpoint search. In particular, it discards the workaround method for calculating expected values and adds the flexibility to attach constraint-related annotations to synonymous and nonsense variants.
Major changes:
- Change the expected value calculation from the workaround method (using `mu_snp` and `total_exp`) to the actual formula as defined by Konrad in the LoF constraint analyses, and remove the code this renders obsolete
- Restructure `filtered_context` to contain missense, nonsense, and synonymous variants and all constraint annotations, including `observed` and `expected`
  - This necessitates changes to `process_context_ht` and `process_vep`
- Command-line arguments can now be used to determine whether `constraint_prep` contains missense, nonsense, and/or synonymous variants
- Perform break search on all transcripts, meaning constraint outliers are no longer filtered out of `filtered_context` or `constraint_prep`
- Add filtering downstream of RMC to remove constraint outliers from steps like MPC generation
- Move Table prep code out of the `regional_constraint.py` pipeline script and into utils
- Rename and restructure break search-associated utils and annotations to be more intuitive
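For reviewers less familiar with the LoF constraint formula, here is a minimal sketch of the general shape of the expected value calculation: a plateau model maps the mutation rate to a predicted proportion of possible variants observed, and a coverage model down-weights low-coverage sites. The linear model forms, the coverage cutoff of 40, and all function and parameter names are assumptions for illustration; this is not the code in this PR.

```python
import math
from typing import Tuple

# Assumption: sites at or above this coverage need no correction.
HIGH_COVERAGE_CUTOFF = 40


def coverage_correction(coverage: float, coverage_model: Tuple[float, float]) -> float:
    """Multiplicative correction for a site's coverage (hypothetical helper).

    Assumes the coverage model is linear in log10(coverage).
    """
    if coverage <= 0:
        return 0.0
    if coverage >= HIGH_COVERAGE_CUTOFF:
        return 1.0
    slope, intercept = coverage_model
    return slope * math.log10(coverage) + intercept


def expected_variants(
    mu_snp: float,
    possible_variants: int,
    coverage: float,
    plateau_model: Tuple[float, float],
    coverage_model: Tuple[float, float],
) -> float:
    """Expected = possible variants x predicted proportion observed x coverage correction.

    Assumes the plateau model is linear in mu_snp: slope * mu_snp + intercept.
    """
    slope, intercept = plateau_model
    proportion_observed = slope * mu_snp + intercept
    return possible_variants * proportion_observed * coverage_correction(coverage, coverage_model)
```

In practice the per-site (or per-context group) values would then be summed per transcript to get a transcript-level expected count.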
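As a sketch of how command-line flags can control which variant types end up in `constraint_prep`, the example below maps boolean flags to VEP consequence terms and filters a list of variants. The flag names and helper functions are hypothetical; only the consequence terms (`missense_variant`, `stop_gained`, `synonymous_variant`) are standard VEP/Sequence Ontology labels.

```python
import argparse
from typing import Dict, List, Set

# Standard VEP (Sequence Ontology) consequence terms.
MISSENSE = "missense_variant"
NONSENSE = "stop_gained"
SYNONYMOUS = "synonymous_variant"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names; the PR's actual arguments may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--add-missense", action="store_true")
    parser.add_argument("--add-nonsense", action="store_true")
    parser.add_argument("--add-synonymous", action="store_true")
    return parser


def selected_consequences(args: argparse.Namespace) -> Set[str]:
    """Collect the consequence terms requested on the command line."""
    keep: Set[str] = set()
    if args.add_missense:
        keep.add(MISSENSE)
    if args.add_nonsense:
        keep.add(NONSENSE)
    if args.add_synonymous:
        keep.add(SYNONYMOUS)
    return keep


def filter_variants(variants: List[Dict[str, str]], keep: Set[str]) -> List[Dict[str, str]]:
    """Keep only variants whose consequence was requested."""
    return [v for v in variants if v["consequence"] in keep]
```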
Minor changes:
- Update RMC freeze version to `7`
- Revise the parser structure in `regional_constraint.py` to use subparsers for the different steps
- Update flagship LoF bucket path
- Remove unused resources and constants
- Add new constants for nonsense and synonymous VEP labels
- Save the coverage and plateau models used to calculate expected values to HailExpression
- Remove the `exome_coverage` field and retain the `coverage` field for constraint annotations
- Remove `gene` from the groupings used to calculate expected values (transcript should be sufficient)
- Remove unused lines in `add_obs_annotation`
- Move `section` annotation creation into the constraint prep HT
- Rename the function (and its parameters) that calculates the chi-square statistic for a section's OE differing from 1, to be more intuitive
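A minimal sketch of a subparser layout like the one described above, using the standard-library `argparse` module. The step names (`prep-constraint`, `search-for-breaks`) and their flags are hypothetical placeholders, not the actual arguments in `regional_constraint.py`.

```python
import argparse
from typing import List, Optional


def build_pipeline_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Regional constraint pipeline (sketch)")
    # Global options live on the top-level parser and go before the step name.
    parser.add_argument("--overwrite", action="store_true", help="Overwrite existing output")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Hypothetical step: prepare the constraint Table.
    prep = subparsers.add_parser("prep-constraint", help="Prepare the constraint Table")
    prep.add_argument("--add-synonymous", action="store_true")

    # Hypothetical step: run breakpoint search.
    search = subparsers.add_parser("search-for-breaks", help="Run breakpoint search")
    search.add_argument("--chisq-threshold", type=float, required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_pipeline_parser().parse_args(argv)
    # Each handler stands in for a call into the corresponding utils function.
    handlers = {
        "prep-constraint": lambda a: f"prep (synonymous={a.add_synonymous})",
        "search-for-breaks": lambda a: f"search (threshold={a.chisq_threshold})",
    }
    return handlers[args.command](args)
```

With subparsers, each step accepts only the arguments relevant to it, which keeps `--help` output per step short and avoids silently ignored flags.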
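For context on the renamed chi-square function, the sketch below shows one standard way to test whether a section's observed/expected ratio differs from 1: a one-cell Pearson statistic, (obs - exp)^2 / exp, compared against a chi-square distribution with 1 degree of freedom. The function names are hypothetical, and the exact form used in the PR's utils may differ.

```python
import math


def chisq_oe_neq_1(obs: float, exp: float) -> float:
    """Pearson chi-square statistic (1 df) for obs/exp differing from 1.

    Assumption: this one-cell form matches the statistic used in the utils.
    """
    if exp <= 0:
        raise ValueError("expected count must be positive")
    return (obs - exp) ** 2 / exp


def chisq_pvalue_1df(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    Uses P(X > x) = erfc(sqrt(x / 2)) for the 1-df chi-square distribution.
    """
    return math.erfc(math.sqrt(stat / 2))
```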