broadinstitute / regional_missense_constraint

Code to calculate regional missense constraint
BSD 3-Clause "New" or "Revised" License

Update globals of final RMC HT #280

Closed ch-kr closed 1 year ago

ch-kr commented 1 year ago

This PR adds minor changes to the schema of the final regional missense constraint (RMC) results Hail Table (HT).

The current schema is:

----------------------------------------
Global fields:
    'plateau_models': struct {
        'total': dict<bool, array<float64>>
    }
    'plateau_x_models': struct {
        'total': dict<bool, array<float64>>
    }
    'plateau_y_models': struct {
        'total': dict<bool, array<float64>>
    }
    'coverage_model': tuple<float64, float64>
----------------------------------------
Row fields:
    'section_obs': int64
    'section_exp': float64
    'section_oe': float64
    'section_chisq': float64
    'transcript': str
    'interval': interval<locus<grch37>>
----------------------------------------
Key: ['interval', 'transcript']
----------------------------------------

This PR removes the models from the globals: they do not need to be stored on the results HT and take up unnecessary space. It also adds to the final RMC HT the p-value threshold used in the RMC region search, which is therefore the threshold associated with the resulting regions.
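The globals change can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: it assumes the final RMC HT is a Hail `Table` and that the threshold is stored as a global (as the PR title suggests). The function name `update_rmc_globals` and the global field name `p_value` are hypothetical; `Table.drop` and `Table.annotate_globals` are existing Hail methods.

```python
# Global fields listed in the current schema that this PR removes.
MODEL_GLOBALS = (
    "plateau_models",
    "plateau_x_models",
    "plateau_y_models",
    "coverage_model",
)


def update_rmc_globals(ht, p_value):
    """Drop the model globals and record the region-search p-value threshold.

    `ht` is expected to be a Hail Table. The global field name `p_value`
    is an assumption made for illustration, not a name taken from the PR.
    """
    # Table.drop accepts the names of top-level (row or global) fields.
    ht = ht.drop(*MODEL_GLOBALS)
    # Store the threshold used for the RMC region search as a global.
    return ht.annotate_globals(p_value=p_value)
```

The function would be applied to the results table just before it is written out, so that the stored HT carries only the threshold in its globals.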