Score range data model - GitHub Issues

bencap commented 1 month ago

Data model for score ranges:

What is the wt score?
What ranges exist on these scores and how do they map?

Validation:

Ranges should not overlap
Other validation should be in the form of warnings, and can be handled in the UI.

bencap commented 3 weeks ago

New column in score sets table: score_ranges. Column type: JSONB.

The column is null when a score set has no score ranges. Any dictionary stored in it must contain the keys normal and abnormal. Score set creators can add further keys as needed. Each key must map to a description of the score range in addition to the range itself.

{
  'normal': {
    'description': 'xyz',
    'range': (lower, upper)
  },
  'abnormal': {
    'description': 'abc',
    'range': (lower, upper)
  },
  'extra1': {
    'description': 'efg',
    'range': (lower, upper)
  },
}

Validation:

Ranges should not overlap
Ranges with only a lower or only an upper bound are allowed; they must still not overlap other ranges
A range's lower bound must not be higher than its upper bound
Score sets defining ranges must define both an abnormal and normal range.
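
The validation rules above could be sketched in Python roughly as follows. This is a minimal sketch, not the mavedb-api implementation. The function name validate_score_ranges, the use of None for a missing bound, and the decision to treat two ranges that only share an endpoint as non-overlapping are all assumptions made for illustration.

```python
import math
from typing import Optional


def validate_score_ranges(score_ranges: Optional[dict]) -> None:
    """Raise ValueError if score_ranges breaks any of the rules listed above."""
    if score_ranges is None:  # the column is nullable
        return

    # Score sets defining ranges must define both a normal and an abnormal range.
    missing = {"normal", "abnormal"} - score_ranges.keys()
    if missing:
        raise ValueError(f"missing required ranges: {sorted(missing)}")

    intervals = []
    for name, entry in score_ranges.items():
        if "description" not in entry or "range" not in entry:
            raise ValueError(f"{name}: needs both 'description' and 'range'")
        lower, upper = entry["range"]
        # Assumption: None marks a missing bound and means "unbounded".
        lo = -math.inf if lower is None else lower
        hi = math.inf if upper is None else upper
        if lo > hi:
            raise ValueError(f"{name}: lower bound {lo} exceeds upper bound {hi}")
        intervals.append((lo, hi, name))

    # Once sorted by lower bound, any overlap shows up between neighbours.
    intervals.sort(key=lambda item: (item[0], item[1]))
    for (_, prev_hi, prev_name), (next_lo, _, next_name) in zip(intervals, intervals[1:]):
        # Assumption: sharing a single endpoint does not count as overlap.
        if next_lo < prev_hi:
            raise ValueError(f"{prev_name} overlaps {next_name}")


# Hypothetical example data: two half-bounded ranges meeting at 1.0.
validate_score_ranges({
    "normal": {"description": "xyz", "range": (1.0, None)},
    "abnormal": {"description": "abc", "range": (None, 1.0)},
})
```

Whether touching endpoints should count as overlap is not settled in this thread, so the strict `<` comparison is the line to change if the answer turns out to be yes.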

Open questions:

Are WT scores a range or just a point? Either way, should be easy to fit into the model.

bencap commented 1 week ago

Some answers to the implementation questions explored so far:

If a user is providing score ranges, should a normal and an abnormal score range always be present?

Should a user be able to name a normal/abnormal range whatever they want, and just mark it as the normal/abnormal range? Would we like to enforce the naming for normal/abnormal score ranges?

Should users be able to provide more granular ranges, but still be able to mark them as normal/abnormal? For instance, suppose a user defines some ranges: granular_range_1: 1.0 - 2.0, granular_range_2: 2.0 - 3.0. Might we give them the option to say that the combination of granular_range_1 and granular_range_2 is considered the ‘normal’ range for this score set? Or should we only allow ‘extra’ granular ranges and make the normal and abnormal ranges each a single range of values?

Wild type ‘ranges’ are just a single score? Would we like to validate that this wt score is within the provided normal range?

It makes sense to require at least one normal and one abnormal range. If for some reason the user can't specify both normal and abnormal ranges, that will probably break all the downstream uses of the range data anyways so it's fine if they can't give us just one.

How about we let the users label the ranges whatever they want, but they have to include a range_classification or similar property that can be normal or abnormal. For example:
{
  'label': 'granular_range_1',
  'bounds': {
    'lower': 1.0,
    'upper': 2.0
  },
  'classification': 'normal'
}
The wild type should be a single score, and yes we should validate that this wild type score is within a provided normal range. There could in theory be exceptions to this, but until we hit one I think being strict (rather than warning) is the correct policy.
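
A minimal Python sketch of the proposal above, under stated assumptions. The field names label, lower, upper, and classification follow the example in this comment. The ScoreRange class, the validate_wild_type function, and the choice of inclusive bounds are hypothetical and not taken from mavedb-api.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Sequence


@dataclass(frozen=True)
class ScoreRange:
    label: str                # free-form name chosen by the user
    lower: Optional[float]    # None means no lower bound
    upper: Optional[float]    # None means no upper bound
    classification: Literal["normal", "abnormal"]

    def contains(self, score: float) -> bool:
        # Assumption: bounds are inclusive on both ends.
        above = self.lower is None or score >= self.lower
        below = self.upper is None or score <= self.upper
        return above and below


def validate_wild_type(wt_score: float, ranges: Sequence[ScoreRange]) -> None:
    """Strict check: the wild type score must sit inside some normal range."""
    classes = {r.classification for r in ranges}
    if not {"normal", "abnormal"} <= classes:
        raise ValueError("need at least one normal and one abnormal range")
    if not any(r.classification == "normal" and r.contains(wt_score) for r in ranges):
        raise ValueError(f"wild type score {wt_score} is outside every normal range")


# Hypothetical data: two granular ranges both marked normal, one abnormal.
ranges = [
    ScoreRange("granular_range_1", 1.0, 2.0, "normal"),
    ScoreRange("granular_range_2", 2.0, 3.0, "normal"),
    ScoreRange("loss_of_function", None, 1.0, "abnormal"),
]
validate_wild_type(2.5, ranges)  # passes: 2.5 falls in granular_range_2
```

Raising an error rather than returning a warning mirrors the strict policy suggested here. If exceptions to that policy do turn up, the second check is the one to relax.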

VariantEffect / mavedb-api

Score range data model #271