VariantEffect / mavedb-api

MaveDB API
GNU Affero General Public License v3.0

Score range data model #271

Open bencap opened 1 month ago

bencap commented 1 month ago

Data model for score ranges:

Validation:

bencap commented 3 weeks ago

New column in score sets table: score_ranges. Column type: JSONB.

Column is nullable if score ranges do not exist. Any dictionary must contain keys normal and abnormal. Additional keys can be included as necessary by score set creators. These keys must contain a description of the score range in addition to the range.

{
  'normal': {
    'description': 'xyz',
    'range': (lower, upper)
  },
  'abnormal': {
    'description': 'abc',
    'range': (lower, upper)
  },
  'extra1': {
    'description': 'efg',
    'range': (lower, upper)
  },
}
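
As a rough sketch of how the rules above could be checked before writing to the score_ranges column: the function name validate_score_ranges is hypothetical, and the requirement that lower be strictly less than upper is an assumption that has not been decided in this thread.

```python
from typing import Any, Optional

# Keys that every score_ranges dictionary must contain.
REQUIRED_KEYS = ("normal", "abnormal")


def validate_score_ranges(score_ranges: Optional[dict[str, Any]]) -> None:
    """Raise ValueError if score_ranges breaks the rules described above."""
    # The column is nullable: None means no score ranges were provided.
    if score_ranges is None:
        return

    missing = [key for key in REQUIRED_KEYS if key not in score_ranges]
    if missing:
        raise ValueError(f"missing required score ranges: {missing}")

    # Required and extra keys alike need a description and a range.
    for name, entry in score_ranges.items():
        if not isinstance(entry, dict):
            raise ValueError(f"score range '{name}' must be a dictionary")
        if not entry.get("description"):
            raise ValueError(f"score range '{name}' needs a description")

        bounds = entry.get("range")
        # JSONB stores tuples as arrays, so accept lists as well.
        if not isinstance(bounds, (list, tuple)) or len(bounds) != 2:
            raise ValueError(f"score range '{name}' needs a (lower, upper) range")
        if not all(isinstance(bound, (int, float)) for bound in bounds):
            raise ValueError(f"score range '{name}' has non-numeric bounds")

        lower, upper = bounds
        # Assumption: lower must be strictly below upper.
        if lower >= upper:
            raise ValueError(f"score range '{name}' has lower >= upper")
```

Calling validate_score_ranges(None) passes silently, which matches the nullable column.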

Validation:

Open questions:

bencap commented 1 week ago

Some answers to the implementation questions explored so far:

If a user is providing score ranges, should a normal and an abnormal score range always be present?

Should a user be able to name a normal/abnormal range whatever they want, and just mark it as the normal/abnormal range? Or would we like to enforce the naming for normal/abnormal score ranges?

Should users be able to provide more granular ranges, but still be able to mark them as normal/abnormal? For instance, let’s suppose a user defines some ranges: granular_range_1: 1.0 - 2.0, granular_range_2: 2.0 - 3.0. Might we give them the option to say: the combination of granular_range_1 and granular_range_2 is considered the ‘normal’ range for this score set? Or should we only allow ‘extra’ granular ranges and make the normal and abnormal ranges each just a single range of values?

Wild type ‘ranges’ are just a single score? Would we like to validate that this wt score is within the provided normal range?

It makes sense to require at least one normal and one abnormal range. If for some reason the user can't specify both normal and abnormal ranges, that will probably break all the downstream uses of the range data anyway, so it's fine if we don't let them give us just one.

How about we let users label the ranges whatever they want, but require them to include a range_classification or similar property that can be normal or abnormal? For example:

{
  'label': 'granular_range_1',
  'bounds': {
    'lower': 1.0,
    'upper': 2.0
  },
  'classification': 'normal'
}
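
A minimal sketch of what parsing and checking that shape might look like, assuming the property ends up being called classification as in the example. ScoreRange, parse_score_range, and check_classifications are hypothetical names, not existing mavedb-api code.

```python
from dataclasses import dataclass

ALLOWED_CLASSIFICATIONS = {"normal", "abnormal"}


@dataclass(frozen=True)
class ScoreRange:
    label: str  # free-form, chosen by the score set creator
    lower: float
    upper: float
    classification: str  # must be "normal" or "abnormal"


def parse_score_range(raw: dict) -> ScoreRange:
    """Build a ScoreRange from the dictionary shape proposed above."""
    classification = raw["classification"]
    if classification not in ALLOWED_CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification!r}")
    bounds = raw["bounds"]
    return ScoreRange(
        label=raw["label"],
        lower=float(bounds["lower"]),
        upper=float(bounds["upper"]),
        classification=classification,
    )


def check_classifications(ranges: list[ScoreRange]) -> None:
    """Require at least one normal and at least one abnormal range."""
    present = {score_range.classification for score_range in ranges}
    missing = ALLOWED_CLASSIFICATIONS - present
    if missing:
        raise ValueError(f"no range classified as: {sorted(missing)}")
```

With this shape, several granular ranges can share the normal classification, so their combination forms the normal range without reserving the label itself.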

The wild type should be a single score, and yes, we should validate that this wild type score falls within a provided normal range. There could in theory be exceptions to this, but until we hit one I think being strict (rather than warning) is the correct policy.
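
The strict policy could be sketched roughly as below, assuming ranges use the label/bounds/classification shape discussed in this thread. The function name validate_wild_type_score is hypothetical, and treating both bounds as inclusive is an assumption that is still open.

```python
def validate_wild_type_score(wt_score: float, ranges: list[dict]) -> None:
    """Raise ValueError unless wt_score lies in at least one normal range."""
    in_normal_range = any(
        score_range["classification"] == "normal"
        # Assumption: both bounds are inclusive.
        and score_range["bounds"]["lower"] <= wt_score <= score_range["bounds"]["upper"]
        for score_range in ranges
    )
    if not in_normal_range:
        # Strict policy: reject the submission rather than emit a warning.
        raise ValueError(
            f"wild type score {wt_score} is not within any normal score range"
        )
```

If we ever hit a legitimate exception, the raise could be downgraded to a warning without changing the check itself.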