cbeauhilton opened 3 years ago
- `organism`: defaults to humans, but it would be wise to make this more general (and allow comparative biology studies?).
- `clinical_setting`: defaults to all, to capture the absolute known max and min. If there are other clinical settings that would be useful to filter on, capture these as well (e.g., hyperferritinemia in HLH seems to cap somewhere south of 100k, but in heme malignancy it can reach >150k; sex differences; age differences). We might need to define combinations of unique fields/composite primary keys to allow for these (maybe `organism` + `clinical_setting` + `sample_source` + `metric_name`, where `clinical_setting` is itself a composite key of `sex` + `age` + `clinical_scenario`, perhaps with options for geography/race/ethnicity, as these might be important in something like benign ethnic neutropenia?).
- `sample_source`: defaults to blood, but could be serum, CSF, something like "body" for measurements such as age/weight/height, a particular imaging study source (CT with certain cuts, TTE measurements, ...), etc.
- `metric_name`: pick a generalized name.
- `units`: for each metric, we would have to pick a canonical unit; if people contribute metrics in other units, these would have to be converted (initially, probably just make contributors do their own conversions before contributing, though ‘Pint’ is great).
- `Optional[float]` for the min and max values, as there may be only a min OR a max reported, and some bounds don't make sense (`Hgb 0 == dead`).
- `*_ref`: preferably from peer-reviewed literature. This whole project may be an interesting way to mobilize case studies. We might also have "in-house from xyz_institution" as an option for authorized committers (e.g., folks at VUMC included in the project with the ability to pull data from the Synthetic/Research Derivatives, and folks from other places with similar institutional access). For the normal ranges, we will start with VUMC in-house reference ranges, but ranges from the literature would be good as well. Again, on the megalomaniacal end: if we could include multiple reference ranges from different institutions, people could filter to their own locale.

Relational databases (`SQLite`/`PostgreSQL`) are probably the right answer, but a NoSQL approach may make it easier to adjust on the fly without a bunch of migrations. Making these migrations easy is probably the "real" answer. `SQLModel` is also very, very nice, and it would be a shame to lose it if we chose a NoSQL approach. We could also do what I'm doing for the ash-abstracts project and build the db from JSON, with ‘alter=true’, which kind of accomplishes both goals.
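As a rough illustration of how the fields above could map onto one relational table, here is a minimal sketch using only Python's standard-library `sqlite3` (rather than SQLModel, so it runs without extra dependencies). It assumes `clinical_setting` is split into `sex`, `age_group`, and `clinical_scenario` columns that join `organism`, `sample_source`, and `metric_name` in a composite primary key, stores the min/max as nullable `REAL` columns, and converts values to a canonical unit before insert. The table name `extreme_values`, the columns `age_group`/`min_value`/`max_value`/`min_ref`/`max_ref`, the `ExtremeValue` class, and the `CANONICAL_UNITS` table are all hypothetical; Pint would be the more complete option for unit handling.

```python
import sqlite3
from dataclasses import asdict, dataclass, replace
from typing import Optional

# Hypothetical canonical unit per metric, plus multipliers that convert
# other accepted units into it (g/L -> g/dL divides by 10; ug/L == ng/mL).
CANONICAL_UNITS = {
    "hemoglobin": ("g/dL", {"g/dL": 1.0, "g/L": 0.1}),
    "ferritin": ("ng/mL", {"ng/mL": 1.0, "ug/L": 1.0}),
}

# Composite primary key: organism + clinical setting (sex, age_group,
# clinical_scenario) + sample_source + metric_name.
# 'all' is used as a sentinel instead of NULL because NULLs are treated as
# distinct in uniqueness checks, which would defeat the composite key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS extreme_values (
    organism          TEXT NOT NULL DEFAULT 'human',
    sex               TEXT NOT NULL DEFAULT 'all',
    age_group         TEXT NOT NULL DEFAULT 'all',
    clinical_scenario TEXT NOT NULL DEFAULT 'all',
    sample_source     TEXT NOT NULL DEFAULT 'blood',
    metric_name       TEXT NOT NULL,
    units             TEXT NOT NULL,
    min_value         REAL,  -- NULL when only a max is reported
    max_value         REAL,  -- NULL when only a min is reported
    min_ref           TEXT,
    max_ref           TEXT,
    PRIMARY KEY (organism, sex, age_group, clinical_scenario,
                 sample_source, metric_name)
)
"""


@dataclass
class ExtremeValue:
    metric_name: str
    units: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    organism: str = "human"
    sex: str = "all"
    age_group: str = "all"
    clinical_scenario: str = "all"
    sample_source: str = "blood"
    min_ref: Optional[str] = None
    max_ref: Optional[str] = None


def to_canonical(ev: ExtremeValue) -> ExtremeValue:
    """Return a copy with min/max converted to the metric's canonical unit.

    Raises KeyError for an unknown metric or unit, so contributors must
    convert (or register the unit) before submitting.
    """
    canonical, factors = CANONICAL_UNITS[ev.metric_name]
    factor = factors[ev.units]

    def scale(v: Optional[float]) -> Optional[float]:
        return None if v is None else v * factor

    return replace(ev, units=canonical,
                   min_value=scale(ev.min_value), max_value=scale(ev.max_value))


def insert(conn: sqlite3.Connection, ev: ExtremeValue) -> None:
    """Insert one record; a duplicate composite key raises sqlite3.IntegrityError."""
    row = asdict(to_canonical(ev))
    cols = ", ".join(row)
    params = ", ".join(f":{c}" for c in row)
    conn.execute(f"INSERT INTO extreme_values ({cols}) VALUES ({params})", row)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Placeholder numbers for illustration only, not real reference data.
    insert(conn, ExtremeValue("hemoglobin", "g/L", min_value=15.0))
    insert(conn, ExtremeValue("ferritin", "ug/L", max_value=1000.0,
                              clinical_scenario="HLH"))
    for r in conn.execute("SELECT metric_name, clinical_scenario, units, "
                          "min_value, max_value FROM extreme_values"):
        print(r)
```

Because the setting columns default to `'all'`, a record contributed without a specific clinical scenario lands in the catch-all row for that metric, while an HLH-specific and a heme-malignancy-specific ferritin maximum each get their own row under the same key structure. Keeping everything in one long table keyed on `metric_name` is only one option; a separate table per metric could reuse the same columns.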
I'm fairly certain I'm going to miss a bunch of possibly essential fields for the core database. It may also make sense to have a separate table for each metric? (I think I like the metric-based approach, as opposed to a hierarchical approach based on, e.g., organism.)
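Since some fields will inevitably be missed, the idea of building the db from JSON with ‘alter=true’ can be sketched with the standard library alone. This assumes ‘alter=true’ means something like the `alter=True` option in the sqlite-utils library, which adds any missing columns at insert time; it is not the ash-abstracts code, and the helper names (`insert_json`, `columns`, `quote`, `to_sql_value`) and the `extremes` table are hypothetical.

```python
import json
import sqlite3
from typing import Any


def quote(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Column names of `table` (empty list if the table does not exist)."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({quote(table)})")]


def to_sql_value(value: Any) -> Any:
    # sqlite3 cannot bind dicts or lists, so store them as JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


def insert_json(conn: sqlite3.Connection, table: str, records: list[dict]) -> None:
    """Insert JSON-like records, creating the table or adding columns as needed.

    Columns are left untyped, so SQLite stores each value with its own type.
    """
    for record in records:
        if not record:
            continue  # nothing to insert
        existing = columns(conn, table)
        if not existing:
            defs = ", ".join(quote(k) for k in record)
            conn.execute(f"CREATE TABLE {quote(table)} ({defs})")
        else:
            for key in record:
                if key not in existing:
                    conn.execute(
                        f"ALTER TABLE {quote(table)} ADD COLUMN {quote(key)}")
        keys = list(record)
        conn.execute(
            f"INSERT INTO {quote(table)} ({', '.join(quote(k) for k in keys)}) "
            f"VALUES ({', '.join('?' for _ in keys)})",
            [to_sql_value(record[k]) for k in keys],
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder records; the second one introduces a new field.
    records = json.loads("""[
        {"metric_name": "ferritin", "max_value": 1000},
        {"metric_name": "ferritin", "max_value": 2000, "clinical_scenario": "HLH"}
    ]""")
    insert_json(conn, "extremes", records)
    print(columns(conn, "extremes"))
```

Running it prints `['metric_name', 'max_value', 'clinical_scenario']`; the first row simply holds `NULL` for the late-added column. The trade-off is that untyped, ever-growing columns push validation into application code, which is roughly the migrations-versus-flexibility tension described above.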
Model Example