TheJacksonLaboratory / ExperimentalModelSchema

Experimental Model Schema
https://thejacksonlaboratory.github.io/ExperimentalModelSchema/
MIT License

Insufficient representation of treatments in MPD #16

Open hansenp opened 1 year ago

hansenp commented 1 year ago

Treatments or interventions can/should affect certain characteristics of individuals and thus also the corresponding measurement results. For instance, a mouse treated with a low-fat diet for eight weeks would be expected to have a lower body weight than a mouse of the same age treated with a high-fat diet for eight weeks. Therefore, for a correct interpretation of measurements, it is crucial to know whether there was a treatment, and if so, what treatment and when or over what period of time it was applied.

In the MPD schema, treatments are represented as part of the measurement information, by only a single column intervention containing free text. There is no column for the time point or time period of treatments. Instead, such information is often implicitly contained in the columns for variable names or descriptions.

For example, for the Attie2 project, the body weight of each mouse was determined 22 times within the 5th to 26th week of life (ageweeks). The Treatment column (called intervention in the database) contains high-fat high sucrose diet. The week numbers in the variable names (bw_1wk, bw_2wk, ...) refer to the start of the diet, which was in the 4th week of life, i.e. bw_1wk corresponds to the measurement that was performed in the 5th week of life. To find out that the diet started at four weeks of age, I had to check the associated publication.

image

Detailed presentation of the measurements for the 8th week of the diet:

image
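To make the implicit convention in the Attie2 variable names explicit, here is a minimal Python sketch that converts a name like `bw_1wk` into the week of life. It assumes the diet start (4th week of life) taken from the publication and the `bw_<n>wk` naming pattern described above. The function name `week_of_life` and the constant are hypothetical and are not part of MPD or EMS.

```python
import re

# Assumption from the Attie2 description: the diet started in the 4th week of
# life, so diet week 1 (bw_1wk) corresponds to the 5th week of life.
DIET_START_WEEK_OF_LIFE = 4

# Matches variable names of the form bw_<n>wk, e.g. bw_1wk, bw_22wk.
_BW_PATTERN = re.compile(r"^bw_(\d+)wk$")


def week_of_life(variable_name: str, diet_start: int = DIET_START_WEEK_OF_LIFE) -> int:
    """Return the week of life encoded implicitly in a 'bw_<n>wk' variable name."""
    match = _BW_PATTERN.match(variable_name)
    if match is None:
        raise ValueError(f"unrecognized variable name: {variable_name!r}")
    diet_week = int(match.group(1))
    return diet_start + diet_week


if __name__ == "__main__":
    for name in ("bw_1wk", "bw_8wk", "bw_22wk"):
        print(name, "->", week_of_life(name))
```

With these assumptions, `bw_1wk` maps to week 5 and `bw_22wk` to week 26, which matches the 22 measurements between the 5th and 26th week of life. The point is that the offset has to be supplied from outside the database, since it is not stored in any column.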

I also find measurements for which no treatment is given, although in fact there was a treatment, which can be seen from the variable names and descriptions and verified from the associated publication. For instance, for the Auwerx2 project, the mice were treated with a control diet (CD) or a high-fat diet (HFD). No treatment is specified. Instead, the variable names and descriptions contain the abbreviations CD and HFD. To find out the period of time during which the mice were kept on the diet (8th to 21st week of life), I had to consult the protocol.

image

My conclusion: The MPD schema does not allow treatments to be sufficiently documented in a structured form. As a result, the data is currently unstructured. In order to prepare data from MPD in such a way that datasets from different projects and perhaps even databases can be flexibly combined for meta-analyses (e.g. as EMS packages for individual animals), the data must first be structured. This can only be achieved by revising the MPD database schema and subsequent manual, possibly software-assisted, curation.
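As a rough illustration of what "structured" could mean here, the following Python sketch represents a treatment with an explicit name and time period instead of free text. All class and field names (`Treatment`, `start_week`, `end_week`, `covers`) are hypothetical and are not taken from the MPD or EMS schema. Treating the week bounds as inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Treatment:
    """A treatment with an explicit period, in weeks of life (hypothetical model)."""

    name: str
    start_week: Optional[int] = None  # None means the start is not documented
    end_week: Optional[int] = None    # None means the end is not documented

    def covers(self, week_of_life: int) -> bool:
        """Return True if the given week of life falls within the documented period.

        Bounds are treated as inclusive; an undocumented bound is treated as open.
        """
        if self.start_week is not None and week_of_life < self.start_week:
            return False
        if self.end_week is not None and week_of_life > self.end_week:
            return False
        return True


# Auwerx2: diet period taken from the protocol (8th to 21st week of life).
auwerx2_hfd = Treatment(name="high-fat diet (HFD)", start_week=8, end_week=21)

# Attie2: diet start taken from the publication; the end is left open here
# because it is not stated in this issue.
attie2_diet = Treatment(name="high-fat high sucrose diet", start_week=4)

if __name__ == "__main__":
    print(auwerx2_hfd.covers(12))  # within the documented diet period
    print(auwerx2_hfd.covers(22))  # after the documented diet period
```

With a representation along these lines, a measurement taken at a known age could be checked against the treatment period directly, without consulting variable names or publications.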

sbello commented 2 months ago

It could be worth looking at the ExperimentalCondition LinkML model developed by the Alliance to model treatments. The model is part of the schema file for phenotype and disease annotations but should be broadly applicable to other types of annotations. LinkML YAML file: https://github.com/alliance-genome/agr_curation_schema/blob/main/model/schema/phenotypeAndDiseaseAnnotation.yaml

The Alliance model has been developed using the requirements from multiple model organisms and has minimal required fields to allow for flexibility.

The LinkML modeling language can be used to export the models in a variety of formats, including JSON. See https://linkml.io/linkml/ for more information about LinkML.