dandi / dandi-schema

Schemata for DANDI archive project
Apache License 2.0

C elegans subject metadata #171

Open bendichter opened 1 year ago

bendichter commented 1 year ago

We are trying to help a group get C. elegans data into NWB and DANDI and they are coming up against some incompatibilities in the subject metadata.

From a conversation with @rly and @dysprague:

I was wondering if there were possible changes we could make to the restrictions imposed on age and sex. C. elegans growth stages are defined from L1-L4 followed by adulthood. Most of the worms we use are designated 'YA' for young adult. I believe for our purposes, this is a more useful designation than days or weeks old since the lifespan of the worm is relatively short. Furthermore, for sex, C. elegans have two sexes: male 'XO' and hermaphrodite 'XX', which are designated by their sex chromosomes. I see that there is an option for other, but since the hermaphrodite is the typical system we study, I think it would be useful to be able to just input the chromosomal designations if that's possible.

  1. Subject.sex is currently limited to "M", "F", "U", or "O". For C. elegans, would it be possible to also accept "XO" (male) and "XX" (hermaphrodite)? Would it be possible to accept such data in the NWB schema?

  2. Subject.age. For C. elegans, they are not sure if they can recover the age of the worm. It would be more informative to store the growth stage. We could extend Subject to a new ndtype CElegansSubject that has an additional field growth_stage, and make the necessary changes in NWB Inspector. Would it be possible to accept such data in the NWB schema?
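To make request (1) concrete, here is a minimal sketch of a species-aware check on `Subject.sex`. It is not the NWB Inspector or dandi-schema implementation: the function names, the species string, and the idea of keying the extra codes on species are all assumptions made for illustration. Only the code values themselves ("M", "F", "U", "O", "XO", "XX") come from this issue.

```python
# Hypothetical sketch; names and structure are illustrative assumptions,
# not the actual NWB Inspector or dandi-schema code.

BASE_SEX_CODES = frozenset({"M", "F", "U", "O"})  # currently accepted values
C_ELEGANS_SEX_CODES = frozenset({"XO", "XX"})     # proposed: male, hermaphrodite
C_ELEGANS_SPECIES = "Caenorhabditis elegans"      # assumed species label


def allowed_sex_codes(species=None):
    """Return the sex codes accepted for the given species."""
    if species == C_ELEGANS_SPECIES:
        return BASE_SEX_CODES | C_ELEGANS_SEX_CODES
    return BASE_SEX_CODES


def is_valid_sex(sex, species=None):
    """Check a Subject.sex value against the codes allowed for the species."""
    return sex in allowed_sex_codes(species)
```

With this shape, "XX" would pass for a C. elegans subject but still be rejected for other species, so existing checks for non-worm datasets would be unaffected.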

We have started a Draft PR to make these changes for (1) in the NWB Inspector: https://github.com/NeurodataWithoutBorders/nwbinspector/pull/353

satra commented 1 year ago

for c elegans: see supplemental file 7 here: https://www.biorxiv.org/content/10.1101/2020.04.30.066209v3.supplementary-material

also i think it would be good to rethink the nwb core schemas around subject. if you are going to redo pieces of the core schema, let's do a better job of bringing many of the elements in dandischema and from the aind work into the core schema. @saskiad and i also looked at a bunch of related things. finally, more work is coming from cell lines and organoids.

we are also planning on modeling all of this in linkml relatively soon (a lot has already happened connected to this), which may offer a good route for many downstream tasks.

rly commented 1 year ago

From follow-up emails with @dysprague et al, they will be creating a CElegansSubject with the following fields:

This captures the stages described in supp file 7, except that the paper authors describe estimated hours since birth/hatch rather than time since the start of a growth stage. I am not a C elegans expert, so do not know which is more common or useful...

Growth stages that @dysprague et al identified for neurophysiology are:

rly commented 1 year ago

> we are also planning on modeling all of this in linkml relatively soon (a lot has already happened connected to this), which may offer a good route for many downstream tasks.

The NWB team is also looking at this. Let's discuss at the next sync.

satra commented 1 year ago

@rly - your proposal looks reasonable to me. the only change i would make is allowing for a range of durations in addition to duration (200 - 400 mins). we crafted a patch to ISO8601 to encode such a range in dandi, as is often needed in certain experiments where a more precise time is not available.
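As a rough illustration of the duration-range idea, the sketch below parses either a single ISO 8601 style duration or a `low/high` pair. The `/` separator and the function names are assumptions made for this example and are not confirmed here to match the encoding used in dandi; only hour and minute components are handled, to keep it short.

```python
import re
from datetime import timedelta

# Hypothetical sketch: the "low/high" range syntax is an assumption for
# illustration, not necessarily the encoding used by dandi.
_DURATION = re.compile(
    r"^PT(?:(?P<hours>\d+(?:\.\d+)?)H)?(?:(?P<minutes>\d+(?:\.\d+)?)M)?$"
)


def parse_duration(text):
    """Parse a duration such as 'PT200M' or 'PT3H20M' into a timedelta."""
    match = _DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError(f"unsupported duration: {text!r}")
    hours = float(match.group("hours") or 0)
    minutes = float(match.group("minutes") or 0)
    return timedelta(hours=hours, minutes=minutes)


def parse_duration_or_range(text):
    """Return (low, high); a single duration yields low == high."""
    parts = text.split("/")
    if len(parts) == 1:
        value = parse_duration(parts[0])
        return (value, value)
    if len(parts) == 2:
        low, high = parse_duration(parts[0]), parse_duration(parts[1])
        if low > high:
            raise ValueError(f"range bounds out of order: {text!r}")
        return (low, high)
    raise ValueError(f"expected one duration or 'low/high': {text!r}")
```

Under these assumptions, the 200 - 400 mins example would be written as `PT200M/PT400M`, and a precise value would simply collapse to a range with equal bounds.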

one consideration is whether there should be a specific class for a specific species, or a more general concept. finally, the concept of age here is related to a concept of environment (which presently seems to be temperature, but one may easily consider other parameters). thus the model could separate those considerations. it also seems that the stage identifier does specify temperature.
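One way to picture the separation suggested above is a species-agnostic stage record with the environment held as its own object. This is only a sketch: the class names, field names, and the 20.0 degree value are hypothetical and are not taken from dandischema, NWB, or the proposed CElegansSubject.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch; all names and values here are illustrative assumptions.


@dataclass(frozen=True)
class Environment:
    """Rearing conditions; temperature is the only parameter modeled here."""
    temperature_celsius: Optional[float] = None


@dataclass(frozen=True)
class DevelopmentalStage:
    """A general stage record, not tied to a single species."""
    species: str
    stage: str
    environment: Optional[Environment] = None


# Example: a young adult worm with an illustrative rearing temperature.
young_adult = DevelopmentalStage(
    species="Caenorhabditis elegans",
    stage="YA",
    environment=Environment(temperature_celsius=20.0),
)
```

Keeping the environment separate means other parameters could be added later without changing how the stage itself is identified.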