dcppc / crosscut-metadata


use of Dimensions with Molecular entities - Proposal #20

Open proccaserra opened 5 years ago

proccaserra commented 5 years ago

@jonathancrabtree, @agbeltran following up on our call, I am logging this as an issue, although it is really a point for discussion: the use of DATS.Dimension with DATS.MolecularEntity in the context of the MGI dataset.

  1. reserve the use of DATS.Dimension for variables which can be measured about the molecular entity (e.g. 'molecular weight', 'pKa', 'concentration'), as per the definition of DATS.Dimension ("A feature of an entity, i.e. an individual measurable property (both quantitative or qualitative) of the entity being observed.")

  2. extend the DATS.MolecularEntity properties to include 'location' information, akin to DATS.Material.spatialCoverage.
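To make the two points concrete, here is a minimal sketch in Python of what a record might look like under this proposal. Everything in it is an assumption for illustration: the key names (`dimensions`, `values`, `unit`, `location`), the helper functions, and the placeholder values are not taken from the DATS JSON schema or from the MGI dataset.

```python
# Hypothetical DATS-like records built as plain dicts.
# Key names and values are illustrative assumptions, not the DATS schema.

def make_dimension(name, value, unit=None):
    """Point 1: a measurable property of the entity."""
    dim = {"@type": "Dimension", "name": {"value": name}, "values": [value]}
    if unit is not None:
        dim["unit"] = {"value": unit}
    return dim


def make_molecular_entity(name, dimensions=(), location=None):
    """Point 2: `location` is the proposed (hypothetical) extension."""
    entity = {
        "@type": "MolecularEntity",
        "name": name,
        "dimensions": list(dimensions),
    }
    if location is not None:
        # Kept separate from dimensions: a location is not a measurement.
        entity["location"] = location
    return entity


entity = make_molecular_entity(
    "example entity",  # placeholder name
    dimensions=[make_dimension("molecular weight", 100.0, "g/mol")],  # made-up value
    location={"description": "placeholder location"},
)
```

The only design point the sketch tries to show is the separation: measurable properties go under the dimensions, while positional information gets its own property.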

rationale:

This proposal stems from the discussion we had about consistent use of object properties (either DATS.Dimension or DATS.extraProperties).

Such a modification/extension to DATS.MolecularEntity would help clarify and refine the ER diagram you pushed the other day.

cmungall commented 5 years ago

Given that the Alliance data isn't typically about individual organisms or measurements, a measurement-based model doesn't really make sense (some of the MODs have data at this level of granularity, but this is not typical).

However, I can see the value of a generic data model where we have entities with arbitrary properties ("Dimensions"). This seems quite powerful for modeling arbitrary outputs of analysis programs, which are typically tabular or vector-oriented. But it may be limiting for a heavily normalized (in the Codd sense) knowledge resource like a model organism database. And I don't really understand where things like extraProperties come in. When is a property extra?

I am therefore leaning towards option 2, though I'm not quite sure how best to implement it. Only genomic entities will have chromosome base-pair range localizations, whereas actual gene products and molecular complexes have subcellular localizations. Still, moving in this direction seems like a good start.
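One possible way to capture that distinction, sketched in Python under stated assumptions: the class names (`GenomicLocation`, `SubcellularLocation`, `MolecularEntity`), their fields, and the `describe` helper are all hypothetical and are not part of DATS. The idea is simply that a single 'location' property could hold either kind of localization.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class GenomicLocation:
    """Chromosome base-pair range; applies to genomic entities only."""
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class SubcellularLocation:
    """Subcellular localization; applies to gene products and complexes."""
    term: str  # hypothetical: a cellular-component label or identifier


Location = Union[GenomicLocation, SubcellularLocation]


@dataclass
class MolecularEntity:
    name: str
    location: Optional[Location] = None


def describe(entity: MolecularEntity) -> str:
    """Render an entity's location according to which kind it carries."""
    loc = entity.location
    if isinstance(loc, GenomicLocation):
        return f"{entity.name}: chr{loc.chromosome}:{loc.start}-{loc.end}"
    if isinstance(loc, SubcellularLocation):
        return f"{entity.name}: {loc.term}"
    return f"{entity.name}: no location"
```

For example, `describe(MolecularEntity("geneX", GenomicLocation("1", 100, 200)))` gives `"geneX: chr1:100-200"`, where the entity name and coordinates are made up.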