biolink / biolinkml

DEPRECATED: replaced by linkml
https://github.com/linkml/linkml
Creative Commons Zero v1.0 Universal

Design patterns for handling measurement properties #159

Closed cmungall closed 3 years ago

cmungall commented 4 years ago

I see two general patterns in templates/schemas/checklists for dealing with quantitative properties (slots) such as depth

  1. A general property such as depth, allowing specification of both a value and a unit
  2. Bake in the unit into the property, e.g. depth_in_meters, and allow only a number as range

For simplicity I am omitting reified models etc

An anti-pattern is a field name such as depth where the unit is implicit; this has obvious issues

For 1, there are a number of variants:

Note these are not mutually exclusive. E.g. in the NMDC schema, properties have objects as values; these have both a raw-value property and normalized properties (see QuantityValue)

Note also that 1 affords extra flexibility: e.g. if we want to add precision or provenance, it is trivial to extend

I think the modeling patterns for 1 are relatively clear. For 2, how should we indicate the unit?

Of course, saying depth_in_meters is obvious to a human, but how do we formalize this?

One way would be to make quantities first class members in the metamodel and encode this directly with a unit metaproperty, e.g.

slots:
  depth_in_meters:
     is_a: depth   ## optional
     unit: m

another option is to make the metamodel more neutral, and define types for use in measurement domains, e.g.

types:
  meter_value:
    description: "used for "
    uri: xsd:double
    base: double
    mappings:
      - UO:123456 ## meter

...

slots:
  depth_in_meters:
    range: meter_value

this makes things a little more explicit but we don't really have a formal binding
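
To make the missing binding concrete, here is a minimal sketch, assuming the schema is loaded as plain dicts, in which a slot's unit is resolved by following its range to the type's mappings. The helper name unit_for_slot and the extra name slot are hypothetical, and UO:123456 is the placeholder from the fragment above, not a real term ID.

```python
from typing import Optional

# Plain-dict mirror of the YAML fragment above (not the real metamodel classes).
schema = {
    "types": {
        "meter_value": {"uri": "xsd:double", "base": "double", "mappings": ["UO:123456"]},
    },
    "slots": {
        "depth_in_meters": {"range": "meter_value"},
        "name": {"range": "string"},  # hypothetical unitless slot, for contrast
    },
}


def unit_for_slot(schema: dict, slot_name: str) -> Optional[str]:
    """Hypothetical helper: return the first UO mapping on the slot's range type, if any."""
    range_name = schema["slots"][slot_name].get("range")
    type_def = schema["types"].get(range_name, {})
    for mapping in type_def.get("mappings", []):
        if mapping.startswith("UO:"):
            return mapping
    return None
```

Treating a UO mapping on a type as "the unit" is itself a convention in this sketch; nothing in the metamodel enforces it, which is exactly the gap noted above.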

in either case, the underlying semantics at the rdf/json-ld level should be:

?i depth_in_U ?v
<->
?i depth [
   unit: U
   value: ?v
]
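
The equivalence above can be sketched in Python on plain dict records. This is an illustration only: the `_in_` naming convention, the UNIT_NAMES table, and both helper names are assumptions, not part of biolinkml.

```python
from typing import Any, Dict

# Hypothetical lookup from the suffix used in slot names to a unit symbol.
UNIT_NAMES = {"meters": "m", "centimeters": "cm"}


def expand(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite `<prop>_in_<unit>: v` as `<prop>: {unit, value}`."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        prop, sep, unit_name = key.rpartition("_in_")
        if sep and unit_name in UNIT_NAMES:
            out[prop] = {"unit": UNIT_NAMES[unit_name], "value": value}
        else:
            out[key] = value  # not a unit-bearing slot; copy unchanged
    return out


def collapse(record: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of expand: rewrite `<prop>: {unit, value}` as `<prop>_in_<unit>: v`."""
    symbols = {symbol: name for name, symbol in UNIT_NAMES.items()}
    out: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict) and value.get("unit") in symbols:
            out[f"{key}_in_{symbols[value['unit']]}"] = value["value"]
        else:
            out[key] = value
    return out
```

For example, expand({"depth_in_meters": 2.5}) gives {"depth": {"unit": "m", "value": 2.5}}, and collapse reverses it.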

hsolbrig commented 4 years ago

The proposed solution is to add an attribute to the ClassDefinition element called "string_template", whose formatting rules conform to https://docs.python.org/3.8/library/string.html#formatspec. Classes with defined string_templates will carry the template as a class variable and will include:

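from typing import ClassVar
import parse  # third-party package: the inverse of str.format
from biolinkml.utils.yamlutils import YAMLRoot

class C1(YAMLRoot):
    ...
    string_template: ClassVar[str] = "{value}:{units}"
    ...
    def __str__(self):
        # Render unset (None) fields as empty strings
        return self.string_template.format(**{k: '' if v is None else v for k, v in self.__dict__.items()})

    @classmethod
    def parse(cls, text: str) -> "C1":
        # Note: parse.parse returns None when text does not match the template
        v = parse.parse(cls.string_template, text)
        return cls(*v.fixed, **v.named)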

I haven't fully explored this, but it appears to supply some pretty reasonable power. One question would be the parse name -- is there a more appropriate term we could use?
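
For anyone who wants to try the round trip without the third-party parse package, it can be approximated with the standard library. This is a minimal sketch, assuming templates contain only simple named fields with no format specs. The Quantity class, the template_to_regex helper, and the from_text name are hypothetical stand-ins, and the regex translation is a substitute for parse.parse, not what the generator emits.

```python
import re
from dataclasses import dataclass
from string import Formatter
from typing import ClassVar, Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a format template with named fields into an anchored regex."""
    parts = []
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field:
            # Each named field becomes a lazy named capture group.
            parts.append(f"(?P<{field}>.*?)")
    return re.compile("^" + "".join(parts) + "$")


@dataclass
class Quantity:
    value: Optional[str] = None
    units: Optional[str] = None
    string_template: ClassVar[str] = "{value}:{units}"

    def __str__(self) -> str:
        # Unset (None) fields are rendered as empty strings.
        fields = {k: "" if v is None else v for k, v in self.__dict__.items()}
        return self.string_template.format(**fields)

    @classmethod
    def from_text(cls, text: str) -> "Quantity":
        match = template_to_regex(cls.string_template).match(text)
        if match is None:
            raise ValueError(f"{text!r} does not match {cls.string_template!r}")
        return cls(**match.groupdict())
```

Note that this sketch does no type conversion: parsed fields come back as strings, and a field that was None before formatting comes back as an empty string.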

Note that the issue_159 branch has all of the proposed changes, with the exception of the updates to the Python generator.

hsolbrig commented 4 years ago

Also, does this cover issues #82 and #128?

hsolbrig commented 3 years ago

Going to go ahead and proceed with this proposed solution

cmungall commented 3 years ago

can this be closed?

hsolbrig commented 3 years ago

To the best of my knowledge, yes