biolink / biolinkml

DEPRECATED: replaced by linkml
https://github.com/linkml/linkml
Creative Commons Zero v1.0 Universal

Bug in python dataclasses generated from gen-py-classes #88

Closed deepakunni3 closed 4 years ago

deepakunni3 commented 4 years ago
@dataclass
class Annotation(YAMLRoot):
    """
    An annotation on a sample. This is essentially a key value pair
    """
    _inherited_slots: ClassVar[List[str]] = []

    # === annotation ===
    has_raw_value: str
    has_characteristic: Optional[Union[str, CharacteristicId]] = None
    has_normalized_value: List[Union[dict, "NormalizedValue"]] = empty_list()

    def _fix_elements(self):
        super()._fix_elements()
        self.has_normalized_value = [v if isinstance(v, NormalizedValue)
                                     else NormalizedValue(**v) for v in self.has_normalized_value]

The desired behavior here is for has_characteristic to store an object and not a string. Is there a way to enforce this expectation?

i.e.

    has_characteristic: Optional[Union[str, CharacteristicId]] = None

should actually be,

    has_characteristic: Optional[Characteristic] = None
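
As a rough illustration of the desired behavior, here is a minimal sketch using plain Python dataclasses. It is not the output of gen-py-classes: the `Characteristic` fields, the `__post_init__` coercion, and the `EX:depth` identifier are all assumptions made up for this example. It only shows the idea of accepting an object (or a dict that can be turned into one) and rejecting a bare string key.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Characteristic:
    # Hypothetical stand-in for the schema's Characteristic class.
    id: str
    name: Optional[str] = None


@dataclass
class Annotation:
    has_raw_value: str
    has_characteristic: Optional[Union[dict, Characteristic]] = None

    def __post_init__(self):
        # Coerce a dict into a Characteristic; reject anything else,
        # including a bare string identifier.
        value = self.has_characteristic
        if isinstance(value, dict):
            self.has_characteristic = Characteristic(**value)
        elif value is not None and not isinstance(value, Characteristic):
            raise TypeError(
                "has_characteristic must be a Characteristic, "
                f"got {type(value).__name__}"
            )


# "EX:depth" is a made-up identifier used only for illustration.
annotation = Annotation(
    has_raw_value="5 cm",
    has_characteristic={"id": "EX:depth", "name": "depth"},
)
```

With this sketch, passing `has_characteristic="EX:depth"` raises a `TypeError` instead of silently storing the string.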

Reference

Original YAML: https://github.com/microbiomedata/nmdc-metadata/commit/f913bc171688352fdea143deae76f7e7999dae20

Generated Python dataclasses: https://github.com/microbiomedata/nmdc-metadata/blob/f4f9f644d694e71fefed3bf4bc7b189a37f353e3/schema/nmdc.py

cmungall commented 4 years ago

Actually, I am not sure this is a bug; it is more of a feature request: we want to be able to inline objects.

Note that for the primary use case (biolink) we deliberately avoid inlining; everything is very key-based. Overall this leads to more normalized json, as there is no repetition.

It's not clear if biolinkml supports inlining of objects.

One option is that we just suck this up and make our nmdc json more normalized. We would not repeat the same characteristic object each time, so the json would be flatter. This may actually be a good decision, although it is somewhat harder to work with, as code has to do lots of lookups rather than just traversals.

Or it may be that inlining is already here or easy to add.
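
The trade-off between the normalized (key-based) and inlined layouts can be sketched with plain dicts. The document shapes, the `characteristics` index, and the `EX:depth` identifier below are assumptions for illustration only, not the actual nmdc or biolink json.

```python
# Normalized: each characteristic is stored once and referenced by key.
normalized = {
    "characteristics": {
        "EX:depth": {"id": "EX:depth", "name": "depth"},
    },
    "annotations": [
        {"has_raw_value": "5 cm", "has_characteristic": "EX:depth"},
        {"has_raw_value": "7 cm", "has_characteristic": "EX:depth"},
    ],
}

# Inlined: each annotation repeats the full characteristic object.
inlined = {
    "annotations": [
        {"has_raw_value": "5 cm",
         "has_characteristic": {"id": "EX:depth", "name": "depth"}},
        {"has_raw_value": "7 cm",
         "has_characteristic": {"id": "EX:depth", "name": "depth"}},
    ],
}


def names_normalized(doc):
    # Every access needs a lookup through the shared index.
    index = doc["characteristics"]
    return [index[a["has_characteristic"]]["name"] for a in doc["annotations"]]


def names_inlined(doc):
    # Plain traversal; no index is needed.
    return [a["has_characteristic"]["name"] for a in doc["annotations"]]
```

Both functions return the same names; the normalized form avoids repetition at the cost of the extra lookup step.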

cmungall commented 4 years ago

UPDATE

we have it: https://biolink.github.io/biolinkml/docs/inlined