NCATSTranslator / reasoner-validator

Validation of Translator OpenAPI (TRAPI) messages both to TRAPI and Biolink Model standards. See https://ncatstranslator.github.io/reasoner-validator/

Why is EDAM-DATA: unknown to Biolink? #84

Closed edeutsch closed 1 year ago

edeutsch commented 1 year ago

When I run this response through the validator: https://arax.ncats.io/devLM/?r=142300

I see the following warning:

* Knowledge Graph Edge Attribute Type Id Unknown:
=> Edge has an attribute_type_id that has a CURIE prefix namespace unknown to Biolink
    # EDAM-DATA:2526:
    - edge_id: 
        CHEMBL.COMPOUND:CHEMBL112--biolink:occurs_together_in_literature_with->NCBIGene:762

Yet I think I see EDAM-DATA listed as a CURIE prefix here: https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L55

(Technically I am validating against 3.2.8, so this is the appropriate version, but the finding is the same: https://github.com/biolink/biolink-model/blob/a012889faa773d7afb02e37ab93b34a8b0065877/biolink-model.yaml#L54)

Maybe this is a Biolink question/issue for @sierra-moxon and BMT rather than the validator per se?

RichardBruskiewich commented 1 year ago

Hi @edeutsch,

This error message generally means that the Biolink Model Toolkit ("BMT") cannot resolve a given namespace against the Biolink class context it is given (or assumes).

That is, the namespace is not listed in the id_prefixes list of the specific context (for example, if the context is a Biolink category class, then the namespace must be in the id_prefixes of that given category class definition). Note that id_prefixes are not inherited by children. I'm not sure if that is the way things ought to be, but to my knowledge, that is currently the case. We'd need to review this with @cmungall and @sierra-moxon to see if this model design needs revisiting.
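As a rough illustration of the point above, here is a minimal sketch of a per-class id_prefixes check without inheritance. The class table, the "child attribute" class, and the `prefix_allowed` helper are all hypothetical stand-ins for this example, not BMT or Biolink Model content.

```python
from typing import Dict, List, Optional, TypedDict


class ToyClass(TypedDict):
    is_a: Optional[str]
    id_prefixes: List[str]


# Hypothetical, simplified class table (not the real Biolink Model).
TOY_CLASSES: Dict[str, ToyClass] = {
    "named thing": {"is_a": None, "id_prefixes": []},
    "attribute": {"is_a": "named thing", "id_prefixes": ["EDAM-DATA"]},
    # A made-up child class that declares no id_prefixes of its own.
    "child attribute": {"is_a": "attribute", "id_prefixes": []},
}


def prefix_allowed(class_name: str, prefix: str) -> bool:
    """Check the prefix only against the class's own id_prefixes.

    The parent chain (is_a) is deliberately not consulted, which models
    the "not inherited by children" behaviour described above.
    """
    return prefix in TOY_CLASSES[class_name]["id_prefixes"]


print(prefix_allowed("attribute", "EDAM-DATA"))
print(prefix_allowed("child attribute", "EDAM-DATA"))
```

In this toy setup the parent accepts the prefix while the child does not, because the lookup never walks up the is_a chain.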

That said, I'll need to double check (in a few minutes... please bear with me) how the validation (above) is specifically undertaken for Knowledge Graph attribute_type_id fields, which are not necessarily Biolink category class terms.

In that light, I do note that the attribute class is the one that has the id_prefixes list where EDAM-DATA is specifically listed.

Not sure how this aligns with namespace discovery for attribute_type_id fields. It is conceivable that we need to fix or add functionality in the BMT to cover this use case.

RichardBruskiewich commented 1 year ago

So, the validation code is triggered here:

                        elif not self.bmt.get_element_by_prefix(prefix):
                            self.report(
                                code="warning.knowledge_graph.edge.attribute.type_id.unknown_prefix",
                                identifier=attribute_type_id,
                                edge_id=edge_id
                            )

where

    def get_element_by_prefix(
            self,
            identifier: str
    ) -> List[str]:
        """
        Get a Biolink Model element by prefix.

        Parameters
        ----------
        identifier: str
            The identifier as a CURIE

        Returns
        -------
        Optional[str]
                The Biolink element corresponding to the given URI/CURIE as available via
                the id_prefixes mapped to that element.

        """
        categories = []
        if ":" in identifier:
            id_components = identifier.split(":")
            prefix = id_components[0]
            elements = self.get_all_elements()
            for category in elements:
                element = self.get_element(category)
                if hasattr(element, 'id_prefixes') and prefix in element.id_prefixes:
                    categories.append(element.name)
        if len(categories) == 0:
            logger.warning("no biolink class found for the given curie: %s, try get_element_by_mapping?", identifier)

        return categories

where the following model in the master branch has EDAM-DATA:

  attribute:
    is_a: named thing
    mixins:
      - ontology class
    description: >-
      A property or characteristic of an entity.
      For example, an apple may have properties such as color, shape, age, crispiness.
      An environmental sample may have attributes such as depth, lat, long, material.
    slots:
      - name                   # 'attribute_name'
      - has attribute type     # 'attribute_type'
      # 'value', 'value_type', 'value_type_name'
      # extracted from either of the next two slots
      - has quantitative value
      - has qualitative value
      - iri                    # 'url'
    slot_usage:
      name:
        description: >-
          The human-readable 'attribute name' can be set to a string which reflects its context of
          interpretation, e.g. SEPIO evidence/provenance/confidence annotation or it can default
          to the name associated with the 'has attribute type' slot ontology term.
    id_prefixes:
      - EDAM-DATA
      - EDAM-FORMAT
      - EDAM-OPERATION
      - EDAM-TOPIC
    exact_mappings:
      - SIO:000614
    in_subset:
      - samples

RichardBruskiewich commented 1 year ago

BTW, @edeutsch, I found the EDAM term online in EDAM.obo, and it appears to be marked obsolete:

[Term]
id: EDAM_data:2526
name: Article data
comment: This is a broad data type and is used a placeholder for other, more specific types.  It is primarily intended to help navigation of EDAM and would not typically be used for annotation. It includes concepts that are best described as scientific text or closely concerned with or derived from text.
subset: bioinformatics
subset: data
subset: edam
created_in: "beta12orEarlier"
def: "Data concerning the scientific literature." [http://edamontology.org]
namespace: data
obsolete_since: "beta13"
!is_a: ObsoleteClass ! Obsolete concept (EDAM)
is_obsolete: true
consider: EDAM_data:0971 ! Article

RichardBruskiewich commented 1 year ago

I put in a unit test with your EDAM-DATA value and replicated the error. I'll iterate on this now.

RichardBruskiewich commented 1 year ago

Well, what do you know... the code has a logical error: get_element_by_prefix() expects a full CURIE, not just the namespace, as its input value.

No wonder it can't validate the term (namespace)!

I'll fix that and see if that fixes the mistaken validation.
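To make the failure mode concrete, here is a minimal sketch that mirrors the lookup logic quoted earlier but runs against a toy prefix table instead of the real model. `TOY_MODEL` and its contents are assumptions for illustration only, and this is not the actual fix that went into the validator.

```python
from typing import Dict, List

# Toy stand-in for the model: element name -> id_prefixes (illustrative only).
TOY_MODEL: Dict[str, List[str]] = {
    "attribute": ["EDAM-DATA", "EDAM-FORMAT", "EDAM-OPERATION", "EDAM-TOPIC"],
    "gene": ["NCBIGene"],
}


def get_element_by_prefix(identifier: str) -> List[str]:
    """Same control flow as the quoted method, using TOY_MODEL."""
    categories: List[str] = []
    # A bare namespace such as "EDAM-DATA" contains no ":", so the
    # whole lookup is skipped and the result stays empty.
    if ":" in identifier:
        prefix = identifier.split(":")[0]
        for name, id_prefixes in TOY_MODEL.items():
            if prefix in id_prefixes:
                categories.append(name)
    return categories


# What the validator was effectively doing: passing only the namespace.
bare_prefix_result = get_element_by_prefix("EDAM-DATA")
# What the method expects: the full CURIE.
full_curie_result = get_element_by_prefix("EDAM-DATA:2526")

print(bare_prefix_result)  # []
print(full_curie_result)   # ['attribute']
```

Since an empty list is falsy, the `not self.bmt.get_element_by_prefix(prefix)` test in the validator would be true for any bare namespace, whether or not that namespace is listed in some id_prefixes.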

RichardBruskiewich commented 1 year ago

Resolved by release v3.5.9