Closed edeutsch closed 1 year ago
Hi @edeutsch,
This error message generally means that the Biolink Model Toolkit ("BMT") cannot resolve a given namespace against the Biolink class context it is given (or assumes).
That is, the namespace is not listed in the id_prefixes list of the specific context (for example, if the context is a Biolink category class, then the namespace must be in the id_prefixes of that category class definition). Note that id_prefixes are not inherited by children. I'm not sure if that is the way things ought to be, but to my knowledge, that is currently the case. We'd need to review this with @cmungall and @sierra-moxon to see if this model design needs revisiting.
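To make the inheritance point concrete, here is a minimal sketch of the two possible behaviours. It does not use the BMT API: the toy model, the class names, the EX-PREFIX prefix and both helper functions are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy model (NOT the real Biolink Model): the child class
# declares no id_prefixes of its own.
TOY_MODEL: Dict[str, dict] = {
    "parent class": {"is_a": None, "id_prefixes": ["EX-PREFIX"]},
    "child class": {"is_a": "parent class", "id_prefixes": []},
}


def declared_prefixes(name: str) -> List[str]:
    """Prefixes declared directly on the class (no inheritance)."""
    return list(TOY_MODEL[name]["id_prefixes"])


def inherited_prefixes(name: str) -> List[str]:
    """Prefixes collected by walking up the is_a chain (the alternative design)."""
    collected: List[str] = []
    current: Optional[str] = name
    while current is not None:
        for prefix in TOY_MODEL[current]["id_prefixes"]:
            if prefix not in collected:
                collected.append(prefix)
        current = TOY_MODEL[current]["is_a"]
    return collected


print(declared_prefixes("child class"))   # prefixes visible without inheritance
print(inherited_prefixes("child class"))  # prefixes visible if they were inherited
```

Under the first behaviour, which is what I believe currently applies, a namespace listed only on the parent would not resolve for the child.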
That said, I'll need to double check (in a few minutes... please bear with me) how the validation (above) is specifically undertaken for Knowledge Graph attribute_type_id fields, which are not necessarily Biolink category class terms.
In that light, I do note that the attribute class is the one that has the id_prefixes list where EDAM-DATA is specifically listed.
Not sure how this aligns with namespace discovery for attribute_type_id fields. It is conceivable that we need to fix or add functionality in the BMT to cover this use case.
So, the validation code is triggered here:
    elif not self.bmt.get_element_by_prefix(prefix):
        self.report(
            code="warning.knowledge_graph.edge.attribute.type_id.unknown_prefix",
            identifier=attribute_type_id,
            edge_id=edge_id
        )
where
    def get_element_by_prefix(
        self,
        identifier: str
    ) -> List[str]:
        """
        Get a Biolink Model element by prefix.

        Parameters
        ----------
        identifier: str
            The identifier as a CURIE

        Returns
        -------
        Optional[str]
            The Biolink element corresponding to the given URI/CURIE as available via
            the id_prefixes mapped to that element.
        """
        categories = []
        if ":" in identifier:
            id_components = identifier.split(":")
            prefix = id_components[0]
            elements = self.get_all_elements()
            for category in elements:
                element = self.get_element(category)
                if hasattr(element, 'id_prefixes') and prefix in element.id_prefixes:
                    categories.append(element.name)
        if len(categories) == 0:
            logger.warning("no biolink class found for the given curie: %s, try get_element_by_mapping?", identifier)
        return categories
where the following model in the master branch has EDAM-DATA:
    attribute:
      is_a: named thing
      mixins:
        - ontology class
      description: >-
        A property or characteristic of an entity.
        For example, an apple may have properties such as color, shape, age, crispiness.
        An environmental sample may have attributes such as depth, lat, long, material.
      slots:
        - name # 'attribute_name'
        - has attribute type # 'attribute_type'
        # 'value', 'value_type', 'value_type_name'
        # extracted from either of the next two slots
        - has quantitative value
        - has qualitative value
        - iri # 'url'
      slot_usage:
        name:
          description: >-
            The human-readable 'attribute name' can be set to a string which reflects its context of
            interpretation, e.g. SEPIO evidence/provenance/confidence annotation or it can default
            to the name associated with the 'has attribute type' slot ontology term.
      id_prefixes:
        - EDAM-DATA
        - EDAM-FORMAT
        - EDAM-OPERATION
        - EDAM-TOPIC
      exact_mappings:
        - SIO:000614
      in_subset:
        - samples
BTW, @edeutsch, I found the EDAM term online at EDAM.obo and it seems obsolete?
[Term]
id: EDAM_data:2526
name: Article data
comment: This is a broad data type and is used a placeholder for other, more specific types. It is primarily intended to help navigation of EDAM and would not typically be used for annotation. It includes concepts that are best described as scientific text or closely concerned with or derived from text.
subset: bioinformatics
subset: data
subset: edam
created_in: "beta12orEarlier"
def: "Data concerning the scientific literature." [http://edamontology.org]
namespace: data
obsolete_since: "beta13"
!is_a: ObsoleteClass ! Obsolete concept (EDAM)
is_obsolete: true
consider: EDAM_data:0971 ! Article
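For anyone who wants to check the obsolescence flag programmatically, here is a rough sketch that reads the relevant tags from a stanza like the one above. The parse_obo_stanza helper is hypothetical and deliberately naive (it ignores quoting and escaping rules); a proper OBO parser would be the better choice for real use.

```python
from typing import Dict, List


def parse_obo_stanza(text: str) -> Dict[str, List[str]]:
    """Collect tag -> values from one OBO stanza (naive, illustration only)."""
    tags: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the [Term] header, and '!' comment lines.
        if not line or line.startswith("[") or line.startswith("!"):
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop a trailing ' ! comment', if present (does not handle quoted text).
        value = value.split(" ! ")[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags


# Abbreviated copy of the stanza quoted above.
STANZA = """[Term]
id: EDAM_data:2526
name: Article data
obsolete_since: "beta13"
is_obsolete: true
consider: EDAM_data:0971 ! Article
"""

term = parse_obo_stanza(STANZA)
is_obsolete = term.get("is_obsolete") == ["true"]
replacements = term.get("consider", [])
print(term["id"], is_obsolete, replacements)
```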
I put in a unit test with your EDAM-DATA value and replicated the error. I'll iterate on this now.
Well, what do you know... the code has a logical error: get_element_by_prefix() expects a CURIE, not just the namespace, as the input value.
No wonder it can't validate the term (namespace)!
I'll fix that and see if that fixes the mistaken validation.
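The mismatch can be reproduced in isolation. This is a minimal sketch that mimics the lookup logic quoted earlier in this thread against a toy table; the ID_PREFIXES table and the lookup_by_curie function are hypothetical stand-ins, not the BMT API.

```python
from typing import Dict, List

# Hypothetical stand-in for the model: element name -> id_prefixes.
ID_PREFIXES: Dict[str, List[str]] = {
    "attribute": ["EDAM-DATA", "EDAM-FORMAT", "EDAM-OPERATION", "EDAM-TOPIC"],
}


def lookup_by_curie(identifier: str) -> List[str]:
    """Mimic the quoted lookup: only inputs containing ':' are examined."""
    matches: List[str] = []
    if ":" in identifier:
        prefix = identifier.split(":")[0]
        for name, prefixes in ID_PREFIXES.items():
            if prefix in prefixes:
                matches.append(name)
    return matches


# Passing the bare namespace (what the validator did) finds nothing,
# because the string has no ':' and the loop is never entered.
print(lookup_by_curie("EDAM-DATA"))       # []

# Passing the full CURIE (what the function expects) finds the class.
print(lookup_by_curie("EDAM-DATA:2526"))  # ['attribute']
```

So the fix on the validator side is to pass the whole attribute_type_id rather than the already-extracted prefix.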
Resolved by release v3.5.9
When I run this response through the validator: https://arax.ncats.io/devLM/?r=142300
I see the following warning:
And yet, I think I see EDAM-DATA as a CURIE prefix here: https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L55
(technically I am validating against 3.2.8, so this is the appropriate version, but same finding: https://github.com/biolink/biolink-model/blob/a012889faa773d7afb02e37ab93b34a8b0065877/biolink-model.yaml#L54 )
Maybe this is a Biolink question/issue for @sierra-moxon and BMT rather than the validator per se?