RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Biolink v4.2.0 Incompatibilities #381

Open ecwood opened 1 week ago

ecwood commented 1 week ago

The failure of the CI run for 55e2c16 suggests that the predicate mappings are not up-to-date with Biolink v4.2.0:

+ /home/runner/kg2-venv/bin/python3 -u -u /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/validate_predicate_remap_yaml.py /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/curies-to-urls-map.yaml /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/predicate-remap.yaml https://raw.githubusercontent.com/biolink/biolink-model/v4.2.0/src/biolink_model/schema/biolink_model.yaml /home/runner/kg2-build/biolink_model.yaml
/home/runner/kg2-venv/lib/python3.7/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0.  Please remove rdflib-jsonld from your project's dependencies.
  DeprecationWarning,
Traceback (most recent call last):
  File "/home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/validate_predicate_remap_yaml.py", line 195, in <module>
    f"{relation} should map to {allowed_biolink_curies_set} ({mapping_term_used.split('_')[0]})"
AssertionError: SEMMEDDB:ADMINISTERED_TO should map to {'biolink:applied_to_treat'} (broad)
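
For anyone following along, here is a rough sketch of the kind of check that trips here. It assumes the Biolink YAML lists source relations under per-slot keys such as `broad_mappings`. The slot fragment and the function names (`find_allowed_curies`, `check_remap`) are hypothetical and are not the actual code in validate_predicate_remap_yaml.py.

```python
# Hypothetical, hand-written fragment shaped like the `slots` section of
# biolink_model.yaml; the real file is far larger and its contents may differ.
BIOLINK_SLOTS = {
    "applied to treat": {"broad_mappings": ["SEMMEDDB:ADMINISTERED_TO"]},
}

# LinkML mapping keys that a slot may carry.
MAPPING_KEYS = ("exact_mappings", "close_mappings", "broad_mappings",
                "narrow_mappings", "related_mappings")


def to_biolink_curie(slot_name):
    """Convert a slot name such as 'applied to treat' to a Biolink CURIE."""
    return "biolink:" + slot_name.replace(" ", "_")


def find_allowed_curies(relation, slots):
    """Return the Biolink CURIEs that list `relation`, plus the mapping kinds used."""
    allowed, kinds = set(), set()
    for slot_name, slot in slots.items():
        for key in MAPPING_KEYS:
            if relation in slot.get(key, []):
                allowed.add(to_biolink_curie(slot_name))
                kinds.add(key.split("_")[0])  # 'broad_mappings' -> 'broad'
    return allowed, kinds


def check_remap(relation, mapped_curie, slots):
    """Return an error message if the mapping disagrees with Biolink, else None."""
    allowed, kinds = find_allowed_curies(relation, slots)
    if allowed and mapped_curie not in allowed:
        return f"{relation} should map to {allowed} ({', '.join(sorted(kinds))})"
    return None


print(check_remap("SEMMEDDB:ADMINISTERED_TO",
                  "biolink:treats_or_applied_or_studied_to_treat",
                  BIOLINK_SLOTS))
```

Relations that Biolink does not mention at all pass through unchecked in this sketch; whether the real validator does the same is not something the log above shows.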

ecwood commented 1 week ago

I'm not sure what to do about SEMMEDDB:TREATS, since it is mapped in Biolink to a mixin.

+ /home/runner/kg2-venv/bin/python3 -u -u /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/validate_predicate_remap_yaml.py /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/curies-to-urls-map.yaml /home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/predicate-remap.yaml https://raw.githubusercontent.com/biolink/biolink-model/v4.2.0/src/biolink_model/schema/biolink_model.yaml /home/runner/kg2-build/biolink_model.yaml
/home/runner/kg2-venv/lib/python3.7/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0.  Please remove rdflib-jsonld from your project's dependencies.
  DeprecationWarning,
Traceback (most recent call last):
  File "/home/runner/work/RTX-KG2/RTX-KG2/RTX-KG2/validate_predicate_remap_yaml.py", line 165, in <module>
    assert core_predicate not in biolink_mixins, (relation, core_predicate, {'Mixins': biolink_mixins})
AssertionError: ('SEMMEDDB:TREATS', 'biolink:treats_or_applied_or_studied_to_treat', {'Mixins': ['biolink:interacts_with', 'biolink:increases_amount_or_activity_of', 'biolink:decreases_amount_or_activity_of', 'biolink:chemical_role_mixin', 'biolink:biological_role_mixin', 'biolink:promotes_condition', 'biolink:treats', 'biolink:treated_by', 'biolink:treats_or_applied_or_studied_to_treat', 'biolink:subject_of_treatment_application_or_study_for_treatment_by', 'biolink:chemical_entity_or_drug_or_treatment']})

saramsey commented 5 days ago

Per clarification from Sierra Moxon, using Biolink predicates marked mixin: true directly as predicates in triples is now allowed. So we can relax the Biolink mixin check, I believe. The separate TRAPI validator may still complain about it, but I have checked with Sierra more than once to confirm that use of mixins is (and will continue to be) allowed. It may just take some time for the TRAPI validator to be updated to reflect that. And of course, we'll want to update our validator in validate_predicate_remap_yaml.py. Thank you!!
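
One way the relaxed check could look, as a sketch only: `check_mixin` and the `allow_mixins` flag are hypothetical names, and the mixin list is abbreviated from the traceback above. With the flag on, a mixin predicate passes; with it off, the check behaves like the assertion that currently fails.

```python
def check_mixin(relation, core_predicate, biolink_mixins, allow_mixins=True):
    """Reject mixin predicates only when `allow_mixins` is False.

    Hypothetical helper; the assertion mirrors the one shown in the traceback.
    """
    if allow_mixins:
        return  # mixins are acceptable as predicates, nothing to check
    assert core_predicate not in biolink_mixins, \
        (relation, core_predicate, {'Mixins': biolink_mixins})


# Abbreviated from the list printed in the CI log.
mixins = ['biolink:treats', 'biolink:treats_or_applied_or_studied_to_treat']

# Passes under the relaxed rule.
check_mixin('SEMMEDDB:TREATS',
            'biolink:treats_or_applied_or_studied_to_treat',
            mixins)

# Reproduces the current failure when mixins are disallowed.
try:
    check_mixin('SEMMEDDB:TREATS',
                'biolink:treats_or_applied_or_studied_to_treat',
                mixins, allow_mixins=False)
except AssertionError as err:
    print("strict mode rejects:", err.args[0][0])
```

Keeping the strict path behind a flag would make it easy to re-enable if the TRAPI validator needs to be matched exactly in the meantime.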

saramsey commented 5 days ago

For SEMMEDDB:administered_to and SEMMEDDB:ADMINISTERED_TO, I favor the more generic biolink:treats_or_applied_or_studied_to_treat since I suspect a lot of edges picked up by SemMedDB will actually be investigational (e.g., "we tried administering silvadene creme to eczema lesions" or whatever) rather than clinical practice.

For SEMMEDDB:associated_with, yes, biolink:associated_with looks appropriate. Thank you!!

ecwood commented 5 days ago

> For SEMMEDDB:administered_to and SEMMEDDB:ADMINISTERED_TO, I favor the more generic biolink:treats_or_applied_or_studied_to_treat since I suspect a lot of edges picked up by SemMedDB will actually be investigational (e.g., "we tried administering silvadene creme to eczema lesions" or whatever) rather than clinical practice.

I agree completely. However, Biolink has ruled that it maps to biolink:applied_to_treat. Should we add an exception in our validator for it? Or reach out to Biolink?

ecwood commented 5 days ago

Per my meeting with Steve today, I should add an exception for SEMMEDDB:administered_to and SEMMEDDB:ADMINISTERED_TO. I did this in b1d7501.
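
For reference, the general shape of such an exception might be something like the sketch below. This is not the actual change in b1d7501: `REMAP_EXCEPTIONS` and `mapping_is_valid` are hypothetical names, and the target predicate is assumed from the discussion above.

```python
# Relations whose mapping is allowed to differ from what Biolink lists.
# Hypothetical table; the target predicate is assumed from this thread.
REMAP_EXCEPTIONS = {
    "SEMMEDDB:administered_to": "biolink:treats_or_applied_or_studied_to_treat",
    "SEMMEDDB:ADMINISTERED_TO": "biolink:treats_or_applied_or_studied_to_treat",
}


def mapping_is_valid(relation, mapped_curie, allowed_biolink_curies_set):
    """Accept a mapping if it is a listed exception or agrees with Biolink."""
    if REMAP_EXCEPTIONS.get(relation) == mapped_curie:
        return True
    return mapped_curie in allowed_biolink_curies_set


# Biolink says 'applied_to_treat', but the exception lets the broader
# predicate through for this one relation.
print(mapping_is_valid("SEMMEDDB:ADMINISTERED_TO",
                       "biolink:treats_or_applied_or_studied_to_treat",
                       {"biolink:applied_to_treat"}))
```

Pinning each exception to a specific target predicate, rather than skipping the relation entirely, means a typo in the remap file would still be caught.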