RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Clinical Trials KP edge predicates are malformed - double 'biolink' prefix - e.g., "biolink:biolink_treats" #416

Open amykglen opened 1 month ago

amykglen commented 1 month ago

I noticed that KG2.10.1 includes three predicates that have a double 'biolink' prefix of sorts:

MATCH p=()-[e:`biolink:biolink_in_clinical_trials_for`|:`biolink:biolink_mentioned_in_trials_for`|:`biolink:biolink_treats`]->() RETURN distinct e.predicate, count(distinct e)
e.predicate                                count(distinct e)
"biolink:biolink_in_clinical_trials_for"   13459
"biolink:biolink_treats"                   3558
"biolink:biolink_mentioned_in_trials_for"  14215

These appear to have come from the Clinical Trials KP ingest. I'm not sure whether the double-biolink prefix was already present in the version of their data we consumed, or whether it was introduced during the KG2pre build process...
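For auditing other ingests for the same problem, here is a minimal Python sketch that flags and normalizes CURIEs of the form `biolink:biolink_<name>`. The helper names `is_double_prefixed` and `normalize_biolink_curie` are hypothetical, and this is not the code from the fix commit. It assumes the malformed values always carry a literal `biolink_` immediately after the `biolink:` prefix, as the three predicates above do.

```python
import re

# Matches CURIEs whose local part repeats the prefix, e.g. "biolink:biolink_treats".
# Assumption: the duplicated prefix always appears as a literal "biolink_" right after "biolink:".
DOUBLE_PREFIX = re.compile(r"^biolink:biolink_(?P<name>.+)$")


def is_double_prefixed(curie: str) -> bool:
    """Return True if the CURIE has the malformed double 'biolink' prefix."""
    return DOUBLE_PREFIX.match(curie) is not None


def normalize_biolink_curie(curie: str) -> str:
    """Hypothetical helper: rewrite 'biolink:biolink_x' to 'biolink:x'.

    Any CURIE that does not match the malformed pattern is returned unchanged.
    """
    m = DOUBLE_PREFIX.match(curie)
    return f"biolink:{m.group('name')}" if m else curie
```

For example, `normalize_biolink_curie("biolink:biolink_treats")` yields `"biolink:treats"`, and an already well-formed `"biolink:treats"` passes through untouched. This makes the normalization safe to apply to a whole predicate column.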

saramsey commented 1 month ago

OK, commit 77db2e6 should fix the issue.

Excerpt of the file clinicaltrialskg_tsv_to_kg_jsonl-edges.jsonl before the fix:

{"domain_range_exclusion": false, 
 "id": "CHEBI:10023---biolink:biolink_in_clinical_trials_for---None---None---None---HP:0012531---ClinicalTrialsKG:", 
 "negated": false, 
 "object": "HP:0012531", 
 "predicate": null, 
 "primary_knowledge_source": "ClinicalTrialsKG:", 
 "publications": [], 
 "publications_info": {}, 
 "qualified_object_aspect": null, 
 "qualified_object_direction": null, 
 "qualified_predicate": null, 
 "relation_label": "biolink:in_clinical_trials_for", 
 "source_predicate": "biolink:biolink_in_clinical_trials_for", 
 "subject": "CHEBI:10023", 
 "update_date": "2018-06-15"}

and after the fix:

{"domain_range_exclusion": false, 
 "id": "CHEBI:10023---biolink:in_clinical_trials_for---None---None---None---HP:0012531---ClinicalTrialsKG:", 
 "negated": false, 
 "object": "HP:0012531", 
 "predicate": null, 
 "primary_knowledge_source": "ClinicalTrialsKG:", 
 "publications": [], 
 "publications_info": {}, 
 "qualified_object_aspect": null, 
 "qualified_object_direction": null, 
 "qualified_predicate": null, 
 "relation_label": "in_clinical_trials_for", 
 "source_predicate": "biolink:in_clinical_trials_for", 
 "subject": "CHEBI:10023", 
 "update_date": "2018-06-15"}
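To check a rebuilt edges file for leftovers, a small scanner can flag double-prefixed `source_predicate` values. It can also flag `id` strings that disagree with the record's other fields. This is a sketch under a stated assumption: the `id` layout (subject, source predicate, three qualifier slots, object, primary knowledge source, joined with `---` and with null rendered as `None`) is inferred from the two excerpts above, not taken from the KG2 source. Because all three qualifier slots are `None` in both excerpts, their relative order is a guess. The names `ID_FIELDS`, `expected_edge_id`, and `find_problems` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple

# Assumed id layout, inferred from the excerpts above (not confirmed by the KG2 code).
# The order of the three qualifier fields is a guess; all are null in the excerpts.
ID_FIELDS = ("subject", "source_predicate", "qualified_predicate",
             "qualified_object_aspect", "qualified_object_direction",
             "object", "primary_knowledge_source")


def expected_edge_id(edge: dict) -> str:
    """Rebuild the edge id from its fields; JSON null becomes the string 'None'."""
    return "---".join(str(edge.get(f)) for f in ID_FIELDS)


def find_problems(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, message) for each suspicious edge in a JSONL stream."""
    for n, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        edge = json.loads(raw)
        sp = edge.get("source_predicate") or ""
        if sp.startswith("biolink:biolink_"):
            yield n, f"double biolink prefix in source_predicate: {sp}"
        if edge.get("id") != expected_edge_id(edge):
            yield n, "id does not match the assumed field layout"
```

Passing an open handle on `clinicaltrialskg_tsv_to_kg_jsonl-edges.jsonl` to `find_problems` would list each offending line. Under the assumed layout, the pre-fix record above is reported only for its double prefix, since its `id` is internally consistent. The post-fix record produces no messages.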