RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

`validate_curies_to_urls_map_yaml.py` is validating CURIEs not URLs #320

Open ecwood opened 12 months ago

ecwood commented 12 months ago

Per @saramsey, validate_curies_to_urls_map_yaml.py is validating CURIEs only, not the URLs. In an enhancement, we should have the script compare the URLs to what is in Biolink.

Originally posted by @ecwood in https://github.com/RTXteam/RTX-KG2/issues/302#issuecomment-1634570220

ecwood commented 12 months ago

This matters most when a URL is in the use_for_bidirectional_mapping section.

This doesn't need to happen for use_for_contraction_only; there is no expectation that those URLs follow the Biolink standard. URLs go in that section when the URL we encounter is not the official one. These URLs tend to be non-standard, but we get stuck with them when we ingest sources.

use_for_expansion_only is for non-standard prefixes coming in. It lets us handle them without erroring out. Every one of these URLs should be correct, but the prefixes themselves all have problems.
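
A minimal sketch of the check described above, not the actual code in validate_curies_to_urls_map_yaml.py. It assumes the map has already been loaded into a dict of section name to {prefix: URL}, that the Biolink prefix map is available as a plain {prefix: URL} dict, and that the bidirectional section key is `use_for_bidirectional_mapping` (by analogy with the other two section names). The function name `find_url_mismatches` and both input shapes are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: only the bidirectional section is expected to match Biolink.
SECTIONS_TO_CHECK = ("use_for_bidirectional_mapping",)


def find_url_mismatches(
    curies_to_urls: Dict[str, Dict[str, str]],
    biolink_prefixes: Dict[str, str],
    sections: Iterable[str] = SECTIONS_TO_CHECK,
) -> List[str]:
    """Return one warning per prefix whose KG2 URL differs from Biolink's."""
    warnings = []
    for section in sections:
        for prefix, kg2_url in curies_to_urls.get(section, {}).items():
            biolink_url = biolink_prefixes.get(prefix)
            if biolink_url is None:
                # Prefix is not defined in Biolink, so there is nothing to compare.
                continue
            if kg2_url != biolink_url:
                warnings.append(
                    f"WARNING: CURIE_URL NOT SAME AS BIOLINK URL for {prefix}"
                    f"  - KG2: {kg2_url} , Biolink: {biolink_url}"
                )
    return warnings
```

Sections such as use_for_contraction_only are skipped simply by leaving them out of `sections`, which matches the reasoning that their URLs are not expected to follow Biolink.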

ecwood commented 12 months ago

From 6677636, we know that the following KG2 URLs differ from the Biolink URLs:

WARNING: CURIE_URL NOT SAME AS BIOLINK URL for CHEMBL.MECHANISM  - KG2: https://www.ebi.ac.uk/chembl# , Biolink: https://www.ebi.ac.uk/chembl/mechanism/inspect/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for DGIdb  - KG2: https://www.dgidb.org/ , Biolink: https://www.dgidb.org/interaction_types
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for DrugCentral  - KG2: https://drugcentral.org/drugcard/ , Biolink: http://drugcentral.org/drugcard/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for EFO  - KG2: http://purl.bioontology.org/ontology/EFO/ , Biolink: http://www.ebi.ac.uk/efo/EFO_
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for HANCESTRO  - KG2: http://purl.obolibrary.org/obo/HANCESTRO_ , Biolink: http://www.ebi.ac.uk/ancestro/ancestro_
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for KEGG  - KG2: https://www.genome.jp/dbget-bin/www_bget?pathway:map , Biolink: http://www.kegg.jp/entry/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for medgen  - KG2: https://identifiers.org/medgen: , Biolink: https://www.ncbi.nlm.nih.gov/medgen/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for PathWhiz  - KG2: https://smpdb.ca/pathwhiz/pathways/ , Biolink: http://smpdb.ca/pathways/#
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for PomBase  - KG2: https://identifiers.org/pombase: , Biolink: https://www.pombase.org/gene/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for REPODB  - KG2: http://apps.chiragjpgroup.org/repoDB , Biolink: http://apps.chiragjpgroup.org/repoDB/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for UniProtKB  - KG2: https://identifiers.org/uniprot: , Biolink: http://purl.uniprot.org/uniprot/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for VANDF  - KG2: http://purl.bioontology.org/ontology/VANDF/ , Biolink: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/VANDF/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for WBls  - KG2: http://purl.obolibrary.org/obo/WBBL_ , Biolink: http://purl.obolibrary.org/obo/WBls_
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for WBbt  - KG2: http://purl.obolibrary.org/obo/WBBT_ , Biolink: http://purl.obolibrary.org/obo/WBbt_
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for WIKIDATA  - KG2: https://www.wikidata.org/wiki/ , Biolink: https://www.wikidata.org/entity/
WARNING: CURIE_URL NOT SAME AS BIOLINK URL for WIKIDATA_PROPERTY  - KG2: https://www.wikidata.org/wiki/Property: , Biolink: https://www.wikidata.org/prop/
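
Several of the warnings above differ only trivially (scheme, trailing slash, or letter case), while others point to different URLs entirely. The following is a rough sketch for triaging them; the helper name `classify_mismatch` and its category labels are hypothetical and not part of the KG2 scripts.

```python
def classify_mismatch(kg2_url: str, biolink_url: str) -> str:
    """Roughly categorize how two differing URL prefixes differ."""

    def strip_scheme(url: str) -> str:
        # Drop "http://" or "https://" so only host and path are compared.
        return url.split("://", 1)[-1]

    if strip_scheme(kg2_url) == strip_scheme(biolink_url):
        return "scheme only"
    if kg2_url.rstrip("/") == biolink_url.rstrip("/"):
        return "trailing slash only"
    if kg2_url.lower() == biolink_url.lower():
        return "case only"
    return "different"
```

Applied to the output above, DrugCentral comes back as "scheme only", REPODB as "trailing slash only", and WBbt as "case only", while entries such as WBls and WIKIDATA are "different" and need a manual look.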

ecwood commented 12 months ago

It took a lot of commits, but this seems to be working now.