Open caufieldjh opened 1 year ago
A clue - the CURIEs to be updated are all wikidata URLs and should get the prefix WIKIDATA:, but they get the prefix WD_Entity: instead. Bioregistry knows about that alternate prefix but it isn't in the imported maps.
The post-processing fails because KG-OBO finds prefixes it wants to rewrite, writes them to the update_id_maps.tsv, but then finds that the nodefile doesn't contain any of those nodes since they have been converted to WD_Entity: already.
This is a conversion kgx is doing - transforming the obojson version also yields WD_Entity nodes:
kgx transform -i obojson -f tsv -o agro_test agro.json
This is true for both kgx 1.5.9 and 1.7.0.
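One way to confirm which prefixes end up in the transformed output is to tally the CURIE prefixes in the node file. This is a minimal sketch, assuming the node table is tab-separated with an id column; the function name count_prefixes and the sample rows are made up for illustration and are not real output from the agro transform.

```python
import csv
import io
from collections import Counter


def count_prefixes(tsv_handle, id_column="id"):
    """Tally the CURIE prefix (text before the first colon) of each node ID."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        node_id = row[id_column]
        # IDs without a colon are grouped under a placeholder label.
        prefix = node_id.split(":", 1)[0] if ":" in node_id else "(no prefix)"
        counts[prefix] += 1
    return counts


# Illustrative rows only (assumption): a real run would open the nodes TSV instead.
sample = io.StringIO(
    "id\tcategory\n"
    "AGRO:00000001\tbiolink:OntologyClass\n"
    "WD_Entity:Q1\tbiolink:NamedThing\n"
    "WD_Entity:Q2\tbiolink:NamedThing\n"
)
prefix_counts = count_prefixes(sample)
print(prefix_counts.most_common())
```

Running the same tally on the output of each kgx version would show whether WD_Entity appears in both.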
So kgx is probably using the prefixcommons Monarch map:
https://github.com/prefixcommons/prefixcommons-py/blob/master/prefixcommons/registry/monarch_context.jsonld#L151
Essentially we need to deactivate the prefix maps handled by the kgx prefix manager (https://kgx.readthedocs.io/en/latest/reference/prefix_manager.html).
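To illustrate why the choice of prefix map decides which CURIE comes out, here is a minimal sketch of IRI contraction. It does not use the kgx prefix manager API; the contract function is hypothetical, and the Wikidata namespace string is an assumption rather than the value taken from the Monarch context.

```python
def contract(iri, prefix_map):
    """Return a CURIE using the longest matching namespace, or the IRI unchanged."""
    best = None
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return iri
    return f"{best[0]}:{iri[len(best[1]):]}"


# Assumed namespace, for illustration only.
WIKIDATA_NS = "http://www.wikidata.org/entity/"

default_map = {"WD_Entity": WIKIDATA_NS}    # stands in for the map kgx seems to use
preferred_map = {"WIKIDATA": WIKIDATA_NS}   # stands in for the map KG-OBO expects

iri = WIKIDATA_NS + "Q42"
default_curie = contract(iri, default_map)
preferred_curie = contract(iri, preferred_map)
print(default_curie, preferred_curie)
```

The same IRI yields a different prefix depending on which map is consulted first, which matches the mismatch described above.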
The transform of envo has a nearly identical issue.
Same with gaz.
xco has a potentially related issue, though with MESH.
The big workaround here is to just be less stringent about incomplete mappings. Right now, if we attempt to remap 2 nodes and 2 fail, we consider the whole transform failed, but if just 1 fails, it clears. The priority should be on having a transform; there may be 10000x as many perfectly prefixed nodes in there.
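A rough sketch of what a more lenient check could look like, assuming failures are weighed against the total number of nodes rather than against the number of attempted remaps. The function name summarize_remap, the 1% threshold, and the sample IDs are all hypothetical and are not taken from the KG-OBO code.

```python
def summarize_remap(ids_to_update, node_ids, max_missing_fraction=0.01):
    """Report which requested IDs are absent from the node set and whether to keep the transform."""
    node_ids = set(node_ids)
    found = sorted(i for i in ids_to_update if i in node_ids)
    missing = sorted(i for i in ids_to_update if i not in node_ids)
    # Compare failures to the size of the whole node set (assumed policy).
    fraction = len(missing) / len(node_ids) if node_ids else 1.0
    return {
        "found": found,
        "missing": missing,
        "keep_transform": fraction <= max_missing_fraction,
    }


# Illustrative data: many well-prefixed nodes, two already converted to WD_Entity.
nodes = [f"AGRO:{n:08d}" for n in range(1000)] + ["WD_Entity:Q1", "WD_Entity:Q2"]
wanted = [
    "http://www.wikidata.org/entity/Q1",
    "http://www.wikidata.org/entity/Q2",
]
result = summarize_remap(wanted, nodes)
print(result["keep_transform"], len(result["missing"]))
```

With this policy, two failed remaps out of roughly a thousand nodes would be reported but would not discard the transform.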
Describe the bug
The agro transform appears to go as expected, until it hits post-processing:
To Reproduce
Expected behavior
Post-processing for this OBO should update 4 CURIEs and write out the updated nodes file.
Version
efc2324f040d8daad14ffaaaa6e71583d6258117