INCATools / ontology-access-kit

Ontology Access Kit: A python library and command line application for working with ontologies
https://incatools.github.io/ontology-access-kit/
Apache License 2.0

Duplicated ClassCreation changes in CSV diff output #778

Closed gouttegd closed 4 months ago

gouttegd commented 4 months ago

Trying to generate a CSV-formatted diff with a command like:

runoak -i simpleobo:ont1.obo diff -X simpleobo:ont2.obo --output-type csv --output diff.csv

yields a CSV file in which every ClassCreation-type change is seemingly duplicated, as in this excerpt obtained by comparing http://purl.obolibrary.org/obo/cl/releases/2023-05-22/cl-base.obo and http://purl.obolibrary.org/obo/cl/releases/2024-04-15/cl-base.obo (here isolating all changes pertaining to CL:4023035):

id,type,subject,predicate,object,about_node,in_subset,old_value,new_value,name,has_direct_replacement,has_nondirect_replacement,associated_change_set
uuid:dad7e133-976c-40a5-98d0-a665312a7db4,ClassCreation,,,,CL:4023035,,,,,,,
uuid:e9597028-9dbf-4b6a-968d-2d27dcc6347b,NewSynonym,,oio:hasExactSynonym,,CL:4023035,,,LGE-derived neuron,,,,
uuid:9f1b5a26-79f7-428f-9e14-21c73cdb2283,EdgeCreation,CL:4023035,rdfs:subClassOf,BFO:0000002,,,,,,,,
uuid:9ac854a4-807f-4621-8412-468fa13d06d6,EdgeCreation,CL:4023035,rdfs:subClassOf,CL:0000540,,,,,,,,
uuid:31dcfb7d-0998-41c3-8c5a-de2a71df27c7,EdgeCreation,CL:4023035,RO:0002202,UBERON:0004025,,,,,,,,
uuid:161d31ff-094c-49cd-85a0-4c82044ab2e1,NewTextDefinition,,,,CL:4023035,,,A neuron that is derived from a precursor cell in the lateral ganglion eminence.,,,,
uuid:48e9e429-963c-4e67-8883-a3098b96e8e8,ClassCreation,,,,CL:4023035,,,,lateral ganglionic eminence derived neuron,,,

Notice the two ClassCreation changes (uuid:dad7e133-976c-40a5-98d0-a665312a7db4 and uuid:48e9e429-963c-4e67-8883-a3098b96e8e8), which differ in that the latter is associated with a name (“lateral ganglionic eminence derived neuron”).

I believe either the second ClassCreation change should be a NodeRename (“renaming” from nothing to the actual label) or the name should already be associated with the first ClassCreation change and the second change should be pruned.
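
Until the diff itself is fixed, the second option (folding the name into the first ClassCreation change and pruning the duplicate) can be approximated by post-processing the CSV. The following is only a sketch using the Python standard library; `merge_duplicate_class_creations` is a hypothetical helper, not part of OAK, and it assumes that two ClassCreation rows with the same about_node describe the same change.

```python
import csv
import io

# Three rows copied from the excerpt above (header plus the two
# ClassCreation changes and one NewSynonym change).
SAMPLE = """\
id,type,subject,predicate,object,about_node,in_subset,old_value,new_value,name,has_direct_replacement,has_nondirect_replacement,associated_change_set
uuid:dad7e133-976c-40a5-98d0-a665312a7db4,ClassCreation,,,,CL:4023035,,,,,,,
uuid:e9597028-9dbf-4b6a-968d-2d27dcc6347b,NewSynonym,,oio:hasExactSynonym,,CL:4023035,,,LGE-derived neuron,,,,
uuid:48e9e429-963c-4e67-8883-a3098b96e8e8,ClassCreation,,,,CL:4023035,,,,lateral ganglionic eminence derived neuron,,,
"""


def merge_duplicate_class_creations(rows):
    """Hypothetical helper (not an OAK function).

    Keeps the first ClassCreation row seen for each about_node and copies
    into it any non-empty field (other than id) found on later duplicates.
    All other rows are passed through unchanged, in their original order.
    """
    first_seen = {}
    result = []
    for row in rows:
        if row["type"] != "ClassCreation":
            result.append(row)
            continue
        node = row["about_node"]
        if node not in first_seen:
            kept = dict(row)
            first_seen[node] = kept
            result.append(kept)
            continue
        kept = first_seen[node]
        for field, value in row.items():
            if field != "id" and value and not kept.get(field):
                kept[field] = value
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = merge_duplicate_class_creations(rows)
for row in cleaned:
    print(row["type"], row["about_node"], row["name"])
```

On a real file, the rows would be read from diff.csv with `csv.DictReader` and written back with `csv.DictWriter`; the inline sample is used here only to keep the sketch self-contained.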

Possibly related to #732 (but observed with OAK 0.6.10, which is supposed to have fixed that issue), though I have not looked at the underlying code.

gouttegd commented 4 months ago

Note that this also affects the YAML (and presumably the JSON) output:

---
id: uuid:ce46aef9-b1d4-47f1-8a59-4768fcb3974d
type: ClassCreation
about_node: CL:4023035

---
id: uuid:44669f60-4385-478e-aac7-e71254c8e80e
type: NewTextDefinition
new_value: A neuron that is derived from a precursor cell in the lateral ganglion
  eminence.
about_node: CL:4023035

---
id: uuid:1c793c0b-b15c-4fe8-b682-db22bd206347
type: EdgeCreation
subject: CL:4023035
predicate: RO:0002202
object: UBERON:0004025

---
id: uuid:73860bfd-d3d2-4907-ae25-ed483316bb4d
type: EdgeCreation
subject: CL:4023035
predicate: rdfs:subClassOf
object: BFO:0000002

---
id: uuid:fc20817e-87ef-4718-b9cb-58dea0758535
type: EdgeCreation
subject: CL:4023035
predicate: rdfs:subClassOf
object: CL:0000540

---
id: uuid:8839f49f-a66f-4993-86f7-ceae127db17a
type: NewSynonym
new_value: LGE-derived neuron
about_node: CL:4023035
predicate: oio:hasExactSynonym

---
id: uuid:67b5ce02-2fa9-4358-b73d-cd610f870511
type: ClassCreation
about_node: CL:4023035
name: lateral ganglionic eminence derived neuron

gouttegd commented 4 months ago

As for the KGCL output, it contains no duplicated class creation changes, but that is because it contains no class creation changes at all. In their place are create None instructions:

create None
create edge CL:4023035 rdfs:subClassOf BFO:0000002
create edge CL:4023035 rdfs:subClassOf CL:0000540
create edge CL:4023035 RO:0002202 UBERON:0004025
create None
add definition 'A neuron that is derived from a precursor cell in the lateral ganglion eminence.' to CL:4023035
create synonym 'LGE-derived neuron' for CL:4023035
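
A quick way to check whether a given KGCL output is affected is to look for such lines. The snippet below is a minimal sketch that works on the text alone; `find_unrendered_changes` is a hypothetical name, not an OAK function, and it assumes that an affected change is rendered exactly as the line create None.

```python
# KGCL output copied from the excerpt above.
KGCL_OUTPUT = """\
create None
create edge CL:4023035 rdfs:subClassOf BFO:0000002
create edge CL:4023035 rdfs:subClassOf CL:0000540
create edge CL:4023035 RO:0002202 UBERON:0004025
create None
add definition 'A neuron that is derived from a precursor cell in the lateral ganglion eminence.' to CL:4023035
create synonym 'LGE-derived neuron' for CL:4023035
"""


def find_unrendered_changes(kgcl_text):
    """Hypothetical helper: return the 1-based line numbers of 'create None' lines."""
    return [
        number
        for number, line in enumerate(kgcl_text.splitlines(), start=1)
        if line.strip() == "create None"
    ]


affected_lines = find_unrendered_changes(KGCL_OUTPUT)
print(affected_lines)
```

The two line numbers it reports for this excerpt line up with the two ClassCreation changes seen in the CSV and YAML outputs.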

cmungall commented 4 months ago

@hrshdhgd let's fix on our 1-on-1

hrshdhgd commented 4 months ago

@gouttegd : v0.6.11 should fix this issue. Thanks for letting us know!

gouttegd commented 4 months ago

Thanks for the quick fix!