Currently, pyobo includes annotations (in the sense of GO annotations, not OWL annotations) modeled as relationships (i.e., S subClassOf R some O).
An example of this is ec.obo:
[Term]
id: eccode:1.1.1.1
name: alcohol dehydrogenase
is_a: eccode:1.1.1 ! With NAD(+) or NADP(+) as acceptor
relationship: RO:0002327 GO:0004022 ! enables alcohol dehydrogenase (NAD+) activity
relationship: RO:0002351 uniprot:A0A0H2URT2 ! has member ADHE_STRPN
relationship: RO:0002351 uniprot:A0A0H2ZM56 ! has member ADHE_STRP2
[many rows deleted]
This has a number of practical and semantic disadvantages:
It bloats the size (ec.obo is 14x bigger with relationships)
Danger of ontological errors (real: the composed products will simply not work in OWL environments unless everything is modeled just so)
Lack of modularity / Harder to recompose into application-specific products (e.g. what if I want EC + just human proteins)
The product becomes stale sooner
Lack of separation of concerns
For associations it's important to have evidence and provenance. While this can be done in ontology formats using axiom annotations, it gets bulky and awkward. A TSV is often simpler and better
Directionality issues (are links to EC distributed with uniprot? links to uniprot distributed with EC? both?)
Shoreline issues (ec.obo includes all Swiss-Prot annotations, but not, say, an arguably more useful set like reference proteomes for core species. Why?)
Instead, decouple the associations / annotations / contingent knowledge. Use TSVs without OWL semantics and all their pitfalls. KGX is a good choice. Some associations are better modeled as SSSOM. By all means distribute these as .obo/.owl as well, and by all means distribute merged products too. The key is to focus on the "conceptual coat hanger" as Rector calls it, and allow people to hang their coats as they please.
This is less work for pyobo/obo-db-ingest overall. Sometimes you can simply say "we are only providing the coat rack today, we may get to the associations later".
In practical terms something like this:
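As a minimal sketch of that split (illustrative only, not pyobo's or obo-db-ingest's actual code): the Python below takes the ec.obo stanza quoted earlier, keeps the id, name, and is_a lines as the core ontology, and moves each relationship line into a subject/predicate/object TSV row. The function names `split_stanza` and `to_tsv` are hypothetical, and the column names only loosely follow KGX edge naming; a real KGX file may need more columns, for example for provenance.

```python
import csv
import io

# Example stanza copied from the ec.obo excerpt quoted in this issue.
STANZA = """\
[Term]
id: eccode:1.1.1.1
name: alcohol dehydrogenase
is_a: eccode:1.1.1 ! With NAD(+) or NADP(+) as acceptor
relationship: RO:0002327 GO:0004022 ! enables alcohol dehydrogenase (NAD+) activity
relationship: RO:0002351 uniprot:A0A0H2URT2 ! has member ADHE_STRPN
relationship: RO:0002351 uniprot:A0A0H2ZM56 ! has member ADHE_STRP2
"""


def split_stanza(stanza: str):
    """Split one [Term] stanza into core OBO text and association rows.

    Assumes the `id:` line appears before any `relationship:` line,
    as it does in the excerpt.
    """
    core_lines = []
    associations = []
    subject = None
    for line in stanza.splitlines():
        if line.startswith("id: "):
            subject = line[len("id: "):].strip()
        if line.startswith("relationship: "):
            # Drop the trailing "! comment"; keep predicate and object CURIEs.
            body = line[len("relationship: "):].split("!", 1)[0]
            predicate, obj = body.split()
            associations.append((subject, predicate, obj))
        else:
            core_lines.append(line)
    return "\n".join(core_lines) + "\n", associations


def to_tsv(associations) -> str:
    """Serialize association rows as a tab-separated table with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(associations)
    return buffer.getvalue()


core_obo, rows = split_stanza(STANZA)
associations_tsv = to_tsv(rows)

# Recomposition becomes a plain filter over rows, e.g. protein members only.
uniprot_only = [row for row in rows if row[2].startswith("uniprot:")]

print(core_obo)
print(associations_tsv)
```

The core text keeps only the "coat rack" (id, name, is_a), while the associations live in a separate table that can be filtered, refreshed, or merged back in independently of the ontology release.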