GMOD / Chado

the GMOD database schema
http://gmod.org/wiki/Chado
Artistic License 2.0

Term/typedef in feature_property.obo #73

Open puethe opened 6 years ago

puethe commented 6 years ago

Hi,

I'm working on setting up a new Chado database from scratch with a minimum amount of hardcoding. feature_property.obo is definitely a great help for that.

However, I noticed that all entries are in 'typedef' stanzas. From other ontologies (e.g. GO), I learned that 'typedef' stanzas are only used for relationships (and thus I store those in Chado with flag is_relationshiptype=1). This is also how I understood it from the OBO format guide.

Certain entries (such as 'linked_to') certainly qualify as relationships, whereas others (such as 'aminoacid') certainly don't. Would there be a way to reflect this in the stanza type? This would be a great help for me.

Thanks, Christoph

P.S. One more minor thing: the header contains two 'default-namespace' lines, one of which has the value 'unknown'. I assume we can do without that one; removing it might prevent ambiguities, depending on the OBO parser employed.
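For anyone who wants to check their copy of the file for this, here is a minimal sketch in Python. It only collects the values of one header tag before the first stanza. The function name `header_values` and the sample header text are made up for illustration; the real header of feature_property.obo may look different.

```python
def header_values(obo_text, tag):
    """Return every value given for `tag` in the header (the part before the first stanza)."""
    values = []
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # first stanza reached, the header is over
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition(":")
        if sep and name.strip() == tag:
            values.append(value.strip())
    return values


# Hypothetical header, only meant to mimic the situation described above.
SAMPLE_HEADER = """\
format-version: 1.2
default-namespace: unknown
default-namespace: feature_property

[Typedef]
id: SOFP:aminoacid
"""

namespaces = header_values(SAMPLE_HEADER, "default-namespace")
if len(namespaces) > 1:
    print("ambiguous default-namespace:", namespaces)
```

Which of the two values a given parser ends up using is not something this sketch can tell you; it only flags that there is more than one.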

scottcain commented 5 years ago

Hi @puethe, that obo file is fairly old. It wouldn't surprise me if the format/standard has changed out from under it (of course, alternatively, the author, who very well could have been me, I don't remember, didn't fully understand what he or she was doing).

I am reasonably sure that making a change to fix the format would not break things for other users, since, as you say, I don't think anybody is using aminoacid as a relationship :-) If you want to submit a pull request, that would be fine. If not, I'll tag this issue for next month's meeting, where we will be working on several issues.

cmungall commented 5 years ago

The idea is to use the range constraint; for example:

[Typedef]
id: SOFP:aminoacid
name: aminoacid
namespace: feature_property
def: "amino acid coded for by a tRNA transcript feature" [so:cjm]
is_a: SOFP:feature_property
domain: SO:0000253
range: CHEBI:22477

This relates a sequence feature to a particular branch in CHEBI. You can think of feature_relationship formally as an edge between two sequence features, so an approximation of a check is to see if both domain and range are SO IDs (the precise check is to see if the domain and range are both subtypes of SO sequence_feature).
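The approximate check could be sketched roughly like this in Python. This is not a full OBO parser, just a minimal illustration: it reads [Typedef] stanzas, takes the first domain and range value of each, and tests whether both look like SO IDs. The helper names are made up, and the second stanza in the sample (with its domain and range) is purely hypothetical; the actual 'linked_to' entry in feature_property.obo may be defined differently.

```python
import re

SO_ID = re.compile(r"^SO:\d+$")

# First stanza follows the example above; the second one is hypothetical.
SAMPLE = """\
[Typedef]
id: SOFP:aminoacid
name: aminoacid
domain: SO:0000253
range: CHEBI:22477

[Typedef]
id: SOFP:linked_to
name: linked_to
domain: SO:0000110
range: SO:0000110
"""


def parse_typedefs(obo_text):
    """Return one dict per [Typedef] stanza, mapping tag -> list of raw values."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only Typedef stanzas are collected; other stanza types are skipped.
            current = {} if line == "[Typedef]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or not line or line.startswith("!"):
            continue
        tag, sep, value = line.partition(":")
        if sep:
            current.setdefault(tag.strip(), []).append(value.strip())
    return stanzas


def first_id(stanza, tag):
    """First whitespace-delimited token of the first value, so a trailing '! comment' is ignored."""
    values = stanza.get(tag, [])
    tokens = values[0].split() if values else []
    return tokens[0] if tokens else ""


def looks_like_feature_relationship(stanza):
    """Approximate check: both domain and range are SO IDs."""
    return bool(SO_ID.match(first_id(stanza, "domain"))
                and SO_ID.match(first_id(stanza, "range")))


flags = {first_id(s, "id"): looks_like_feature_relationship(s)
         for s in parse_typedefs(SAMPLE)}
print(flags)
```

A loader could use the resulting flag to decide whether to set is_relationshiptype=1 when storing the term. The precise check (both domain and range being subtypes of SO sequence_feature) would additionally need the SO is_a hierarchy, which this sketch does not load.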

puethe commented 5 years ago

Okay, thanks for the explanation. I think that we can parse this accordingly. @scottcain in this case I won't submit a PR, but many thanks for your reply!