edmcouncil / idmp

This repository stores the OWL ontology built on the basis of the ISO standards for identification of medicinal products.
https://spec.edmcouncil.org/idmp/
MIT License

IDMP-523 - Challenges with the representation of a list for defining protein sequences #353

Open ElisaKendall opened 1 year ago

ElisaKendall commented 1 year ago

What is the correct relation between the protein sequence and the amino acid residues?

AlaLeuGlu a idmp-sub:ProteinSequence . Ala/Leu/Glu a idmp-sub:AminoAcidResidue .

  1. AlaLeuGlu cmns-col:hasConstituent Ala, Leu, Glu .
  2. AlaLeuGlu cmns-col:hasMember Ala, Leu, Glu .

It can’t be both due to the disjointness between hasConstituent and hasMember.
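The conflict can be illustrated with a minimal sketch, assuming triples are modelled as plain Python tuples and that the two properties are declared disjoint as stated above. The function name `disjointness_violations` and the tuple representation are illustrative assumptions, not part of IDMP-O or the Commons library.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
# The disjoint pair below mirrors the disjointness described in this issue.
DISJOINT_PROPERTIES = [("cmns-col:hasConstituent", "cmns-col:hasMember")]


def disjointness_violations(triples, disjoint_pairs=DISJOINT_PROPERTIES):
    """Return the (subject, object) pairs linked by both properties of a disjoint pair."""
    triple_set = set(triples)
    violations = set()
    for first, second in disjoint_pairs:
        for subject, predicate, obj in triple_set:
            if predicate == first and (subject, second, obj) in triple_set:
                violations.add((subject, obj))
    return sorted(violations)


RESIDUES = ("Ala", "Leu", "Glu")
option_1 = [("AlaLeuGlu", "cmns-col:hasConstituent", r) for r in RESIDUES]
option_2 = [("AlaLeuGlu", "cmns-col:hasMember", r) for r in RESIDUES]

print(disjointness_violations(option_1))             # either option alone is consistent
print(disjointness_violations(option_1 + option_2))  # asserting both flags every residue
```

Either option on its own passes the check; only asserting both for the same sequence and residue produces a violation, which is why a choice between the two is needed.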

The protein (substance) is defined by its protein sequence. The protein specifies physical proteins, that is, matter composed of gazillions (about 1E23/g) of scattered protein molecules. The ala, leu, or glu in the Dayhoff notation also designates a structure that defines a molecular group that is part of the protein molecule. All of them can be described by the structural representation X-NH-R-(C=O)-Y, where X and Y are wildcards (X for a hydrogen atom or the chain, Y for an OH group or the rest of the chain) and R is a wildcard defining the type of amino acid.

The first option seems more wrong, if we can say so: it is the constituency of the structure that has the amino acids as constituents:

AlaLeuGlu-Constituency a idmp-sub:StructuralConstituency, cmns-col:Constituency . AlaLeuGlu-Constituency cmns-dsg:defines AlaLeuGlu . AlaLeuGlu-Constituency cmns-col:hasConstituent Ala, Leu, Glu .

However, AlaLeuGlu is also a cmns-strcol:List, and therefore: AlaLeuGlu cmns-col:hasMember Ala, Leu, Glu . (by being a cmns-col:Collection) and AlaLeuGlu cmns-col:hasConstituent AlaC, LeuC, GluC . (by being a cmns-strcol:List), where AlaC is a cmns-strcol:IndexedConstituent.

By the disjoint property condition, AlaC and Ala must be distinct individuals. Instead, it is: AlaLeuGlu cmns-col:hasConstituent AlaC . AlaC ?x Ala .

The predicate ?x links the constituent to the thing that plays the constituent role. What is ?x? Is it cmns-pts:playsRole?

If I have understood you correctly, we should not use such a predicate ?x, but instead make the individual both of type Constituent and of the class that plays the role (AminoAcidResidue). That, however, is no longer allowed because of the disjointness.

This problem occurs in all cases where the whole is a list. cmns-strcol:List subclass of (cmns-col:hasConstituent only cmns-strcol:IndexedConstituent) has to go and must be replaced with

cmns-strcol:List subclass of (cmns-col:hasMember only cmns-strcol:IndexedConstituent)

Now the element can be both a constituent and the element type. If we want to use cmns-col:hasConstituent from the list, we have to put the cmns-col:Constituency in between.

I would certainly like to use cmns-strcol:hasFirst and cmns-strcol:hasLast to define the N- and C-terminals, respectively, and if those predicates can be defined based on IndexedConstituent, then it is awkward to write:
aList a List .
aList cmns-col:isDefinedIn aListConstituency .
aListConstituency cmns-pts:hasConstituent aListConstituent .
aListConstituent cmns-pts:isPlayedBy aListElement .
aListConstituent cmns-strcol:hasIndexValue aListElementIndexValue .
aListElementIndexValue cmns-qtu:hasNumericValue 1 . (cmns-x:hasOrdinalValue??)

instead of the simpler:
aList a List .
aList cmns-col:hasMember aListElement .
aListElement cmns-col:hasIndexValue [ cmns-x:hasNumericValue 1 ] .
# or even simpler: aListElement cmns-col:hasOrdinalIndexValue 1 .
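As a rough illustration of why a direct ordinal index on each member would be enough to recover the first and last elements (and hence the N- and C-terminals), here is a minimal sketch. It assumes each member carries exactly one ordinal index from 1 to n; the function name `terminals` and the dictionary representation are hypothetical and do not correspond to any Commons or IDMP-O API.

```python
def terminals(ordinal_index):
    """Return (first, last) members of a list, given a member -> ordinal index mapping.

    Each key stands for one residue individual (not an amino acid type),
    so a repeated amino acid would need distinct keys.
    """
    if not ordinal_index:
        raise ValueError("an empty list has no first or last member")
    indices = sorted(ordinal_index.values())
    if indices != list(range(1, len(indices) + 1)):
        raise ValueError("ordinal indices must be exactly 1..n, without gaps or repeats")
    ordered = sorted(ordinal_index, key=ordinal_index.get)
    return ordered[0], ordered[-1]


ala_leu_glu = {"Ala": 1, "Leu": 2, "Glu": 3}
n_terminal, c_terminal = terminals(ala_leu_glu)
print(n_terminal, c_terminal)  # Ala Glu
```

Whether cmns-strcol:hasFirst and cmns-strcol:hasLast could actually be defined this way in the ontology is the open question of this issue; the sketch only shows that the information needed is already present in the simpler pattern.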

In my example:
AlaLeuGlu a ProteinSequence .
AlaLeuGlu cmns-col:hasMember Ala, Leu, Glu .
Ala cmns-x:hasOrdinalIndexValue 1 .
Leu cmns-x:hasOrdinalIndexValue 2 .
Glu cmns-x:hasOrdinalIndexValue 3 .

making 7 statements

Instead of:
AlaLeuGlu a ProteinSequence .
AlaLeuGlu cmns-col:hasMember Ala, Leu, Glu .
AlaLeuGlu cmns-dsg:isDefinedIn AlaLeuGlu-C .
AlaLeuGlu-C cmns-pts:hasConstituent Ala-C, Leu-C, Glu-C .
Ala-C cmns-pts:isPlayedBy Ala .
Ala-C cmns-strcol:hasIndexValue [ cmns-qtu:hasNumericValue 1 ] .
Leu-C cmns-pts:isPlayedBy Leu .
Leu-C cmns-strcol:hasIndexValue [ cmns-qtu:hasNumericValue 2 ] .
Glu-C cmns-pts:isPlayedBy Glu .
Glu-C cmns-strcol:hasIndexValue [ cmns-qtu:hasNumericValue 3 ] .

with 17 statements, just to define a tripeptide. In that case, nobody will use IDMP-O; people will use only the Dayhoff string encoding “ala-leu-glu” instead.
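The two statement counts can be reproduced with a small sketch that generates both patterns as plain tuples and counts them. The predicate names are copied from the examples in this comment, `rdf:type` stands in for `a`, and the helper names and the `_:index…` labels for the bracketed index nodes are assumptions made for illustration only.

```python
def simple_pattern(sequence, residues):
    """Members carry their ordinal index directly."""
    triples = [(sequence, "rdf:type", "ProteinSequence")]
    for position, residue in enumerate(residues, start=1):
        triples.append((sequence, "cmns-col:hasMember", residue))
        triples.append((residue, "cmns-x:hasOrdinalIndexValue", position))
    return triples


def constituency_pattern(sequence, residues):
    """Members are indexed through a constituency and one constituent per member."""
    constituency = f"{sequence}-C"
    triples = [
        (sequence, "rdf:type", "ProteinSequence"),
        (sequence, "cmns-dsg:isDefinedIn", constituency),
    ]
    for position, residue in enumerate(residues, start=1):
        constituent = f"{residue}-C"
        index_node = f"_:index{position}"  # stands in for the [ ... ] blank node
        triples += [
            (sequence, "cmns-col:hasMember", residue),
            (constituency, "cmns-pts:hasConstituent", constituent),
            (constituent, "cmns-pts:isPlayedBy", residue),
            (constituent, "cmns-strcol:hasIndexValue", index_node),
            (index_node, "cmns-qtu:hasNumericValue", position),
        ]
    return triples


residues = ["Ala", "Leu", "Glu"]
simple = simple_pattern("AlaLeuGlu", residues)
verbose = constituency_pattern("AlaLeuGlu", residues)
print(len(simple), len(verbose))  # 7 17
```

Under these assumptions the simple pattern needs 1 + 2n statements for n residues and the constituency pattern needs 2 + 5n, so the gap widens with every residue added.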

IMHO the disjoint property axiom should be dropped and the usage guidelines in cmns-col be clarified, but that will probably not happen.

So we have to make IndexedConstituent a subclass not of Constituent but of Member, with Member equivalent to (cmns-col:isMemberOf cmns-col:Collection). We would then have to rename the class as well to avoid confusion.

Or we must replace cmns-strcol:List subclass of (cmns-col:hasConstituent only cmns-strcol:IndexedConstituent) with cmns-strcol:List subclass of (cmns-dsg:isDefinedIn some (cmns-col:Constituency and cmns-col:hasConstituent only cmns-strcol:IndexedConstituent))

mereolog commented 7 months ago

@ElisaKendall is this still an issue? The ticket is almost 10 months old - can we close it?

ElisaKendall commented 7 months ago

@mereolog To simplify things we are not using the pattern that @tw-osthus describes above at the moment, but that doesn't mean that we won't want to address this later this year. I think it needs to stay open.