The-Sequence-Ontology / MSO

Molecular Sequence Ontology

DNA extent OWL definition #6

Open cmungall opened 5 years ago

cmungall commented 5 years ago

I have some questions about this axiom:

'DNA extent' EquivalentTo
    'sequence molecular entity extent' and ('has part' only
        ('deoxyribonucleotide residue' or (('chemical entity' or 'biological sequence entity') and (not ('biological sequence unit')))))

cmungall commented 5 years ago

the has-part-only issue is more apparent on this axiom:

'genomic DNA extent' SubClassOf
'has part' only 
('genomic DNA extent' or (group and (not ('sequence molecular entity extent'))))

This is a fun one because of the recursion. But the problem should be apparent: if ChEBI were to add the perfectly valid axioms group SubClassOf has-part some atom and atom DisjointWith group, you would entail that 'genomic DNA extent's are atoms...

matentzn commented 5 years ago

The way the 'DNA extent' is currently defined, the following two classes would be inferred as subclasses/instances of it:

 'sequence molecular entity extent' and not(has part some Thing)

('only' merely says that there should not be any relations that do not conform to the range of the 'only' expression, so if there are none at all, the condition is fulfilled.)

'has_part' only 'metal atom' 

would be a subclass of DNA extent

(assuming metal atom and 'biological sequence unit' are disjoint)

Are these two implications intended?
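
A minimal sketch of the first implication, for anyone who wants to check it in a reasoner. It assumes a simplified stand-in for the 'DNA extent' definition (the filler is reduced to a single class) and a hypothetical probe class, partless_extent, that is not part of MSO. The http://x.org names are placeholders.

```
Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part

Class: deoxyribonucleotide_residue
Class: sequence_molecular_entity_extent

# Simplified stand-in for 'DNA extent': only the universal restriction, no existential
Class: DNA_extent
    EquivalentTo: sequence_molecular_entity_extent and (has_part only deoxyribonucleotide_residue)

# Hypothetical probe class: an extent that has no parts at all
Class: partless_extent
    EquivalentTo: sequence_molecular_entity_extent and (not (has_part some owl:Thing))
```

Under the standard semantics of 'only', a reasoner should classify partless_extent under DNA_extent, because an individual with no has_part relations satisfies every universal restriction on has_part vacuously.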

mikebada commented 5 years ago

@cmungall I couldn't come up with a way to formally define 'sequence molecular entity extent' (which is a continuous string of biological sequence units, either as a whole molecular entity or as a subsequence), but I wanted to formally define the extent subtypes as extents composed of specific types of sequence units, which is what I think this does. For 'DNA extent', I essentially wanted to say that it's a SMEE whose sequence units are (exclusively) deoxyribonucleotide residues. I agree that using transitivity and 'only' is usually problematic since parthood propagates all the way down, as you note. I've taken this into account by saying that the only parts of DNA extents are either deoxyribonucleotide residues, or they're chemical entities or biological sequence entities (the two main top-level classes of ChEBI and MSO, respectively) that are not biological sequence units. Thus, this definition still allows for parts of DNA extents that aren't deoxyribonucleotide units (e.g., other extents or regions, chemical groups, atoms, electrons, quarks, etc.). The only restriction is that the parts have to be either chemical entities or biological sequence entities, which doesn't seem unreasonable: ChEBI even already includes atoms and subatomic particles, so I think that, e.g., spaces between atoms would still be within its domain even if they're not explicitly represented now. Additionally, the MSO already has immaterial entities in the form of boundaries of sequence residues, specifically, junctions and termini, for things like chromosomal breakpoints and deletions. If we really had to, we could expand the union to include, e.g., BFO sites or whatever, but I'd say that's currently a nonexistent problem.

As for 'genomic DNA extent', it has a similar format to that of 'DNA extent', except that it uses 'group' instead of ('chemical entity' or 'biological sequence entity') as in 'DNA extent'. I was previously using 'group' in the object of the 'has part' expression, but later expanded it to ('chemical entity' or 'biological sequence entity'); I just hadn't updated the axioms for 'genomic DNA extent' yet. However, even with 'group', I don't see how genomic DNA extents would be classified as atoms with your presented axioms...

As to the reasons for the relatively complicated axiomatization, I'd first say that it's pretty close to the semantics I was trying to get; e.g., for 'DNA extent', that it's a SMEE composed of deoxyribonucleotide units. (The natural-language definition perhaps needs to be edited to match better.) However, it was also done for practical inferential reasons: With this axiomatization, along with others I've recently added, the ontology now knows how to properly connect the various types of molecular entities, extents, regions, and residues. For example, it knows that extents of DNA molecules have to be DNA extents, that regions of DNA molecules have to be DNA regions, and that residues of DNA molecules have to be deoxyribonucleotide residues (plus, using the inverse of 'has part', the reverse assertions are inferred as well). This reflects what we know, and results in some really useful inference, I think. For example, 'cDNA region' is defined only as a 'sequence molecular entity region' that's part of some cDNA; however, now that the ontology knows that any region of a DNA must be a DNA region, it can classify 'cDNA region' under 'DNA region', which it couldn't do before all of this axiomatization, so I think that's pretty cool.

mikebada commented 5 years ago

@matentzn 'DNA extent' is also a subclass of

'has part' some 'deoxyribonucleotide residue'

so with that I believe your presented classes wouldn't be classified as DNA extents. (Additionally, although it currently isn't, its parent 'sequence molecular entity extent' should correspondingly be a subclass of 'has part' some 'biological sequence unit'.)

I'm not claiming that the definitions under discussion are totally immune from ill inferential effects, but I'd be interested in examining inferential issues you can think of regarding these definitions when combined with other reasonable (no pun intended) assertions.

mikebada commented 5 years ago

@cmungall @matentzn One issue of which I'm aware is that these definitions still lead to the classification of SMEEs that have inappropriate types of chemical entities or biological sequence entities as parts. For example,

'sequence molecular entity extent' and 
'has part' some 'deoxyribonucleotide residue' and 
'has part' some CHEBI:solution

(which is obviously nonsensical) would still be classified as a DNA extent. I'm still thinking of how I can further refine these to avoid this...

cmungall commented 5 years ago

To see the problem with genomic DNA extent:

Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part Characteristics: Transitive

## CHEBI        
Class: atom
Class: group
    SubClassOf: has_part some atom
    DisjointWith: atom

## MSO        
Class: sequence_molecular_entity_extent
Class: genomic_DNA_extent
    SubClassOf: sequence_molecular_entity_extent
    DisjointWith: atom
    DisjointWith: group
    SubClassOf: has_part some owl:Thing
    SubClassOf:
        has_part only (genomic_DNA_extent or (group and (not (sequence_molecular_entity_extent))))

Individual: p1
    Types: group
Individual: gde1
    Types: genomic_DNA_extent
    Facts: has_part p1

[image]

This injects an ABox containing a genomic DNA extent with one group as a part, to demonstrate the inconsistency.

Alternatively, you could load just the TBox and run a DL query:

[image]

Presumably this is not the intent.

cmungall commented 5 years ago

"I'm still thinking of how I can further refine these to avoid this..."

I'd recommend not refining further: OWL definitions have to be understood by humans as well as machines.

What about a simple EL pattern using has_member? Treat extents as mereological sums of like units. $x extent = extent and has_member some $x. I think you'd get the same inferences you get re cDNA regions.

You would get fewer constraints off the bat, so if that's a requirement there may be a way to reintroduce these as disjointness GCIs or hidden GCIs.
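
A minimal sketch of what that EL pattern might look like for one value of $x, written in the same Manchester syntax as the examples above. The class and property names (extent, DNA_extent, has_member) are illustrative assumptions, not existing MSO terms.

```
Prefix: : <http://x.org/>

Ontology: <http://x.org>

# Assumed membership relation between an extent and the units it is a sum of
ObjectProperty: has_member

Class: deoxyribonucleotide_residue
Class: extent

# $x extent = extent and has_member some $x, with $x = deoxyribonucleotide residue
Class: DNA_extent
    EquivalentTo: extent and (has_member some deoxyribonucleotide_residue)
```

This uses only an existential restriction and a conjunction, so it stays within OWL EL. By itself it does not rule out an extent with members of mixed types, which is the kind of constraint that would have to be reintroduced separately.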

mikebada commented 5 years ago

But sequence molecular entity extents aren't disjoint with CHEBI groups; in fact, 'sequence molecular entity region', which is a child of 'sequence molecular entity extent', is explicitly asserted to be a subclass of 'group'. Would there still be a problem if the 'genomic DNA extent'/'group' disjointness axiom were removed?

cmungall commented 5 years ago

how about:

Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part Characteristics: Transitive

## CHEBI        
Class: atom
Class: group
    SubClassOf: has_part some atom
    DisjointWith: atom

## MSO        
Class: sequence_molecular_entity_extent
Class: genomic_DNA_extent
    SubClassOf: sequence_molecular_entity_extent
    DisjointWith: atom
    SubClassOf: has_part some owl:Thing
    SubClassOf:
        has_part only (genomic_DNA_extent or (group and (not (sequence_molecular_entity_extent))))

Individual: a1
    Types: atom
Individual: p1
    Facts: has_part a1        
Individual: gde1
    Types: genomic_DNA_extent
    Facts: has_part p1

image

mikebada commented 5 years ago

@cmungall But I noted that I haven't yet expanded the 'group' conjunct to the wider ('chemical entity' or 'biological sequence entity'), as I've done for the other definitions. I think that fixes it, right?

That being said, these definitions are problematic at least for the issue I noted above. The only other way I can currently think of to get the inference I'm seeking is to use specialized 'has part'/'part of' subrelations to refer to specific types of parts, e.g., 'has residue part'/'residue part of'. Is this strategy of defining and using specific partonomic relations considered OBO-kosher? It seems that these would be subrelations of 'has component'/'component of', right? (I think using the latter is problematic in that it seems to require human interpretation as to which components are being referred to.)
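
A rough sketch of the subrelation idea, assuming a hypothetical non-transitive property has_residue_part declared directly under has_part. None of these names or axioms are taken from MSO or RO.

```
Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part
    Characteristics: Transitive

# Hypothetical subrelation, deliberately not transitive
ObjectProperty: has_residue_part
    SubPropertyOf: has_part

Class: deoxyribonucleotide_residue
Class: sequence_molecular_entity_extent

# The universal restriction is on has_residue_part, not on has_part
Class: DNA_extent
    EquivalentTo: sequence_molecular_entity_extent
        and (has_residue_part some deoxyribonucleotide_residue)
        and (has_residue_part only deoxyribonucleotide_residue)
```

Because has_residue_part is not transitive, the 'only' restriction applies just to the things directly related by it and does not propagate down to groups, atoms, or other parts reached through has_part.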

mikebada commented 5 years ago

@cmungall After toying around some, I think that the aforementioned types of inference might be possible using disjointness axioms instead, which you previously mentioned, e.g.:

Class: nucleotide_extent
    DisjointWith: has_part some (biological_sequence_unit and (not (nucleotide_residue)))

What do you think?
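
A small self-contained sketch for trying this disjointness pattern in a reasoner. The classes amino_acid_residue and bad_extent are hypothetical test fixtures, and the subclass and disjointness axioms on the residue classes are assumptions made for the example.

```
Prefix: : <http://x.org/>

Ontology: <http://x.org>

ObjectProperty: has_part
    Characteristics: Transitive

Class: biological_sequence_unit
Class: nucleotide_residue
    SubClassOf: biological_sequence_unit

# Assumed sibling residue type, disjoint with nucleotide residues
Class: amino_acid_residue
    SubClassOf: biological_sequence_unit
    DisjointWith: nucleotide_residue

Class: nucleotide_extent
    DisjointWith: has_part some (biological_sequence_unit and (not (nucleotide_residue)))

# Hypothetical probe class: a nucleotide extent with an amino acid residue as a part
Class: bad_extent
    EquivalentTo: nucleotide_extent and (has_part some amino_acid_residue)
```

With these axioms a reasoner should report bad_extent as unsatisfiable, since any amino acid residue is a biological sequence unit that is not a nucleotide residue. Parts that are not biological sequence units at all, such as groups or atoms, are left unconstrained.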