FAIRsharing / domain-ontology

A project supporting the DRAO application ontology, a hierarchy of specific research domains and descriptors which imports subsets of terms from over 50 publicly-available ontologies.

Refactoring sequence and its hierarchy #60

Closed allysonlister closed 3 years ago

allysonlister commented 3 years ago

What are you changing?

Sequence and its children are currently EDAM terms within the Data hierarchy (screenshot: sequence).

We plan to refactor these classes and move them to a different place in the hierarchy. Details of each mapping are in the Mapping sections below. SO region is already present in DRAO, but it is not visible to FAIRsharing users because it does not have the inSubset="FAIRsharing" flag set. The current view of that portion of DRAO is shown in the screenshot (soseqfeat).

So we will need to:

The overall hierarchy will be as follows, with detailed explanations of why each position/IRI was chosen further down.

Why are you suggesting this change?

As you can see, the sequence_feature hierarchy is much more detailed than what we have for EDAM Sequence. It therefore makes ontological sense to align the outlier EDAM Sequence with the existing SO hierarchy and pull everything together, so that the terms are grouped sensibly for searching FAIRsharing.

This ticket is also related to #61, which describes refactoring variant to align with the SO hierarchy already in place.

Sequence / region

Mapping

| Old IRI | New IRI |
| --- | --- |
| http://edamontology.org/data_2044 (Sequence) | http://purl.obolibrary.org/obo/SO_0000001 (region) |

Label

Recommendation: Sequence, and delete region label

Reasoning: Sequence is the label currently used in DRAO. Please note that region is the name SO uses for sequence (indeed, sequence is one of its synonyms). However, as DRAO is a more generic AO, we cannot use region, as it could imply a geographic region, an astronomical region or any other kind of region. The region label should therefore be removed when the release files are created, by adding http://purl.obolibrary.org/obo/SO_0000001 to filter-labels.txt.
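The label filtering described above could look roughly like the following. This is only a sketch: the ticket does not show the DRAO release tooling, so the one-IRI-per-line file format, the triple representation and the function names `read_filter_iris` and `drop_filtered_labels` are all assumptions made for illustration.

```python
# Hypothetical sketch; the real DRAO release scripts are not shown in this ticket.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SO_REGION = "http://purl.obolibrary.org/obo/SO_0000001"
SO_POLYPEPTIDE_REGION = "http://purl.obolibrary.org/obo/SO_0000839"


def read_filter_iris(text):
    """Parse filter-labels.txt-style content, assumed to hold one IRI per line."""
    iris = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            iris.add(line)
    return iris


def drop_filtered_labels(triples, filtered_iris):
    """Remove imported rdfs:label triples whose subject is a filtered IRI."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if not (p == RDFS_LABEL and s in filtered_iris)
    ]


# Labels as imported from SO (illustrative subset).
imported = [
    (SO_REGION, RDFS_LABEL, "region"),
    (SO_POLYPEPTIDE_REGION, RDFS_LABEL, "polypeptide_region"),
]

filtered = read_filter_iris(SO_REGION + "\n")
release = drop_filtered_labels(imported, filtered)

# DRAO then supplies its own, more generic label for the same IRI.
release.append((SO_REGION, RDFS_LABEL, "Sequence"))
```

The point of filtering by IRI rather than by label string is that only the SO label on this one class is dropped; any other class that happens to use the word region is untouched.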

IRI

Recommendation: http://purl.obolibrary.org/obo/SO_0000001

Reasoning: Because we already use SO for most of our sequence-related terms.

Definition

Happy with SO definition.

Hierarchy

Recommendation: Retain existing hierarchy

polypeptide_region / Protein sequence / Amino acid sequence

Mapping

| Old IRI | New IRI |
| --- | --- |
| http://edamontology.org/data_2976 (Protein sequence) | http://purl.obolibrary.org/obo/SO_0000839 (polypeptide_region) |
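Taken together, the two mappings in this ticket amount to a small lookup table from retired EDAM IRIs to their SO replacements. The sketch below shows how existing annotations might be migrated with it. The `remap_annotations` helper and the example annotation list are hypothetical and are not part of DRAO or FAIRsharing.

```python
# The two IRI mappings proposed in this ticket.
IRI_MAP = {
    "http://edamontology.org/data_2044": "http://purl.obolibrary.org/obo/SO_0000001",
    "http://edamontology.org/data_2976": "http://purl.obolibrary.org/obo/SO_0000839",
}


def remap_annotations(annotations):
    """Swap retired EDAM IRIs for their SO replacements.

    IRIs without a mapping pass through unchanged. Duplicates created by
    the remapping are dropped, keeping the first occurrence.
    """
    seen = set()
    remapped = []
    for iri in annotations:
        new_iri = IRI_MAP.get(iri, iri)
        if new_iri not in seen:
            seen.add(new_iri)
            remapped.append(new_iri)
    return remapped


# Hypothetical record tagged with both the old EDAM term and the SO term.
record = [
    "http://edamontology.org/data_2044",
    "http://purl.obolibrary.org/obo/SO_0000001",
    "http://example.org/unrelated-term",  # placeholder IRI, not a real term
]
migrated = remap_annotations(record)
```

Deduplicating matters here because SO region is already present in DRAO, so a record could end up carrying the same IRI twice after migration.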

Label

Recommendation: Amino acid sequence

Reasoning: This is the label currently used in DRAO.

Please note we will retain the EDAM label Protein sequence via DRAO-manual.owl so that users will continue to be able to get to this term via that string.

IRI

Recommendation: http://purl.obolibrary.org/obo/SO_0000839

Reasoning: Because we already use SO for most of our sequence-related terms, and we already have this class in DRAO, just invisible to FAIRsharing, so it is the simplest way to refactor the EDAM class.
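As described at the top of this ticket, a class is visible to FAIRsharing users only when it carries the inSubset="FAIRsharing" flag. The sketch below illustrates that idea with a simplified in-memory model; the dictionary layout and the helper names `visible_iris` and `flag_for_fairsharing` are assumptions for illustration, not how DRAO actually stores or sets the flag.

```python
FAIRSHARING = "FAIRsharing"
SO_POLYPEPTIDE_REGION = "http://purl.obolibrary.org/obo/SO_0000839"
EDAM_PROTEIN_SEQUENCE = "http://edamontology.org/data_2976"


def visible_iris(classes):
    """Return the IRIs of classes flagged as belonging to the FAIRsharing subset."""
    return [c["iri"] for c in classes if FAIRSHARING in c["in_subset"]]


def flag_for_fairsharing(classes, iri):
    """Add the FAIRsharing subset flag to the class with the given IRI."""
    for c in classes:
        if c["iri"] == iri:
            c["in_subset"].add(FAIRSHARING)


# Simplified model of the state before refactoring: the SO class exists
# in DRAO but lacks the flag, while the EDAM class is visible.
classes = [
    {"iri": SO_POLYPEPTIDE_REGION, "in_subset": set()},
    {"iri": EDAM_PROTEIN_SEQUENCE, "in_subset": {FAIRSHARING}},
]

before = visible_iris(classes)
flag_for_fairsharing(classes, SO_POLYPEPTIDE_REGION)
after = visible_iris(classes)
```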

Definition

Happy with SO definition, though it is a little formal.

Hierarchy

Recommendation: Retain existing hierarchy

Nucleic Acid Sequence, DNA sequence, and RNA sequence

Mapping

Retain existing IRIs, definitions and labels.

Hierarchy

Recommendation: Place Nucleic Acid Sequence as a child of biological_region

Reasoning: There are no exact matches to these three terms in SO, and they are very useful within FAIRsharing so we should retain them, but in the proper sequence hierarchy that we have imported from SO.
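The proposed placement can be pictured as a child-to-parent map. Only the link from Nucleic Acid Sequence to biological_region is stated in this ticket. The other links in the sketch are assumptions: that DNA sequence and RNA sequence stay under Nucleic Acid Sequence, and that biological_region sits under SO region (labelled Sequence in DRAO), as it does in SO.

```python
# Child -> parent links, keyed by label for readability.
PARENT = {
    # Stated in this ticket:
    "Nucleic Acid Sequence": "biological_region",
    # Assumed: the two children keep their existing parent.
    "DNA sequence": "Nucleic Acid Sequence",
    "RNA sequence": "Nucleic Acid Sequence",
    # Assumed from SO, where biological_region is a kind of region
    # (the class labelled "Sequence" in DRAO).
    "biological_region": "Sequence",
}


def ancestors(term):
    """Walk the parent links upwards and return the ancestors, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


dna_path = ancestors("DNA sequence")
```

Under these assumptions, a FAIRsharing search on Sequence would reach DNA sequence and RNA sequence through the same branch as the terms imported from SO.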

allysonlister commented 3 years ago

Please could @delphinedauga and @Drosophilic let me know if they are happy with these modifications. Thanks!