SpeciesFileGroup / INHS-Insect-Collection-Data-Curation

An accessible issue tracker for reporting issues or requests with respect to INHS data quality.

biological associations are reversed #44

Closed tmcelrath closed 4 months ago

tmcelrath commented 4 months ago

@tmcelrath : I think this was fixed, right Dmitry? @proceps : I am not sure. I remember that there were some issues in the original FM database. Some of the relationships were flipped there, and there was no easy way to tell which are right and which are wrong. The best way is probably to download the table with all relationships and see how those are resolved.

tmcelrath commented 4 months ago

Okay: Mating with- changed to "mating with" and made reflexive and transitive.

To fix:

tmcelrath commented 4 months ago

@proceps @mjy1 - see annotations. I think Attendance, Host Plant, and Pollination can all be easily fixed programmatically right now.

tmcelrath commented 4 months ago

@proceps Looking at the data.

tmcelrath commented 4 months ago

RE Attendance:

1) I see 351 records in the INHS-IC database for Attendance.
2) I do not see the "Coleoptera-Plant" ones. Can you provide a link?
3) We are transitioning (at the suggestion of @mjy1) away from explicit assumptions of "positive-negative" biological interactions. Labels generally do not make these assumptions and only demonstrate physical interactions (e.g. something on a flower might not be pollinating it, but just visiting it for nectar). These suggestions are a move in that direction.
4) There is no OBO Relationship Ontology term for "Attendance". I would suggest maybe "trophically interacts with": http://purl.obolibrary.org/obo/RO_0002438 or "co-localizes with": http://purl.obolibrary.org/obo/RO_0002325.
5) Hymenoptera are the "attendants" and Hemiptera are "Attended by", correct? Anything that is the inverse of this should be flipped (most of the records, I think). I think the correct ones are the more recently edited ones?

RE Host plants/Pollination:

1) All SHOULD have the collection object as SUBJECT. We don't track "this plant Collection Object was visited by this insect OTU" in the database. We should ONLY have "this insect Collection Object was localized to this plant OTU". Otherwise no data would be reported/exported via DWC Export.
2) Hence, both "Pollination" and "Host-plants" should be corrected to "Localized to", and the insect collection object should be the SUBJECT, never the OBJECT.

@mjy1 correct?
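A minimal sketch of the Host plant/Pollination fix proposed above, assuming the associations have been exported as simple records. The `Association` class, its field names, and the exact label strings are hypothetical and do not reflect the actual database schema. It also assumes, as stated above, that the Collection Object is always the insect and the OTU is always the plant.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Association:
    # Hypothetical flat record; not the real database schema.
    subject_type: str   # "CollectionObject" or "Otu"
    subject_id: int
    relationship: str
    object_type: str    # "CollectionObject" or "Otu"
    object_id: int

# Labels to fold into "Localized to" (assumed spellings).
RELABEL = {"Pollination", "Host plant"}

def normalize(assoc: Association) -> Association:
    """Relabel to 'Localized to' and make the Collection Object the subject."""
    if assoc.relationship not in RELABEL:
        return assoc
    fixed = replace(assoc, relationship="Localized to")
    # Flip only when an OTU sits on the subject side and a CO on the object side.
    if fixed.subject_type == "Otu" and fixed.object_type == "CollectionObject":
        fixed = replace(
            fixed,
            subject_type=fixed.object_type,
            subject_id=fixed.object_id,
            object_type=fixed.subject_type,
            object_id=fixed.subject_id,
        )
    return fixed
```

Records that already have the Collection Object as subject are only relabeled, and other relationship types (such as Attendance) are left untouched because they need the separate decision discussed in this thread.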

tmcelrath commented 4 months ago

@proceps To be more specific about relationships, please use the OBO relationship ontology as a reference: https://www.ebi.ac.uk/ols4/ontologies/ro

tmcelrath commented 4 months ago

All non-deprecated Biological Relationships now have OBO RO definitions in definition: image

tmcelrath commented 4 months ago

Should "localized to" be changed to "adjacent to" in order to fit better with RO? http://purl.obolibrary.org/obo/RO_0002220

tmcelrath commented 4 months ago

@proceps Looking at Attendance, I could not find Coleoptera. But there is still a problem. The CO is always the object, but in some cases it is Hemiptera and in other cases it is Hymenoptera. How would you divide those?

tmcelrath commented 4 months ago

For "Attendance (=adjacent to)" - there are two groups currently. Both are reversed in different ways.

  1. Current: Hemiptera (subject) OTU "attendant to" Hymenoptera (object) Collection Object
  2. Current: Hymenoptera (subject) OTU "attendant to" Hemiptera (object) Collection Object
tmcelrath commented 4 months ago

@proceps From my experience, both are valid relationships. Since the majority of the relationships come from the Hymenoptera side, I would suggest keeping one type of relationship and always putting Hymenoptera (CO or OTU) on the subject side.
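A rough sketch of that suggestion, assuming each Attendance record is available as a dictionary whose `subject` and `object` entries carry the taxonomic order. The dictionary layout and the key names are hypothetical, chosen only for illustration.

```python
def hymenoptera_as_subject(record: dict) -> dict:
    """Return a copy of the record with Hymenoptera on the subject side.

    `record` is a hypothetical structure:
    {"relationship": ..., "subject": {"kind", "id", "order"}, "object": {...}}
    """
    subject, obj = record["subject"], record["object"]
    if obj["order"] == "Hymenoptera" and subject["order"] != "Hymenoptera":
        # Swap the two sides; the input record is not mutated.
        return {**record, "subject": obj, "object": subject}
    return record
```

This keeps whichever side is Hymenoptera (CO or OTU) as the subject, so it covers both of the groups listed above with a single rule.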

tmcelrath commented 4 months ago

@proceps There is a visited_by relationship

tmcelrath commented 4 months ago

I like "visited by" but it lacks a definition. I'm having trouble coming up with a definition that can be explicitly differentiated from "adjacent to" without including assumptions of directional benefits or costs. My opinion is honestly to just fold into "localized to/adjacent to" here: http://purl.obolibrary.org/obo/RO_0002220 @mjy1 opinions?

tmcelrath commented 4 months ago

@proceps mutualistically interacts with? http://purl.obolibrary.org/obo/RO_0002442

tmcelrath commented 4 months ago
image
tmcelrath commented 4 months ago
image
tmcelrath commented 4 months ago

@mjy For the purposes of curation of an insect collection I would ensure that the subject is the collection object; this just makes curation easier if consistent. The subject should further be the one with the CatalogNumber if there are 2 COs and one doesn't have a CatalogNumber. If there are two COs, both with a CatalogNumber, then it will become trickier to decide which to make the subject in reflexive relationships (like localized_to). I'd suggest some arbitrary decision, like letting the alphabetical position of the taxon determination name decide subject and object.
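The precedence described here could be sketched as a sort key. This is only an illustration: the `Party` type, its fields, and the example catalog numbers are hypothetical, and the alphabetical tie-break is the arbitrary choice suggested above.

```python
from typing import NamedTuple, Optional, Tuple

class Party(NamedTuple):
    # Hypothetical stand-in for one side of a biological association.
    kind: str                         # "CollectionObject" or "Otu"
    taxon_name: str                   # taxon determination name
    catalog_number: Optional[str] = None

def subject_sort_key(p: Party):
    # Lower keys win the subject slot:
    # 1) Collection Objects before OTUs,
    # 2) cataloged before uncataloged,
    # 3) alphabetical by taxon determination name.
    return (p.kind != "CollectionObject", p.catalog_number is None, p.taxon_name.lower())

def assign_roles(a: Party, b: Party) -> Tuple[Party, Party]:
    """Return (subject, object) for the two sides of a relationship."""
    subject, obj = sorted((a, b), key=subject_sort_key)
    return subject, obj
```

For example, `assign_roles(Party("Otu", "Trifolium repens"), Party("CollectionObject", "Apis mellifera", "INHS 1"))` puts the bee Collection Object on the subject side regardless of argument order.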

Regarding which relationships to use, I think using generic ones like co-located and localized to is going to be fine. One thing to remember, though I'm not sure how it impacts things: we're about to (well, this year sometime) introduce AnatomicalPart, and "Flower" can be an anatomical part, so we can do this:

CO -> AnatomicalPart -> Some OTU

bee -> some flower -> some flowering plant

or

mite -> a thorax -> thorax of some CO

My only thinking is along the lines of whether we lose our ability to refactor to this granularity if we switch off of pollinator now. For example, we might see that labels include "flower" for some relationships that use pollinator.

tmcelrath commented 4 months ago
image
tmcelrath commented 4 months ago

Manually fixed by @tmcelrath, transferred by @mjy, reversed by @proceps - closed.