Inference of homology based on logical definitions

BgeeDB / anatomical-similarity-annotations

Project hosting resources used for annotating relations of similarity between anatomical structures

Creative Commons Zero v1.0 Universal


Inference of homology based on logical definitions #7

Closed: cmungall closed this 8 years ago

cmungall commented 9 years ago

See: https://github.com/cmungall/anatomical-similarity-annotations/blob/master/scratch/inferred.tsv

Are these valid inferences? For example, if skin is homologous and limb is homologous to fin, then skin of limb is trivially homologous to skin of fin.
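The rule behind this example can be sketched as follows. This is a minimal illustration, assuming each class has a single genus-differentia definition (genus, relation, differentia) and that asserted homologies are stored as unordered pairs. The `Definition` type, `infer_homology` function and the plain-label entities are hypothetical, not the repo's actual format.

```python
from dataclasses import dataclass

# Hypothetical, simplified genus-differentia definition, e.g.
# "skin of limb" = skin (genus) and part_of some limb (differentia).
@dataclass(frozen=True)
class Definition:
    genus: str
    relation: str
    differentia: str

def homologous(a, b, pairs):
    """True if a and b are the same structure or the pair is asserted homologous."""
    return a == b or frozenset((a, b)) in pairs

def infer_homology(def_a, def_b, pairs):
    """Infer homology of two defined classes when they use the same relation
    and both their genera and their differentiae are homologous."""
    return (def_a.relation == def_b.relation
            and homologous(def_a.genus, def_b.genus, pairs)
            and homologous(def_a.differentia, def_b.differentia, pairs))

pairs = {frozenset(("limb", "fin"))}
print(infer_homology(Definition("skin", "part_of", "limb"),
                     Definition("skin", "part_of", "fin"), pairs))  # True
```

Taxon scoping is deliberately left out here; the inferred statement would still need a taxon.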

Are these useful? They are for us, but I'm not sure whether this should be handled when we make the homology.owl file in Uberon, or upstream in this repo. Happy either way; I can easily produce this in the source format.

fbastian commented 9 years ago

It looks very useful. What are the different columns exactly?

If these annotations turn out not to be trustworthy enough, we should generate them separately from time to time and simply integrate them manually.

BUT, if they are trustworthy enough (and it looks like they are):

I could put the source code that generates the files here, made independent of the Bgee pipeline;
there is already an "inference" part (for transformation_of relations);
we could simply define a common Java interface for classes making homology inferences, so that my inferences and yours, written independently, could easily be plugged in (we could use Java 8 with a functional interface and lambda expressions, haha :p)

=> everything would be performed automatically, and would be easily extendable.
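The pluggable design proposed above can be sketched roughly as follows. This is a Python analogue of the suggested Java functional interface, not code from the Bgee pipeline: an "inferrer" is any function from the current annotations to candidate new ones, and a runner collects only annotations not already known. The `Annotation` fields and `run_inferrers` are assumptions made for illustration.

```python
from typing import Callable, Iterable, NamedTuple

class Annotation(NamedTuple):
    hom_id: str
    entity: str
    taxon_id: int
    negated: bool = False

# Any function with this shape can be plugged in, like a lambda
# implementing a Java functional interface.
Inferrer = Callable[[list[Annotation]], Iterable[Annotation]]

def run_inferrers(annotations, inferrers):
    """Apply each inferrer to the input annotations and return the new ones,
    de-duplicated and in the order they were first produced."""
    known = set(annotations)
    inferred = []
    for infer in inferrers:
        for ann in infer(list(annotations)):
            if ann not in known:
                known.add(ann)
                inferred.append(ann)
    return inferred
```

In this sketch each inferrer sees only the original input; chaining inferences (feeding one inferrer's output into the next) would be a separate design decision.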

Would it work for you? Any better idea?

cmungall commented 9 years ago

Columns here: https://github.com/cmungall/anatomical-similarity-annotations/blob/master/scratch/README.md

but they are a bit cryptic. Basically, it uses simple genus-differentia definitions. It should really report the taxon field, using the most conservative taxon for the intersection. But this is just to give the flavor; the first 2 columns are the important ones.
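One way to read "the most conservative taxon for the intersection" is the most specific taxon among those of the annotations used, i.e. the one lying below all the others (for example, epithelium annotated at Metazoa and prostate at Mammalia gives Mammalia). A minimal sketch, assuming a toy child-to-parent taxonomy with intermediate taxa skipped; the `PARENT` map and function names are hypothetical.

```python
# Toy, simplified taxonomy (child -> parent), NCBI Taxonomy IDs:
# Mammalia (40674) -> Vertebrata (7742) -> Metazoa (33208).
PARENT = {40674: 7742, 7742: 33208}

def ancestors(taxon, parent=PARENT):
    """Return the taxon followed by all its ancestors."""
    chain = [taxon]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def most_conservative_taxon(taxa, parent=PARENT):
    """Return the taxon that lies at or below every other taxon,
    or None if the taxa are not all on a single lineage."""
    for t in taxa:
        if all(other in ancestors(t, parent) for other in taxa):
            return t
    return None

print(most_conservative_taxon([33208, 40674]))  # 40674
```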

A separate inference (not included here) is using skeleton_of.

If we're going with lambdas, we may as well jump straight to Scala (@balhoff's homology code is already in Scala).

fbastian commented 9 years ago

OK, I gave it a shot: https://raw.githubusercontent.com/BgeeDB/anatomical-similarity-annotations/master/release/raw_similarity_annotations.tsv (search for annotations with evidence ECO:0000501)
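To pull out just these inferred annotations, a sketch like the following could be used. It assumes the file is tab-separated with one header line, and it does not rely on any particular column layout: it keeps any row in which some field is exactly `ECO:0000501`. The function name is hypothetical.

```python
import csv
import io

def rows_with_evidence(tsv_text, eco_id="ECO:0000501"):
    """Return data rows (lists of fields) in which any field equals eco_id.
    Assumes one header line; QUOTE_NONE keeps free-text fields intact."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    return [row for row in reader if eco_id in row]
```

Usage, assuming a local copy of the release file: `with open("raw_similarity_annotations.tsv", encoding="utf-8") as fh: inferred = rows_with_evidence(fh.read())`.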

Please tell me if I missed anything that your code could generate.

cmungall commented 9 years ago

I think my code ignored NOTs

I don't think this is a valid inference:

HOM:0000007 historical homology UBERON:0000428 prostate epithelium NOT 40674 Mammalia CIO:0000005 low confidence from single evidence ECO:0000501 evidence used in automatic assertion Annotation inferred from logical constraints using annotations to same HOM ID and: entity: UBERON:0000483, negated: true, taxon ID: 33208 - entity: UBERON:0002367, negated: false, taxon ID: 40674 bgee

(as an aside, I think we need an evidence objects field, or WITH/FROM in GO terminology, that lists the entities used to make the inference)
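A WITH/FROM-style field could be as simple as a list of the entities used, serialized into one column. The sketch below is hypothetical: the field names, column order and the pipe separator are assumptions, not the repo's format.

```python
from dataclasses import dataclass, field

@dataclass
class InferredAnnotation:
    hom_id: str
    entity: str
    taxon_id: int
    negated: bool
    eco_id: str = "ECO:0000501"
    # Hypothetical WITH/FROM field: entities the inference was based on.
    with_from: list[str] = field(default_factory=list)

    def to_tsv(self):
        """Serialize to one tab-separated line, with the WITH/FROM
        entities pipe-separated in the last column."""
        return "\t".join([self.hom_id, self.entity, str(self.taxon_id),
                          "NOT" if self.negated else "", self.eco_id,
                          "|".join(self.with_from)])

ann = InferredAnnotation("HOM:0000007", "UBERON:0000428", 40674, False,
                         with_from=["UBERON:0000483", "UBERON:0002367"])
print(ann.to_tsv().split("\t")[-1])  # UBERON:0000483|UBERON:0002367
```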

I think the NOT here is coming from an assertion about epithelia not being homologous across metazoa

The criterion for a NOT inference should be: at least one of the element pairs of the definition should stand in a NOT, but the species should match. Thus we can infer that the prostate epithelium is not homologous across Metazoa, but this is not a useful statement, as sponges etc. lack a prostate.
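Under one reading of this criterion, a NOT for the defined class may only be stated at the taxon of a negated component, never at the narrower taxon of the positive components. A minimal sketch applied to the rejected example; the `Component` type and function name are hypothetical.

```python
from typing import NamedTuple

class Component(NamedTuple):
    """One annotation used by the inference: entity, taxon and NOT flag."""
    entity: str
    taxon_id: int
    negated: bool

def is_valid_not_inference(components, taxon_id):
    """Accept a NOT for the defined class at taxon_id only if at least one
    component is itself negated at exactly that taxon."""
    return any(c.negated and c.taxon_id == taxon_id for c in components)

# Epithelium: NOT homologous across Metazoa (33208);
# prostate gland: homologous in Mammalia (40674).
prostate_epithelium = [
    Component("UBERON:0000483", 33208, True),
    Component("UBERON:0002367", 40674, False),
]
print(is_valid_not_inference(prostate_epithelium, 40674))  # False
print(is_valid_not_inference(prostate_epithelium, 33208))  # True
```

The second call is the valid but uninformative inference ("not homologous across Metazoa").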

fbastian commented 9 years ago

OK, this should be fixed, see 0ed8da5eae4b670f02aa45cfefb8c7e867078fb6. It should also solve a problem with inferences based on annotations with multiple Uberon IDs.

(BTW, have you seen the new "ancestral taxa" file in /release? It tries to capture the true ancestral taxa of each structure, I think this is what you really need)

cmungall commented 9 years ago

Looks good, thanks.

I think I need a guide to the new files...

cmungall commented 9 years ago

Is my understanding correct:

raw_similarity_annotations should be equivalent to all lines of type RAW from similarity.tsv, with the line-type column removed?
it's not currently a subset, as the experimental inferences have been added to the raw file but not to the main one
ancestral should be equivalent to the SUMMARY lines from similarity.tsv, but with the columns changed to reflect the fact that this is a synthesis
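The first point can be checked with a sketch like this one, which keeps RAW lines and drops the line-type column. The column header name `"line type"` is an assumption about similarity.tsv, and the function is hypothetical; per the second point, the experimental inferences would show up as extra lines on the raw-file side.

```python
import csv
import io

def raw_from_similarity(tsv_text, type_column="line type"):
    """Keep only RAW lines of similarity.tsv and drop the line-type column.
    The header name of that column is assumed, not documented."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    header = next(reader)
    idx = header.index(type_column)

    def drop(row):
        return row[:idx] + row[idx + 1:]

    out = [drop(header)]
    out.extend(drop(row) for row in reader if row and row[idx] == "RAW")
    return out
```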

I think ultimately I want some kind of combination of all these: summary annotations, but with the full provenance and original statements (I really like how you capture supporting text). This would be better suited to a JSON-LD or OWL/RDF representation. I may use this as a use case when I work with Marcus on the evidence ontology later this year.

fbastian commented 9 years ago

Will add documentation soon.

For full provenance and original statements, we think that users should simply get back to the RAW annotations. This leads back to the question of an Object identifier. But, in any case, it is already possible to get back to original annotations. I even wrote some code to do that. Will release it along with documentation.
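Going back from a summary annotation to its RAW annotations amounts to a join on shared fields. The code mentioned above is not shown here, so this is only a hypothetical sketch: it assumes rows are dicts and joins on HOM ID and entity only, since a summary taxon may be an ancestral taxon that differs from the raw annotations' taxa.

```python
from collections import defaultdict

KEY_FIELDS = ("hom_id", "entity")  # assumed join key, not a documented schema

def index_raw(raw_rows, key_fields=KEY_FIELDS):
    """Group raw annotation rows (dicts) by the join key."""
    index = defaultdict(list)
    for row in raw_rows:
        index[tuple(row[f] for f in key_fields)].append(row)
    return index

def supporting_raw(summary_row, index, key_fields=KEY_FIELDS):
    """Return the raw rows sharing the summary row's key (empty if none)."""
    return index.get(tuple(summary_row[f] for f in key_fields), [])
```

A stable per-annotation identifier, as raised above, would make this join exact instead of key-based.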