cmungall closed 8 years ago
It looks very useful. What are the different columns exactly?
If these annotations turn out not to be trustworthy enough, we should generate them separately once in a while and simply integrate them manually.
But if they are trustworthy enough (and it looks like they are):
transformation_of relations); => everything would be performed automatically and would be easily extendable.
Would it work for you? Any better idea?
Columns are described here: https://github.com/cmungall/anatomical-similarity-annotations/blob/master/scratch/README.md, but it's a bit cryptic. Basically it uses simple genus-differentia definitions. It should really report the taxon field, using the most conservative taxon for the intersection. But this is just to give the flavor; the first 2 columns are the important ones.
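A minimal sketch of that genus-differentia inference with the taxon field filled in, assuming annotations are simple entity-to-taxon maps and that "most conservative taxon for the intersection" means the more specific of the two taxa when one is an ancestor of the other. The lineage is a toy one with intermediate ranks skipped, the skin/limb annotations are made-up inputs rather than curated data, and infer and most_conservative are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Toy taxonomy (child -> parent) using NCBI Taxonomy IDs, intermediate ranks skipped.
PARENT: Dict[int, Optional[int]] = {
    33208: None,   # Metazoa
    7742: 33208,   # Vertebrata
    32523: 7742,   # Tetrapoda
    40674: 32523,  # Mammalia
}

def lineage(taxon: int) -> List[int]:
    """Return the taxon followed by all of its ancestors, most specific first."""
    out: List[int] = []
    t: Optional[int] = taxon
    while t is not None:
        out.append(t)
        t = PARENT[t]
    return out

def most_conservative(t1: int, t2: int) -> Optional[int]:
    """More specific of two taxa on the same lineage; None if the lineages diverge."""
    if t2 in lineage(t1):
        return t1
    if t1 in lineage(t2):
        return t2
    return None

# Hypothetical inputs: positive homology annotations (entity -> taxon) and
# genus-differentia definitions (class -> (genus, part_of filler)).
HOMOLOGY: Dict[str, int] = {"skin": 7742, "limb": 32523}
DEFINITIONS: Dict[str, Tuple[str, str]] = {"skin of limb": ("skin", "limb")}

def infer(homology: Dict[str, int],
          definitions: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Infer a taxon for each defined class whose genus and filler are both annotated."""
    inferred: Dict[str, int] = {}
    for cls, (genus, filler) in definitions.items():
        if genus in homology and filler in homology:
            taxon = most_conservative(homology[genus], homology[filler])
            if taxon is not None:
                inferred[cls] = taxon
    return inferred

print(infer(HOMOLOGY, DEFINITIONS))  # {'skin of limb': 32523}
```

If the two lineages diverge, there is no clade in which both parts are homologous, so the sketch infers nothing rather than guessing a taxon.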
A separate inference (not included here) uses skeleton_of.
If we're going lambda, we may as well jump straight to Scala (@balhoff's homology code is already in Scala).
OK, I gave it a shot: https://raw.githubusercontent.com/BgeeDB/anatomical-similarity-annotations/master/release/raw_similarity_annotations.tsv (search for annotations with evidence ECO:0000501)
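To pull out just the inferred annotations, a small filter like the one below could work. It is a sketch that assumes the release file is tab-separated with one header line; it matches ECO:0000501 in any cell so it does not depend on the exact name or position of the evidence column. The function name and the sample rows in the usage note are hypothetical.

```python
import csv
import io
from typing import List

AUTOMATIC_ASSERTION = "ECO:0000501"  # evidence used in automatic assertion

def rows_with_evidence(tsv_text: str, evidence: str = AUTOMATIC_ASSERTION) -> List[List[str]]:
    """Return data rows (header skipped) in which any cell equals the evidence code."""
    # QUOTE_NONE: free-text supporting fields may contain quote characters.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    return [row for row in reader if evidence in row]
```

For example, with made-up rows where the second uses a placeholder evidence code:

```python
sample = ("HOM ID\tentity\tECO ID\n"
          "HOM:0000007\tUBERON:0000428\tECO:0000501\n"
          "HOM:0000007\tUBERON:0002367\tECO:placeholder\n")
print(rows_with_evidence(sample))  # [['HOM:0000007', 'UBERON:0000428', 'ECO:0000501']]
```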
Please tell me if I missed anything that your code could generate.
I think my code ignored NOTs
I don't think this is a valid inference:
HOM:0000007 historical homology | UBERON:0000428 prostate epithelium | NOT | 40674 Mammalia | CIO:0000005 low confidence from single evidence | ECO:0000501 evidence used in automatic assertion | Annotation inferred from logical constraints using annotations to same HOM ID and: entity: UBERON:0000483, negated: true, taxon ID: 33208 - entity: UBERON:0002367, negated: false, taxon ID: 40674 | bgee
(as an aside, I think we need an evidence objects field, or WITH/FROM in GO terminology, that lists the entities used to make the inference)
I think the NOT here comes from an assertion that epithelia are not homologous across Metazoa.
The criterion for a NOT inference should be: at least one of the element pairs of the definition should stand in a NOT, but the species should match. Thus we can infer that prostate epithelium is not homologous across Metazoa, but this is not a useful statement, as sponges etc. lack a prostate.
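One way to read that criterion, sketched under assumptions: pair one annotation for the genus with one for the part_of filler, require the two taxa to be identical, and infer NOT at that taxon only if at least one of the pair is negated. Annotations are modelled as (taxon, negated) tuples, and infer_not is a hypothetical name. Fed the annotations behind the questionable row above (epithelium NOT across Metazoa, prostate gland positive in Mammalia), it infers nothing.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Annotation = Tuple[int, bool]  # (NCBI taxon ID, negated)

def infer_not(definition: Tuple[str, str],
              annotations: Dict[str, List[Annotation]]) -> Set[int]:
    """Taxa at which the defined class can be inferred NOT homologous.

    Rule as sketched here: taxa of the paired annotations must match exactly,
    and at least one of the two annotations must be negated.
    """
    genus, filler = definition
    result: Set[int] = set()
    for (t1, neg1), (t2, neg2) in product(annotations.get(genus, []),
                                          annotations.get(filler, [])):
        if t1 == t2 and (neg1 or neg2):
            result.add(t1)
    return result

ANNOTATIONS: Dict[str, List[Annotation]] = {
    "UBERON:0000483": [(33208, True)],   # epithelium: NOT homologous across Metazoa
    "UBERON:0002367": [(40674, False)],  # prostate gland: homologous in Mammalia
}
PROSTATE_EPITHELIUM = ("UBERON:0000483", "UBERON:0002367")  # genus, part_of filler

print(infer_not(PROSTATE_EPITHELIUM, ANNOTATIONS))  # set(): taxa differ, no NOT
```

Requiring an exact taxon match is the conservative choice; a looser rule (e.g. allowing the NOT at an ancestral taxon) is what produces statements like the Mammalia one above.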
OK, this should be fixed, see 0ed8da5eae4b670f02aa45cfefb8c7e867078fb6. It should also solve a problem with inferences based on annotations with multiple Uberon IDs.
(BTW, have you seen the new "ancestral taxa" file in /release? It tries to capture the true ancestral taxa of each structure; I think this is what you really need.)
Looks good, thanks.
I think I need a guide to the new files...
Is my understanding correct:
I think ultimately I want some kind of combination of these (call them summary annotations), but with the full provenance and original statements (I really like how you capture supporting text). This would be better suited to a JSON-LD or OWL/RDF representation. I may use this as a use case when I work with Marcus on the evidence ontology later this year.
Will add documentation soon.
For full provenance and original statements, we think users should simply go back to the RAW annotations. This brings us back to the question of an object identifier. In any case, it is already possible to get back to the original annotations; I even wrote some code to do that, and will release it along with the documentation.
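A minimal sketch of how summary annotations could point back to their RAW annotations, assuming a summary is identified by (HOM ID, normalized set of Uberon IDs, taxon, negated). That key is one hypothetical answer to the object-identifier question, not the identifier Bgee uses, and RawAnnotation, summary_key and index_raw are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class RawAnnotation:
    hom_id: str
    entity_ids: Tuple[str, ...]  # one or more Uberon IDs
    taxon_id: int
    negated: bool
    supporting_text: str

SummaryKey = Tuple[str, Tuple[str, ...], int, bool]

def summary_key(a: RawAnnotation) -> SummaryKey:
    """Hypothetical summary identifier; entity order is normalized by sorting."""
    return (a.hom_id, tuple(sorted(a.entity_ids)), a.taxon_id, a.negated)

def index_raw(raw: List[RawAnnotation]) -> Dict[SummaryKey, List[RawAnnotation]]:
    """Map each summary key back to every RAW annotation that supports it."""
    index: Dict[SummaryKey, List[RawAnnotation]] = defaultdict(list)
    for a in raw:
        index[summary_key(a)].append(a)
    return dict(index)
```

Sorting the entity IDs means annotations listing the same multiple Uberon IDs in a different order still collapse onto one summary, while each one's supporting text stays reachable.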
See: https://github.com/cmungall/anatomical-similarity-annotations/blob/master/scratch/inferred.tsv
Are these valid inferences? E.g. if skin is homologous, and limb is homologous to fin, then trivially skin of limb is homologous to skin of fin.
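The skin-of-limb / skin-of-fin case can be sketched as a pairwise rule over genus-differentia definitions: infer X1 ~ X2 when X1 = G1 part_of Y1, X2 = G2 part_of Y2, G1 and G2 are the same class (or are themselves homologous), and Y1 ~ Y2. This is an illustration under assumptions: homology pairs are unordered frozensets, taxa are ignored, and the class names and infer_pairs are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

def infer_pairs(definitions: Dict[str, Tuple[str, str]],
                homologous: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Infer homology between defined classes whose genus and filler are related."""
    def related(a: str, b: str) -> bool:
        # Same class, or an asserted (unordered) homology pair.
        return a == b or frozenset((a, b)) in homologous

    inferred: Set[FrozenSet[str]] = set()
    for x1, x2 in combinations(sorted(definitions), 2):
        (g1, y1), (g2, y2) = definitions[x1], definitions[x2]
        if related(g1, g2) and related(y1, y2):
            inferred.add(frozenset((x1, x2)))
    return inferred

DEFS = {
    "skin of limb": ("skin", "limb"),
    "skin of fin": ("skin", "fin"),
    "skin of head": ("skin", "head"),
}
HOM = {frozenset(("limb", "fin"))}
print(infer_pairs(DEFS, HOM))  # {frozenset({'skin of fin', 'skin of limb'})}
```

A real version would also carry the taxon through, for instance with the most-conservative-taxon rule from earlier in the thread.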
Are these useful? They are for us, but I'm not sure whether this should be handled when we make the homology.owl file in Uberon or upstream in this repo. Happy either way; I can easily make this in the source format.