EBISPOT / efo

Github repo for the Experimental Factor Ontology (EFO)
https://www.ebi.ac.uk/efo/

Curate spreadsheet/template linking biomarkers to disease #1087

Open dosumis opened 3 years ago

dosumis commented 3 years ago

Terms to target:

| column | content | description |
| --- | --- | --- |
| measurement_ID | ID of a measurement term. | ID should take the form EFO:nnnnnnn |
| measurement_label | label of a measurement term | |
| biomarker_for_label | what the measurement is a biomarker for (label). | Typically a disease (MONDO) |
| biomarker_for_ID | what the measurement is a biomarker for (ID). | ID should take the form MONDO:nnnnnnn |
| evidence_comment | evidence/description of use as biomarker | free text |
| supporting publications | supporting publications | PMID:nnnnn or DOI:nnn... Delimit multiple using a \| |

These can be curated in a Google spreadsheet or Excel, but should then be copied into a TSV file in this repository.
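A minimal sketch of how a curated TSV could be checked against the column descriptions above before it is committed. The column names come from the table; the exact patterns (seven-digit local IDs, digits-only PMIDs) and the function names `validate_row` and `validate_tsv` are assumptions made for illustration, not an agreed schema.

```python
import csv
import io
import re

# Assumed patterns, read off "EFO:nnnnnnn", "MONDO:nnnnnnn",
# "PMID:nnnnn" and "DOI:nnn..." in the column descriptions.
EFO_ID = re.compile(r"^EFO:\d{7}$")
MONDO_ID = re.compile(r"^MONDO:\d{7}$")
PUBLICATION = re.compile(r"^(PMID:\d+|DOI:\S+)$")


def validate_row(row):
    """Return a list of problems found in one curated row (empty if none)."""
    problems = []
    if not EFO_ID.match(row.get("measurement_ID") or ""):
        problems.append("measurement_ID is not of the form EFO:nnnnnnn")
    if not MONDO_ID.match(row.get("biomarker_for_ID") or ""):
        problems.append("biomarker_for_ID is not of the form MONDO:nnnnnnn")
    # Multiple publications are delimited with "|".
    publications = (row.get("supporting publications") or "").split("|")
    for pub in (p.strip() for p in publications):
        if pub and not PUBLICATION.match(pub):
            problems.append(f"unrecognised publication reference: {pub}")
    return problems


def validate_tsv(text):
    """Map file line number -> problems, for every row with at least one problem."""
    report = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        problems = validate_row(row)
        if problems:
            report[line_no] = problems
    return report
```

The check is deliberately loose on `biomarker_for_ID`: if non-MONDO targets turn out to be needed, that pattern would have to be relaxed.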

The aim of this spreadsheet is to generate axioms linking measurements to the diseases (etc.) for which they are biomarkers, and to use these axioms to automate classification under biomarker grouping classes. Schema TBD.

dosumis commented 3 years ago

Related ticket: https://github.com/EBISPOT/efo/issues/787 + tickets linked to it via ZenHub.

@paolaroncaglia @zoependlington comments/context on prior work on this would be most welcome. We have taken this table-based curation approach for now in order to be as neutral as possible about schema. It will be straightforward to use this as a template for axiom generation -> EFO.

paolaroncaglia commented 3 years ago

Hi @dosumis and @kallia-p, I wasn't involved in GWAS EFO work, and only created a few measurement terms requested by non-GWAS users. Zoë and I tagged and linked relevant tickets, and tried to collate everything in an Epic, as you noted above, but I'm afraid I can't offer much context on prior work, as that was carried out by Dani and then Trish afaik. I looked through my emails, as I vaguely remember that Sandra Machlitt-Northen (who used to be at Open Targets on campus, on secondment from GSK) might have provided some thoughts in the past, but I couldn't find any records. As far as I remember, there weren't resources to follow up. Have a good weekend, Paola

dosumis commented 3 years ago

These could potentially be mined using simple SPARQL queries

SPARQL may give incomplete results here, even with all the pre-reasoning in the Ubergraph database. Via DL query: 'is about' some (has_role some biomarker)

dosumis commented 3 years ago

Working query to find all subclasses of existing biomarker terms:

https://api.triplydb.com/s/SEHYRH18_

Finds quite a lot - 527 lines returned.

Might be useful to add a clause that returns definition text too.
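As a rough illustration of that suggestion, the sketch below assembles a SPARQL query for the transitive subclasses of a biomarker class, with an optional clause for the definition text. It is not the query behind the link above. The function name `biomarker_subclass_query` and the root IRI passed to it are hypothetical, and it assumes definitions are stored with the OBO `IAO:0000115` annotation property.

```python
def biomarker_subclass_query(root_iri, with_definition=True):
    """Build a SPARQL query listing subclasses of root_iri (hypothetical helper).

    Note that rdfs:subClassOf* is a zero-or-more property path,
    so the root class itself is also returned.
    """
    variables = ["?term", "?label"]
    patterns = [
        f"?term rdfs:subClassOf* <{root_iri}> .",
        "?term rdfs:label ?label .",
    ]
    if with_definition:
        variables.append("?definition")
        # OPTIONAL keeps terms that have no definition annotation.
        # Assumption: definitions use obo:IAO_0000115.
        patterns.append("OPTIONAL { ?term obo:IAO_0000115 ?definition . }")
    body = "\n".join("  " + p for p in patterns)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
        f"SELECT DISTINCT {' '.join(variables)} WHERE {{\n{body}\n}}"
    )


# Placeholder IRI for illustration only; substitute the real biomarker class.
print(biomarker_subclass_query("http://example.org/biomarker"))
```

Using `OPTIONAL` rather than a required triple pattern means the row count should not drop when the definition column is added.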

kallia-p commented 3 years ago

@dosumis Working SPARQL queries:

Working query which gets biomarker class labels, subclasses of biomarker classes (transitive), labels for subclassOf, definition for biomarker subclasses, definition citations and dbxrefs (PMIDs) https://api.triplydb.com/s/V0qlPxoBo

Resulting table https://docs.google.com/spreadsheets/d/1hHWHai_IeKKTrPyaCpck3Jv_4yXDvnBl8YIyDKIFdeU/edit?usp=sharing (should be accessible to all EBI users - let me know if not!)

Working query as above without definition citations and dbxrefs (PMIDs) https://api.triplydb.com/s/lkDBw-qV8

Practice queries https://api.triplydb.com/s/5muIyh22W https://yasgui.triply.cc/#query=prefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2[…]2Fsparql-results%2Bjson%2C*%2F*%3Bq%3D0.9&outputFormat=table