biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/
Other

New Predicates to support CTD Gene-Chemical data #121

Closed: mbrush closed this issue 3 years ago

mbrush commented 6 years ago

Chemical-gene associations are an important data type that is currently missing or underrepresented across the Translator ecosystem of knowledge sources and services. CTD provides a dataset of over 1 million chemical-gene associations curated from the literature. All CTD chemical-gene interaction types in this dataset are shown here. The 53 base interaction types × 3 directions (affects, increases, decreases) give 159 possible 'predicates'.

Gamma currently has a simple representation of the CTD chemical-gene dataset using smartBags. This data does not use BLM-compliant predicates, omits some qualifying context, and includes only the simple/atomic interactions (i.e. it omits nested interactions such as "Cadmium inhibits the reaction [Magnesium results in increased activity of ABCB1 protein]"). We plan to enhance this representation by providing specific BLM-based predicates that describe the chemical-gene interaction type, plus qualifiers that add specificity and context to these associations. This requires creating new predicates in the Biolink model to accommodate these interaction types.
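To make the atomic vs. nested distinction concrete, here is a minimal sketch of how an ingest might split CTD interaction actions and skip nested interactions. It assumes (this is not stated in the issue) that actions arrive as a pipe-separated field of `direction^type` tokens, e.g. `increases^expression|affects^binding`, and it uses a simple bracket check as a stand-in heuristic for "nested". The names `Action`, `parse_actions` and `is_atomic` are hypothetical.

```python
from typing import List, NamedTuple

# The three CTD directions mentioned in the issue.
DIRECTIONS = {"affects", "increases", "decreases"}


class Action(NamedTuple):
    direction: str         # "affects", "increases" or "decreases"
    interaction_type: str  # base interaction type, e.g. "expression"


def parse_actions(field: str) -> List[Action]:
    """Split an assumed pipe-separated actions field into Action tuples."""
    actions = []
    for token in field.split("|"):
        direction, _, interaction_type = token.partition("^")
        if direction not in DIRECTIONS or not interaction_type:
            raise ValueError(f"unrecognized action token: {token!r}")
        actions.append(Action(direction, interaction_type))
    return actions


def is_atomic(interaction_text: str) -> bool:
    """Heuristic only: nested interactions embed a bracketed sub-interaction."""
    return "[" not in interaction_text
```

With this heuristic, the nested cadmium/magnesium example from the issue text is flagged as non-atomic, so a first-pass ingest limited to simple interactions would skip it.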

We propose creating 20 predicates and mapping the 159 CTD interaction types onto them, as laid out in the mapping spreadsheet here (this could expand to 60 if we add increases/decreases predicates for each interaction type, in addition to affects). Beyond these predicates, we will capture additional context for these associations using qualifier-based modeling patterns, also proposed in the spreadsheet. These will capture things like the more specific molecular modification type, the 'form' of the gene affected, and the species in which the interaction was observed.
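A rough sketch of the proposed pattern follows: a predicate stem per interaction type, a direction prefix, and free-form qualifiers carrying the extra context. The stem table is a hypothetical excerpt, not the actual spreadsheet mapping. The qualifier keys (`species`, `gene_form`) and the `EX:` identifiers are illustrative placeholders, not Biolink slot names. The `predicate_count` helper reproduces the counting in this issue: 20 stems × 3 directions = 60, and 53 base types × 3 directions = 159.

```python
from dataclasses import dataclass, field
from typing import Dict

DIRECTIONS = ("affects", "increases", "decreases")

# Hypothetical excerpt of an interaction-type -> predicate-stem table; the real
# mapping (20 predicates for the 53 CTD base types) lives in the spreadsheet.
PREDICATE_STEM = {
    "expression": "expression of",
    "activity": "activity of",
    "secretion": "secretion of",
    "transport": "transport of",
}


@dataclass
class Edge:
    subject: str
    predicate: str
    object: str
    # Hypothetical qualifier slots (species, gene form, modification type, ...).
    qualifiers: Dict[str, str] = field(default_factory=dict)


def to_edge(chemical: str, gene: str, direction: str,
            interaction_type: str, **qualifiers: str) -> Edge:
    """Build a chemical -> gene edge with a directional predicate."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    predicate = f"{direction} {PREDICATE_STEM[interaction_type]}"
    return Edge(chemical, predicate, gene, dict(qualifiers))


def predicate_count(n_stems: int) -> int:
    """Each stem yields an affects/increases/decreases predicate."""
    return n_stems * len(DIRECTIONS)
```

For example, `to_edge("EX:chem1", "EX:gene1", "increases", "expression", species="NCBITaxon:9606", gene_form="mRNA")` produces an `increases expression of` edge and keeps the species and gene form as qualifiers rather than encoding them in the predicate. This keeps the predicate count small.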

Once the proposals are vetted and final, @mbrush will add the new predicates to the Biolink model, and @cbizon will use these to create an improved version of the gamma CTD smartBag data. The goal is to have an API serving this data ready for the Portland Hackathon.

mbrush commented 6 years ago

Longer term plans/improvements:

mbrush commented 6 years ago

Summarizing initial decisions from the 9-9-18 KGS call:

  1. We will implement the predicates as listed in the spreadsheet, with a few exceptions where further diligence is required to understand what is being asserted as true in the data:

    • 'affects binding of' is not appropriate if these associations just assert that a chemical binds a gene/product; in that case, use 'molecularly interacts with'.
    • Similarly, 'affects transport of' may not be appropriate if these associations assert that a gene product directly transports the chemical, in which case we can use 'transports' as the predicate. I spot-checked 5 'affects' and 'increases' records, which were all about transporters/channels, but several 'decreases' records were not. I think we have to use 'affects transport of' as a more general/forgiving predicate, as we cannot be sure that the gene product is doing the transporting in all cases.
  2. We will implement increases and decreases sub-predicates for each affects predicate.

  3. For qualifiers: review whether gene products are always (or nearly always) the subjects of the 'affects molecular modification of' predicate. If this predicate is commonly used to describe modification of small molecules or other non-gene products, then we may reconsider using GO terms as qualifiers, as these will likely be specific to protein modification (e.g. for acetylation, GO:0006473 ! protein acetylation).
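The decisions above can be sketched as two small rules: special-case bare binding, and relate each increases/decreases sub-predicate to its affects parent. This sketch assumes that an `affects^binding` record asserts only binding, which is exactly what the call said still needs checking. It also keeps transport on the general `<direction> transport of` form, not `transports`. The function names are hypothetical.

```python
from typing import Optional


def choose_predicate(direction: str, interaction_type: str) -> str:
    """Apply the call's exceptions to a CTD direction + interaction type."""
    if interaction_type == "binding" and direction == "affects":
        # Assumption: a bare binding record maps to the non-directional
        # 'molecularly interacts with' rather than 'affects binding of'.
        return "molecularly interacts with"
    # Everything else, including transport, keeps '<direction> <type> of';
    # 'affects transport of' stays general because the gene product is not
    # always the transporter.
    return f"{direction} {interaction_type} of"


def parent_predicate(predicate: str) -> Optional[str]:
    """Return the 'affects' parent of an increases/decreases sub-predicate."""
    for prefix in ("increases ", "decreases "):
        if predicate.startswith(prefix):
            return "affects " + predicate[len(prefix):]
    return None
```

The parent lookup mirrors decision 2. An edge asserted with `increases activity of` can still be found by a query for `affects activity of`.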

mbrush commented 6 years ago

Added 48 new predicates in pull #122, following the mappings in the spreadsheet linked above, to accommodate the initial ingest of CTD chemical-gene data by gamma.

@vdancik and @saramsey, for your information; please review if you wish!

vdancik commented 6 years ago

There are more caveats when using such data:

balhoff commented 6 years ago

@mbrush should there also be increases/decreases variants of 'molecularly interacts with'?

deepakunni3 commented 4 years ago

This should be part of the recent predicate harmonization discussion.

sierra-moxon commented 3 years ago

@mbrush @vdancik: I am closing this issue, as the thread indicates that a big PR pulling in many predicates was merged. I will move @balhoff's and @vdancik's comments on the (merged) PR into separate tickets if necessary. Perhaps these comments are already handled in other tickets from the chemical working group?