Closed jvwong closed 1 year ago
When integrating with the factoid model, it is important to assign some sort of 'type' to each FamPlex entity. To attempt this, we will use the FamPlex-provided relations.csv
containing:
A first attempt is to follow a simple heuristic: a complex (namedComplex) is an entity that has some other entity that is partof it. Otherwise, it is a family.
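The heuristic above could be sketched roughly as follows. This is a minimal illustration, not the factoid implementation. It assumes each row of relations.csv has five columns with no header (subject namespace, subject id, relation, object namespace, object id) and that FamPlex entities use the FPLX namespace. The sample rows are illustrative stand-ins rather than a verbatim excerpt of the file, and classify_entities is a hypothetical helper name.

```python
import csv
import io

# Illustrative rows only (assumption): the real relations.csv may differ in
# content, and the five-column, header-less layout is assumed, not confirmed.
SAMPLE_RELATIONS = """\
FPLX,AMPK_alpha,partof,FPLX,AMPK
FPLX,AMPK_beta,partof,FPLX,AMPK
HGNC,PRKAA1,isa,FPLX,AMPK_alpha
HGNC,PRKAA2,isa,FPLX,AMPK_alpha
"""


def classify_entities(relations_file):
    """Map each FPLX entity that appears as a relation object to a type.

    An entity with at least one 'partof' child is a 'namedComplex';
    any other entity that has children is a 'family'.
    """
    types = {}
    for row in csv.reader(relations_file):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        _subj_ns, _subj_id, relation, obj_ns, obj_id = row
        if obj_ns != "FPLX":
            continue
        if relation == "partof":
            # A single partof child is enough to call the parent a complex.
            types[obj_id] = "namedComplex"
        else:
            # Default to family unless a partof row already marked it.
            types.setdefault(obj_id, "family")
    return types


entity_types = classify_entities(io.StringIO(SAMPLE_RELATIONS))
print(entity_types)
```

Entities that never appear as the object of a relation are not classified by this sketch; whether such entities exist in FamPlex, and how they should be typed, would need to be checked against the actual file.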
Example below for "AMPK":
Turning this into an itemized issue, to stage the changes and reduce risk (in particular, changes in factoid):
family
complex
namedComplex
Summary
Goal: Extend grounding support for complexes and families.
Why: They are the second largest class of Biofactoid entity grounding errors and are ubiquitous in the literature (protein/gene: 56%; family/complex: 17.7%). This work has been tabled for years.
How: Re-use FamPlex, a curated resource for disambiguation of (human) complexes and families.
Background
It is common for researchers to refer to complexes and members of a family. These authors may not be concerned with, or possibly even aware of, the precise individual component(s) of a complex or member(s) of a family to which they refer, but rather wish to convey information about a general class of function or structure. The result is that authors name entities using these broader terms, with the implicit assumption that there are individual components/members.
Example: NF-κB
There are five proteins in the mammalian NF-κB family
Various NF-κB complexes
In Biofactoid
Complexes and families represent the second largest class of errors in entity grounding. See https://github.com/PathwayCommons/factoid/discussions/1003#discussioncomment-4268282.
Example
NF-κB-p62-NRF2 survival signaling is associated with high ROR1 expression in chronic lymphocytic leukemia. Sanchez-Lopez et al. Cell Death Differ. 2020 Jul;27(7):2206-2216
Phosphorylated RB Promotes Cancer Immunity by Inhibiting NF-κB Activation and PD-L1 Expression. Mol Cell. 2019 Jan 3;73(1):22-35.e6.
Implementation
FamPlex is a resource that helps improve named entity recognition, grounding, and relationship resolution. The repository provides several comma-separated files that can be used to populate our grounding resource (Table I).
Top entities referenced
Caveats
References