Closed andrewsu closed 2 years ago
Hi Andrew, and thanks for moving this question over from our Slack chat - as I think it speaks to a common question about what relationships to create as Associations vs Slots, and how these are related.
First, I do think that the types of relationships you describe would be in scope for the Biolink Model (BLM). W.r.t. where/how to create them in the BLM, a distinction to keep in mind is that between an Association (aka an edge) and a Slot (aka a property).
Associations in the BLM are relationship reifications that provide a schema for representing the semantics of what is stated as true in the association, and also the evidence/provenance (E/P) supporting it. Associations are instantiated in the data (e.g. as an edge in a KG) to capture an assertion of domain knowledge, and the E/P supporting it. The schema for a GenetoDiseaseAssociation is here.
Slots/properties, by contrast, are simply named predicates used to capture properties of an Entity or an Association. They are used to capture the semantics of the relationship asserted in an Association instance.
I think for your example, we would want to create new Biolink slots for each gene-disease relationship we think might be useful to describe in a GenetoDiseaseAssociation instance. These slots would descend from the root BLM slot/property 'related to'. And more specifically, they might be direct children of the 'gene associated with condition' slot that itself is a child of 'related to'. Also, these BLM slots could be mapped where possible to properties in the Relation Ontology that may already cover some of the relationships you have in mind (e.g. here).
Then, in the context of a GenetoDiseaseAssociation instance, one of these slots/properties (e.g. 'gene_mutated_in_disease') would be used as the value of the Association's 'relation' slot - to indicate that this is the specific type of gene-disease relationship being asserted. For example:
ex:associationinstance001
- association_type: biolink:GeneToDiseaseAssociation
- subject: ex:gene001
- relation: 'gene_mutated_in_disease'
- object: ex:disease001
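To make the Association/slot split concrete, here is a minimal Python sketch of the instance above. It is only an illustration, not how the BLM tooling is implemented: the `SLOT_PARENTS` table, the `is_descendant` helper, and the dataclass are hypothetical, and the slot names are the illustrative ones used in this thread rather than confirmed BLM slots.

```python
from dataclasses import dataclass

# Hypothetical slot hierarchy (child -> parent), using the slot names
# floated in this thread. Not taken from the actual BLM yaml.
SLOT_PARENTS = {
    "gene_associated_with_condition": "related_to",
    "gene_mutated_in_disease": "gene_associated_with_condition",
    "gene_overexpressed_in_disease": "gene_associated_with_condition",
}


def is_descendant(slot: str, ancestor: str) -> bool:
    """Walk up the parent chain to check whether `slot` descends from `ancestor`."""
    while slot in SLOT_PARENTS:
        slot = SLOT_PARENTS[slot]
        if slot == ancestor:
            return True
    return False


@dataclass
class GeneToDiseaseAssociation:
    """An edge: the Association supplies the schema, the slot supplies the semantics."""
    subject: str
    relation: str  # the name of a slot/property, used here as a value
    object: str

    def __post_init__(self):
        parent = "gene_associated_with_condition"
        # Assumed rule for this sketch: the relation must be the generic
        # gene-condition slot or one of its descendants.
        if self.relation != parent and not is_descendant(self.relation, parent):
            raise ValueError(f"{self.relation!r} is not a gene-disease slot")


edge = GeneToDiseaseAssociation(
    subject="ex:gene001",
    relation="gene_mutated_in_disease",
    object="ex:disease001",
)
```

The point of the sketch is that adding a new gene-disease relationship means adding a row to the slot hierarchy, not defining a new Association class.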
When useful, a more generic Association type such as GenetoDiseaseAssociation can be specialized to provide a schema for representing a more specific type of gene-disease relationship (e.g. https://biolink.github.io/biolink-model/docs/GeneHasVariantThatContributesToDiseaseAssociation.html). But in general I think the granularity that BLM advocates for Association type specialization is not as fine as that for slots. The idea is that Associations define general categories of associations (e.g. GenetoDiseaseAssociation), and the 'relation' slot of such associations can take one of possibly many slots/properties that describe a more specific relationship asserted in the association instance (e.g. gene_mutations_causal_for, gene_overexpressed_in_disease, gene_mutations_increase_susceptibility_to, etc.).
Hi Matt, thanks, very helpful! I think I understand more, but will cogitate more and probably follow up with more questions.
But before that, I have a very practical suggestion for your consideration. I think for many people (including some in Translator), the somewhat theoretical explanation above might still pose a mental roadblock to adopting BLM. I think if some people can't find the right relationship type to use in their graph right off the bat, then they're going to put BLM on the back burner. So my suggestion would be for the BLM team to more proactively create the most common relationships that are likely to be used (in Translator, for example).
As one possible model for how to do this, the table below shows all 49 "meta-edges" from SemMedDB that are used at least 100k times. What about prospectively evaluating these for inclusion in BLM? (Seems like it might be a good "Segment 1" activity for the Standards team for the current FOA?) Or do many/most of the meta-edges below already exist and I'm just missing them?
start_label | type | end_label | count |
---|---|---|---|
Anatomy | LOCATION_OF | Chemicals & Drugs | 9E+05 |
Anatomy | LOCATION_OF | Disorders | 5E+05 |
Anatomy | LOCATION_OF | Genes & Molecular Sequences | 4E+05 |
Anatomy | PART_OF | Anatomy | 2E+05 |
Anatomy | PART_OF | Living Beings | 2E+05 |
Anatomy | PRODUCES | Chemicals & Drugs | 1E+05 |
Chemicals & Drugs | AFFECTS | Anatomy | 2E+05 |
Chemicals & Drugs | AFFECTS | Disorders | 2E+05 |
Chemicals & Drugs | AFFECTS | Physiology | 4E+05 |
Chemicals & Drugs | ASSOCIATED_WITH | Disorders | 4E+05 |
Chemicals & Drugs | AUGMENTS | Anatomy | 2E+05 |
Chemicals & Drugs | AUGMENTS | Disorders | 1E+05 |
Chemicals & Drugs | AUGMENTS | Physiology | 2E+05 |
Chemicals & Drugs | CAUSES | Disorders | 3E+05 |
Chemicals & Drugs | COEXISTS_WITH | Chemicals & Drugs | 6E+05 |
Chemicals & Drugs | COEXISTS_WITH | Genes & Molecular Sequences | 3E+05 |
Chemicals & Drugs | DISRUPTS | Physiology | 2E+05 |
Chemicals & Drugs | INHIBITS | Chemicals & Drugs | 5E+05 |
Chemicals & Drugs | INHIBITS | Genes & Molecular Sequences | 2E+05 |
Chemicals & Drugs | INTERACTS_WITH | Chemicals & Drugs | 1E+06 |
Chemicals & Drugs | ISA | Chemicals & Drugs | 1E+05 |
Chemicals & Drugs | PART_OF | Anatomy | 3E+05 |
Chemicals & Drugs | PART_OF | Chemicals & Drugs | 1E+05 |
Chemicals & Drugs | PART_OF | Genes & Molecular Sequences | 1E+05 |
Chemicals & Drugs | PART_OF | Living Beings | 2E+05 |
Chemicals & Drugs | PREVENTS | Disorders | 1E+05 |
Chemicals & Drugs | STIMULATES | Chemicals & Drugs | 5E+05 |
Chemicals & Drugs | TREATS | Disorders | 5E+05 |
Chemicals & Drugs | TREATS | Living Beings | 1E+05 |
Chemicals & Drugs | compared_with | Chemicals & Drugs | 2E+05 |
Disorders | CAUSES | Disorders | 2E+05 |
Disorders | COEXISTS_WITH | Disorders | 6E+05 |
Disorders | PROCESS_OF | Living Beings | 6E+05 |
Genes & Molecular Sequences | AFFECTS | Physiology | 2E+05 |
Genes & Molecular Sequences | ASSOCIATED_WITH | Disorders | 3E+05 |
Genes & Molecular Sequences | COEXISTS_WITH | Genes & Molecular Sequences | 1E+05 |
Genes & Molecular Sequences | INTERACTS_WITH | Chemicals & Drugs | 4E+05 |
Genes & Molecular Sequences | INTERACTS_WITH | Genes & Molecular Sequences | 2E+05 |
Genes & Molecular Sequences | PART_OF | Anatomy | 2E+05 |
Genes & Molecular Sequences | PART_OF | Living Beings | 1E+05 |
Genes & Molecular Sequences | STIMULATES | Chemicals & Drugs | 3E+05 |
Genes & Molecular Sequences | STIMULATES | Genes & Molecular Sequences | 1E+05 |
Living Beings | LOCATION_OF | Chemicals & Drugs | 4E+05 |
Living Beings | LOCATION_OF | Genes & Molecular Sequences | 2E+05 |
Physiology | PROCESS_OF | Living Beings | 1E+05 |
Procedures | DIAGNOSES | Disorders | 2E+05 |
Procedures | METHOD_OF | Procedures | 2E+05 |
Procedures | TREATS | Disorders | 4E+05 |
Procedures | USES | Chemicals & Drugs | 3E+05 |
Hi Andrew. I agree that in general we should come up with some documentation aimed at helping specific users (e.g. Translator KG developers) navigate the BLM to find the best slots/properties to use in their KGs (and match these slots with the most appropriate association type that provides a schema/context in which to use this slot to capture domain knowledge). I think there are features of the BLM that may help address some of your concerns, but they may not be well advertised or presented in documentation in the most accessible way.
For example, the BLM does provide an `in_subset` slot that can be used with the value "translator_minimal" to tag slots for use by Translator, e.g. here. There are over 100 slots currently tagged in this way in the yaml file - but I don’t think there is a great interface for BLM users to see this.
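Until there is a better interface, a few lines of scripting can list the tagged slots. The sketch below assumes the yaml file has already been parsed into a dict of slot definitions; the `SLOTS` data is a hand-written stand-in (the slot names and their tags are illustrative, not copied from the model), and `slots_in_subset` is a hypothetical helper.

```python
# Stand-in for the parsed `slots` section of the model yaml.
# Which slots carry the tag here is made up for illustration.
SLOTS = {
    "related to": {"in_subset": ["translator_minimal"]},
    "interacts with": {"is_a": "related to", "in_subset": ["translator_minimal"]},
    "some untagged slot": {"is_a": "related to"},
}


def slots_in_subset(slots: dict, subset: str) -> list:
    """Return the names of slots whose `in_subset` list contains `subset`."""
    return sorted(
        name
        for name, definition in slots.items()
        if subset in definition.get("in_subset", [])
    )


tagged = slots_in_subset(SLOTS, "translator_minimal")
for name in tagged:
    print(name)
```

Run against the real parsed yaml instead of `SLOTS`, the same function would give the full Translator subset.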
As for the SemMedDB predicate set you provide, a quick check of BLM mappings revealed that nearly all of these (the exceptions being `AUGMENTS`, `DIAGNOSES`, `METHOD_OF`, `COMPARED_WITH`) are currently mapped to an existing BLM slot (e.g. the mapping for `INTERACTS_WITH` is here). At present we use a fairly generic `mappings` slot to capture these, but we are working toward providing a larger set of mappings slots with more precise semantics (e.g. exact, narrower, broader matches).
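The same kind of check can be scripted by inverting the `mappings` lists. This is a rough sketch under assumptions: `SLOT_MAPPINGS` is hand-written sample data, the `SEMMEDDB:` CURIE prefix and the slot names are assumed for illustration, and `build_index` / `find_slot` are hypothetical helpers.

```python
# Sample data: BLM slot name -> external predicates listed under `mappings`.
# Prefix and slot names are assumptions, not copied from the model.
SLOT_MAPPINGS = {
    "interacts with": ["SEMMEDDB:INTERACTS_WITH"],
    "treats": ["SEMMEDDB:TREATS"],
}


def build_index(slot_mappings: dict) -> dict:
    """Invert slot -> mappings into mapping -> slot for direct lookup."""
    return {
        mapping: slot
        for slot, mappings in slot_mappings.items()
        for mapping in mappings
    }


def find_slot(predicate: str, index: dict, prefix: str = "SEMMEDDB"):
    """Return the slot mapped to `predicate`, or None if nothing maps to it."""
    return index.get(f"{prefix}:{predicate}")


index = build_index(SLOT_MAPPINGS)
predicates = ["INTERACTS_WITH", "TREATS", "AUGMENTS"]
unmapped = [p for p in predicates if find_slot(p, index) is None]
```

The `unmapped` list is then the set of predicates that would need a new slot or a new mapping.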
Finally, I have yet to look at the different activities proposed in the Translator FOA, but agree that these types of enhancements to the BLM and its documentation should be pursued. Accounts of challenges and feedback that you and others provide will be very useful to inform these efforts. Thanks!
Ahh right, thanks for the reminder that those mappings exist under `related to`. And that also clarifies in my mind your previous comment/suggestion about `gene associated with condition`. It also reminds me of this other related ticket, https://github.com/biolink/biolink-model/issues/251, which proposes exactly this: improving how the BLM subsets can be viewed.
I think I also got a bit distracted by steering this down the semmeddb path (which you point out is already reasonably well covered). My original motivation for this ticket was in the context of this nascent database of drug mechanisms (https://zenodo.org/record/3515487#.XbcvNehKguV and https://github.com/SuLab/indication_moa_db/tree/1.0, created by @mmayers12). I've pasted in the table of the most common meta-edges in that resource. At some point in the next few months, can we get your help figuring out which of these relations already exists in BLM and which need to be created? We will be expanding this resource over time, and would like to make sure we're aligned with BLM before we go too much further...
metaedge | count |
---|---|
Drug - INHIBITS - Protein | 67 |
Protein - INVOLVED_IN - Biological Process | 38 |
Biological Process - CAUSES - Disease | 29 |
Taxon - CAUSES - Disease | 26 |
Drug - ACTIVATES - Protein | 24 |
Biological Process - REQUIRED_FOR - Taxon | 24 |
Protein - UP_REGULATES - Biological Process | 14 |
Biological Process - ELEVATED_IN - Disease | 11 |
Biological Process - DISRUPTED_IN - Disease | 11 |
Protein - PRODUCES - Compound Class | 9 |
Protein - DOWN_REGULATES - Protein | 9 |
Compound - PART_OF - Biological Process | 8 |
Biological Process - ASSOCIATED_WITH - Disease | 7 |
Protein - UP_REGULATES - Protein | 7 |
Protein - DOWN_REGULATES - Biological Process | 7 |
Drug - INCREASES - Compound | 6 |
Protein - PRODUCES - Compound | 5 |
Biological Process - REDUCES - Disease | 5 |
Not sure if this is related to work the EPC or Predicates WGs are doing.
@andrewsu - from your last post, I think most/all of these predicates are in the model (especially with PR #844). Please reopen if I mischaracterized the doneness of this. I think we can definitely add predicates to the predicate hierarchy to help folks find things they need in the model.
This is likely a very naive question... I can imagine a few ways that genes could be related to diseases. For example: `Gene mutated_in Disease`, `Gene overexpressed_in Disease`, `Gene underexpressed_in Disease`. Would those types of relationships be in scope of the Biolink model? If so, would they theoretically be added as children of https://biolink.github.io/biolink-model/docs/GeneToDiseaseAssociation.html or somewhere else?