Open kshefchek opened 7 years ago
Why not just make different associations? Doesn't each have its own evidence/provenance etc?
@cmungall could you clarify your suggestion? One document per association could lead to a lot of additional documents since we infer across variants; some genes have a lot of causal variants for a disease (e.g. BRCA). One document per relation is possible, but IMO we'll still be showing too much duplication to the user (or operating on it in ontobio).
As a potential workaround for G2D, I have split up causal vs non-causal associations. This way they can be displayed separately to our end users. The downside is that there will be some redundancy between the two gene-disease lists, as CTD and Coriell will often report the causal gene in addition to those with more hypothetical evidence.
I think your solution is along the right lines. Having a smaller set of relationship types, where we separate evidence from relation ("likely pathogenic" should not be a relation), should in theory mean that high-quality resources do not generally conflict.
The relation that maps to ACMG likely_pathogenic is all in yaml file(s), so it's an easy change when we're ready.
Thinking about this from the UI perspective, should we have one list of causal genes, and one list of all genes so that the latter list fully subsumes the list of causal genes (instead of partially overlapping sets)?
I don't have strong opinions about the UI so long as it's clear.
I had envisioned on the disease page showing the causal gene prominently (first entry in table, if we have a table view) and others beneath that
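A minimal sketch of the two ideas above (a causal list fully subsumed by the all-genes list, with causal genes shown first), in case it helps the UI discussion. The function name `build_gene_lists`, the `(gene, relation)` tuple shape, and the `"causal"` relation label are assumptions for illustration only, not what the loader or ontobio actually emits.

```python
def build_gene_lists(associations, causal_relations=frozenset({"causal"})):
    """Return (causal_genes, all_genes) for one disease.

    `associations` is an iterable of (gene, relation) pairs (hypothetical shape).
    `all_genes` fully subsumes `causal_genes` and lists the causal genes first.
    """
    associations = list(associations)

    # First pass: collect causal genes, keeping first-seen order, no duplicates.
    causal = []
    for gene, relation in associations:
        if relation in causal_relations and gene not in causal:
            causal.append(gene)

    # Second pass: every remaining gene, again de-duplicated in first-seen order.
    others = []
    for gene, _relation in associations:
        if gene not in causal and gene not in others:
            others.append(gene)

    # Causal genes lead the combined list, so they appear at the top of a table view.
    return causal, causal + others
```

With this shape, a gene reported as causal by one source and as hypothetical by another appears once in each list, rather than producing two partially overlapping sets.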
Adding a little reminder that Chris' suggestion is still not implemented. Instead, we have a list of all genes, and the causal gene in this, our favorite example, shows up 6th on the list.
Consider the following pattern:
(subject:gene)<-[has_locus]-(variant)-[relation]->(object:disease)
Where relation is one of:
In many cases, multiple variants of a single gene are linked to a disease via multiple relations (commonly pathogenic and likely pathogenic). Currently, the solr loader seems to pick a relation at random (although this may not be the case and it may in fact be deterministic for a given db).
This is also an issue with combining orthology statements from multiple sources (panther and zfin) where panther specifies whether two orthologs have a 1 to 1 relationship whereas zfin does not.
One option is to store the set of relations linking two nodes. Another option would be to configure a relation priority, where the relation with the highest priority is designated while the others are retrievable via the evidence graph.
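To make the two options concrete, here is a rough sketch that does both at once: it keeps the full set of relations per gene-disease pair and also designates one relation by a configured priority. The relation labels, the `RELATION_PRIORITY` list, the edge tuple shape, and the output field names are all placeholders I made up for illustration; they are not the loader's actual schema or configuration.

```python
from collections import defaultdict

# Hypothetical priority order: earlier entries win. Not taken from any real config.
RELATION_PRIORITY = ["pathogenic", "likely_pathogenic", "contributes_to"]


def collapse_relations(edges, priority=RELATION_PRIORITY):
    """Collapse (subject, relation, object) edges into one document per pair.

    Each document keeps every relation seen for the pair ("relations") and
    designates the highest-priority one ("relation"). Relations missing from
    `priority` rank last; ties are broken alphabetically so the choice is
    deterministic rather than dependent on traversal order.
    """
    rank = {relation: index for index, relation in enumerate(priority)}

    # Group all relations observed between the same subject and object.
    grouped = defaultdict(set)
    for subject, relation, obj in edges:
        grouped[(subject, obj)].add(relation)

    docs = []
    for (subject, obj), relations in sorted(grouped.items()):
        ordered = sorted(relations, key=lambda r: (rank.get(r, len(rank)), r))
        docs.append({
            "subject": subject,
            "object": obj,
            "relation": ordered[0],   # designated relation
            "relations": ordered,     # full set, highest priority first
        })
    return docs
```

Either half can be dropped: storing only `relations` corresponds to the first option, while keeping only `relation` and leaving the rest to the evidence graph corresponds to the second.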
@mbrush @selewis @cmungall thoughts?