NCATSTranslator / testing

Materials and tools for testing Translator components
Gene KCNMA1 (HGNC:6284) - Disease - Phenotypic Feature #72

Open sstemann opened 3 years ago

sstemann commented 3 years ago

Query: kcnma1DiseasePhenotypicFeat.json
PK: 406da4ca-fef1-499d-9d61-e3e9ca0914c0
HGNC: 6284
Control: Paroxysmal Nonkinesigenic Dyskinesia (drop attacks) and/or Gene-Phenotype Relationships from OMIM
Results Tracking Sheet

This is a two-hop query related to Phenotypic Feature - Gene KCNMA1 (HGNC:6284).

In issue #70 we first ran this as phenotypicfeature - gene and then as diseaseorphenotypicfeature - gene. In this case we introduce disease as a middle hop.

edeutsch commented 3 years ago

ARAX can complete this query, but because the query is so large, it takes 7 minutes. If the ARS doesn't wait that long, the response will appear as an error.

marcdubybroad commented 3 years ago

The 3 results from the Genetics KP are:

"MONDO:0032827": { "name": "Epilepsy, idiopathic generalized, susceptibility to, 16" },
"MONDO:0032886": { "name": "Liang-Wang syndrome" },
"MONDO:0060551": { "name": "Cerebellar atrophy, developmental delay, and seizures" }

dkoslicki commented 3 years ago

As a note, this query results in roughly 45K edges when querying ARAX. The query will eventually finish, and the results can be viewed here: https://arax.ncats.io/?r=11931. The issue is due to promiscuous nodes: because neither the disease nor the phenotypic feature nodes are constrained, the query passes through nodes such as syndromic disease, which connect to a large fraction of phenotypic features. Indeed, the explosion is due to the second edge that is added.
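To make the promiscuous-node effect concrete, here is a minimal sketch that counts the edges touched by a gene -> disease -> phenotype expansion. The node names and the counts are made-up toy data, not values from ARAX or any KP, and `count_two_hop_edges` is a hypothetical helper.

```python
from collections import defaultdict


def count_two_hop_edges(hop1, hop2):
    """Count edges touched by a gene -> disease -> phenotype expansion.

    hop1: list of (gene, disease) pairs
    hop2: list of (disease, phenotype) pairs
    Returns (number of first-hop edges, number of second-hop edges reached).
    """
    phenotypes_by_disease = defaultdict(set)
    for disease, phenotype in hop2:
        phenotypes_by_disease[disease].add(phenotype)

    # Every disease reached in hop 1 pulls in all of its phenotype edges.
    diseases = {disease for _, disease in hop1}
    second_hop_edges = sum(len(phenotypes_by_disease[d]) for d in diseases)
    return len(hop1), second_hop_edges


# Toy data (hypothetical): one specific disease and one hub-like disease.
hop1 = [("HGNC:6284", "specific_disease"), ("HGNC:6284", "hub_disease")]
hop2 = [("specific_disease", f"phenotype_{i}") for i in range(3)]
hop2 += [("hub_disease", f"phenotype_{i}") for i in range(1000)]

first, second = count_two_hop_edges(hop1, hop2)
print(first, second)
```

In this toy case a single hub-like disease accounts for almost all of the second-hop edges, which is the same shape of blow-up described above.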

One way to circumvent this is to look for the "most representative" diseases and phenotypic features by ranking the first hop according to a Fisher exact test (keeping the top, say, 20 edges by that test), and then repeating this for the second hop. In ARAX, this completes in under two minutes and returns somewhat better results: https://arax.ncats.io/?r=11934
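The ranking idea can be sketched roughly as follows. This is not ARAX's implementation: the way the 2x2 table is built, the one-sided test, and the names `fisher_right_tail` and `rank_targets` are all assumptions made for illustration, and the data is made up.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a) for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: sum the probabilities of overlaps a..upper.
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


def rank_targets(query_nodes, neighbors_by_target, universe_size, top_k=20):
    """Rank target nodes by how surprising their overlap with query_nodes is.

    Assumed table for each target:
      a = query nodes linked to the target
      b = other source-type nodes linked to the target
      c = query nodes not linked to the target
      d = all remaining source-type nodes
    """
    query_nodes = set(query_nodes)
    scored = []
    for target, neighbors in neighbors_by_target.items():
        a = len(neighbors & query_nodes)
        if a == 0:
            continue  # target is not reachable from the query nodes
        b = len(neighbors) - a
        c = len(query_nodes) - a
        d = universe_size - a - b - c
        scored.append((fisher_right_tail(a, b, c, d), target))
    scored.sort()  # smallest p-value first
    return [target for _, target in scored[:top_k]]


# Toy data (hypothetical): 100 genes in total, one query gene.
neighbors_by_target = {
    "specific_disease": {"HGNC:6284", "gene_1"},
    "hub_disease": {"HGNC:6284"} | {f"gene_{i}" for i in range(1, 80)},
}
top = rank_targets({"HGNC:6284"}, neighbors_by_target, universe_size=100, top_k=1)
print(top)
```

With a single query node, this test reduces to preferring targets with few neighbors, so hub-like nodes fall to the bottom of the ranking and are cut by `top_k`.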

As an aside, this is another example where user-specified operations would be beneficial.

brettasmi commented 3 years ago

Something weird is happening here. I see the imProving Agent return successfully in less than 1 second in our logs, but the results never seem to make it back to the ARS, and the status continues to show "Running" in ARAX. Here are (a subset of) the logs from a query with the same ARS message id:

web1.1_1  | 2021-06-08 01:55:51,579 INFO improving_agent.src.core line 69: Got query {'message': {'query_graph': {'nodes': {'n0': {'ids': ['HGNC:6284'], 'categories': ['biolink:Gene']}, 'n1': {'categories': ['biolink:Disease']}, 'n2': {'categories': ['biolink:PhenotypicFeature']}}, 'edges': {'e0': {'subject': 'n0', 'object': 'n1'}, 'e1': {'subject': 'n1', 'object': 'n2'}}}}, 'callback': 'http://localhost:8000/ars/api/messages/2cf3af2a-932e-429b-a64d-9e6c8c931d77'}...
...
web1.1_1  | 2021-06-08 01:55:51,836 INFO improving_agent.src.core line 94: Success. Returning 208 results...
web1.1_1  | [pid: 7|app: 0|req: 914/1238] 172.21.0.1 () {32 vars in 407 bytes} [Tue Jun  8 01:55:51 2021] POST /api/v1.1/query => generated 503475 bytes in 338 msecs (HTTP/1.0 200) 2 headers in 75 bytes (1 switches on core 0)

Here are the results obtained by querying directly. Apologies for the txt extension; GitHub does not allow JSON attachments.

2021-06-07_kcnma1.json.txt

CaseyTa commented 3 years ago

COHD was down last night while we were doing some work on it, hence the bad gateway response, but it's back up and running now. Running the query directly through COHD returns a message saying

Unsupported query. Only one-hop queries supported.

OpenPredict similarly says

Multi-edges queries not yet implemented
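For reference, a one-hop check of this kind can be sketched against the query graph shown in the logs above. This is only an illustration of counting query-graph edges; it is not how COHD or OpenPredict implement their checks, and `edge_count` and `is_one_hop` are hypothetical helper names.

```python
# Query graph copied from the logged request (callback omitted).
query = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["HGNC:6284"], "categories": ["biolink:Gene"]},
                "n1": {"categories": ["biolink:Disease"]},
                "n2": {"categories": ["biolink:PhenotypicFeature"]},
            },
            "edges": {
                "e0": {"subject": "n0", "object": "n1"},
                "e1": {"subject": "n1", "object": "n2"},
            },
        }
    }
}


def edge_count(trapi_query):
    """Return the number of edges in the query graph."""
    return len(trapi_query["message"]["query_graph"]["edges"])


def is_one_hop(trapi_query):
    """True when the query graph has exactly one edge."""
    return edge_count(trapi_query) == 1


print(edge_count(query), is_one_hop(query))
```

This query has two edges, so a service that only accepts one-hop queries would reject it.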

vdancik commented 3 years ago

MolePro does not support two-hop queries.

balhoff commented 3 years ago

CAM-KP does not include diseases or phenotypic features.