sstemann opened this issue 3 years ago
ARAX can complete this query, but because the query is so large, it takes 7 minutes. If the ARS does not wait that long, the query will appear as an error.
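The failure mode described here is a plain timeout: a correct answer that arrives after the caller's deadline gets reported as an error. The following is a minimal, hypothetical sketch using Python's `asyncio.wait_for`. The 7-minute run is scaled down to fractions of a second, and the ARS's real timeout value and internals are not stated here and are not modeled.

```python
import asyncio


async def slow_query(duration_s: float) -> str:
    """Stand-in for an agent that needs duration_s seconds to answer (hypothetical)."""
    await asyncio.sleep(duration_s)
    return "results"


async def wait_for_agent(duration_s: float, timeout_s: float) -> str:
    """Return the agent's answer, or "error" if it misses the caller's deadline."""
    try:
        return await asyncio.wait_for(slow_query(duration_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The agent would eventually succeed, but the caller has already given up.
        return "error"


print(asyncio.run(wait_for_agent(0.2, 0.05)))  # error: answer arrives too late
print(asyncio.run(wait_for_agent(0.01, 0.5)))  # results: answer arrives in time
```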
The 3 results from the Genetics KP are:
- MONDO:0032827: Epilepsy, idiopathic generalized, susceptibility to, 16
- MONDO:0032886: Liang-Wang syndrome
- MONDO:0060551: Cerebellar atrophy, developmental delay, and seizures
As a note, this query yields something like 45K edges when run through ARAX. The query does eventually finish, and the results can be viewed here: https://arax.ncats.io/?r=11931. The issue is caused by promiscuous nodes: because the disease and phenotypic feature nodes are unconstrained, the query passes through nodes such as "syndromic disease", which connect to a large fraction of all phenotypic features. Indeed, the explosion comes from the second edge that is added.
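To make the promiscuous-node effect concrete, here is a toy illustration. Each first-hop disease contributes one second-hop edge per phenotypic feature it links to, so a single hub can dominate the total. The node names and counts below are invented for illustration and do not come from the ARAX run.

```python
from typing import Dict, List


def second_hop_fanout(first_hop: List[str], second_hop: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of second-hop (disease -> phenotypic feature) edges each disease contributes."""
    return {disease: len(second_hop.get(disease, [])) for disease in first_hop}


# Hypothetical toy data: two specific diseases and one promiscuous hub.
first_hop = ["disease_A", "disease_B", "syndromic_disease"]
second_hop = {
    "disease_A": ["pf_1", "pf_2"],
    "disease_B": ["pf_3"],
    "syndromic_disease": [f"pf_{i}" for i in range(1000)],
}

fanout = second_hop_fanout(first_hop, second_hop)
total = sum(fanout.values())
print(total)  # 1003
print(round(fanout["syndromic_disease"] / total, 3))  # 0.997
```

Almost all of the edges in this toy result come through the one hub. This is why leaving the middle node unconstrained makes the second edge blow up.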
One way to circumvent this is to look for the "most representative" diseases and phenotypic features. To do so, rank the first hop with a Fisher exact test, keep only the top edges by that test (say, the top 20), and repeat this for the second hop. In ARAX, this completes in under two minutes and returns somewhat better results: https://arax.ncats.io/?r=11934
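Below is a minimal sketch of the "rank by Fisher exact test, keep the top N" idea. It uses a generic one-sided Fisher exact test (the right tail of the hypergeometric distribution) and is not ARAX's actual implementation. ARAX's exact contingency-table construction may differ, and the candidate counts are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_right_tail(N: int, K: int, n: int, k: int) -> float:
    """One-sided Fisher exact p-value P(X >= k) with X ~ Hypergeometric(N, K, n).

    N: background number of source-type nodes
    K: how many of them are in the query's source set
    n: how many background source-type nodes link to the candidate
    k: how many query source nodes link to the candidate
    """
    # Sum the hypergeometric probabilities of seeing k or more overlaps.
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def top_n_by_fisher(candidates: Dict[str, Tuple[int, int]], N: int, K: int,
                    top_n: int = 20) -> List[Tuple[str, float]]:
    """Score each candidate (n, k), sort by ascending p-value, keep the top_n."""
    scored = [(node, fisher_right_tail(N, K, n, k)) for node, (n, k) in candidates.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]


# Hypothetical counts. With a single source gene (K=1) the p-value reduces to n/N,
# so a hub linked to 60 of 100 background genes ranks far below a node linked to 3.
ranked = top_n_by_fisher({"specific_disease": (3, 1), "syndromic_disease": (60, 1)},
                         N=100, K=1, top_n=1)
print(ranked)  # [('specific_disease', 0.03)]
```

The design point is that the test penalizes high-degree nodes, which are exactly the promiscuous ones. The same ranking would then be applied to the second hop.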
As an aside, this is another example where user-specified operations would be beneficial.
Something weird is happening here. Our logs show the imProving Agent returning successfully in less than 1 second, but the results never seem to make it back to the ARS, and the status continues to show "Running" in ARAX. Here is a subset of the logs from the query with the same ARS message ID:
```
web1.1_1 | 2021-06-08 01:55:51,579 INFO improving_agent.src.core line 69: Got query {'message': {'query_graph': {'nodes': {'n0': {'ids': ['HGNC:6284'], 'categories': ['biolink:Gene']}, 'n1': {'categories': ['biolink:Disease']}, 'n2': {'categories': ['biolink:PhenotypicFeature']}}, 'edges': {'e0': {'subject': 'n0', 'object': 'n1'}, 'e1': {'subject': 'n1', 'object': 'n2'}}}}, 'callback': 'http://localhost:8000/ars/api/messages/2cf3af2a-932e-429b-a64d-9e6c8c931d77'}...
...
web1.1_1 | 2021-06-08 01:55:51,836 INFO improving_agent.src.core line 94: Success. Returning 208 results...
web1.1_1 | [pid: 7|app: 0|req: 914/1238] 172.21.0.1 () {32 vars in 407 bytes} [Tue Jun 8 01:55:51 2021] POST /api/v1.1/query => generated 503475 bytes in 338 msecs (HTTP/1.0 200) 2 headers in 75 bytes (1 switches on core 0)
```
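As a quick check of the "less than 1 second" observation, we can subtract the two `improving_agent.src.core` timestamps in the log. The sketch below assumes those timestamps use the `YYYY-MM-DD HH:MM:SS,mmm` layout shown. It measures only the agent's core processing time and says nothing about the callback to the ARS.

```python
from datetime import datetime

# Layout of the core log timestamps above (comma before the milliseconds).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps in LOG_FORMAT."""
    t0 = datetime.strptime(start, LOG_FORMAT)
    t1 = datetime.strptime(end, LOG_FORMAT)
    return (t1 - t0).total_seconds()


print(elapsed_seconds("2021-06-08 01:55:51,579", "2021-06-08 01:55:51,836"))  # 0.257
```

The resulting 0.257 s is in line with the 338 msecs that the request log reports for the whole POST.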
Here are the results obtained by querying directly. Apologies for the .txt extension; GitHub does not allow JSON attachments.
COHD was down last night while we were doing some work on it, hence the bad gateway response, but it's back up and running now. Running the query directly through COHD returns a message saying:
`Unsupported query. Only one-hop queries supported.`
OpenPredict similarly says:
`Multi-edges queries not yet implemented`
MolePro does not support two-hop queries.
CAM-KP does not include diseases or phenotypic features.
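COHD, OpenPredict, and MolePro reject the query for the same reason: its query graph has two edges. Below is a minimal sketch, not code from any of these KPs. It rebuilds the query graph from the imProving Agent log, checks the edge count, and splits the graph into hypothetical one-hop subqueries. `is_one_hop` and `split_into_one_hop` are illustrative names.

```python
from typing import Any, Dict, List

# Query graph copied from the imProving Agent log in this thread.
QUERY_GRAPH: Dict[str, Any] = {
    "nodes": {
        "n0": {"ids": ["HGNC:6284"], "categories": ["biolink:Gene"]},
        "n1": {"categories": ["biolink:Disease"]},
        "n2": {"categories": ["biolink:PhenotypicFeature"]},
    },
    "edges": {
        "e0": {"subject": "n0", "object": "n1"},
        "e1": {"subject": "n1", "object": "n2"},
    },
}


def is_one_hop(query_graph: Dict[str, Any]) -> bool:
    """A one-hop query graph has exactly one edge."""
    return len(query_graph["edges"]) == 1


def split_into_one_hop(query_graph: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: one sub-query graph per edge, keeping only that edge's nodes."""
    subqueries = []
    for edge_key, edge in query_graph["edges"].items():
        node_keys = (edge["subject"], edge["object"])
        subqueries.append({
            # Shallow copies of node/edge dicts; nested lists are shared with the original.
            "nodes": {key: dict(query_graph["nodes"][key]) for key in node_keys},
            "edges": {edge_key: dict(edge)},
        })
    return subqueries


print(is_one_hop(QUERY_GRAPH))  # False
for sub in split_into_one_hop(QUERY_GRAPH):
    print(list(sub["edges"]), list(sub["nodes"]))
```

Note that the second sub-query (`e1`) pins no IDs on either node. It asks for every Disease to PhenotypicFeature edge, which is the same unconstrained second hop discussed earlier in this thread.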
Query: kcnma1DiseasePhenotypicFeat.json
PK: 406da4ca-fef1-499d-9d61-e3e9ca0914c0
HGNC: 6284
Control: Paroxysmal Nonkinesigenic Dyskinesia (drop attacks) and/or Gene-Phenotype Relationships from OMIM
Results Tracking Sheet
This is a two-hop query related to Phenotypic Feature - Gene KCNMA1 (HGNC:6284).
In issue #70 we first ran this as PhenotypicFeature - Gene, then as DiseaseOrPhenotypicFeature - Gene; in this case we introduce Disease as a middle hop.