bmeg / ophion

Language for making graph queries from data

Slow groupCounts #12

Closed. bwalsh closed this 7 years ago.

bwalsh commented 7 years ago
{"query":[{"has":{"key":"gid","value":{"s":"type:Individual"}}},{"out":{"labels":["hasInstance"]}},{"groupCount":{"key":"info:gender"}}]} took 7964 ms

{"query":[{"has":{"key":"gid","value":{"s":"type:Individual"}}},{"out":{"labels":["hasInstance"]}},{"groupCount":{"key":"info:tumor_status"}}]} took 15424 ms
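
For reference, here is a minimal sketch of what the three steps in these queries compute: select the `type:Individual` vertex, follow `hasInstance` edges outward, then count the resulting vertices by one property. The toy graph, the `run_query` helper, and the choice to treat `gid` as the vertex identifier are all assumptions made for illustration. This is not how Ophion or Janus evaluates the query.

```python
import json
from collections import Counter

# Hypothetical toy graph: gid -> properties, plus (source, label, target) edges.
VERTICES = {
    "type:Individual": {},
    "individual:1": {"info:gender": "female"},
    "individual:2": {"info:gender": "male"},
    "individual:3": {"info:gender": "female"},
}
EDGES = [
    ("type:Individual", "hasInstance", "individual:1"),
    ("type:Individual", "hasInstance", "individual:2"),
    ("type:Individual", "hasInstance", "individual:3"),
]

# The first query from this issue, unchanged.
QUERY = json.loads(
    '{"query":[{"has":{"key":"gid","value":{"s":"type:Individual"}}},'
    '{"out":{"labels":["hasInstance"]}},'
    '{"groupCount":{"key":"info:gender"}}]}'
)


def run_query(query):
    """Evaluate has / out / groupCount steps over the toy graph (illustrative only)."""
    current = list(VERTICES)  # start from every vertex gid
    for step in query["query"]:
        if "has" in step:
            key = step["has"]["key"]
            want = step["has"]["value"]["s"]
            # Assumption: "gid" refers to the vertex identifier itself.
            current = [
                g for g in current
                if (g if key == "gid" else VERTICES[g].get(key)) == want
            ]
        elif "out" in step:
            labels = set(step["out"]["labels"])
            current = [
                dst
                for g in current
                for (src, label, dst) in EDGES
                if src == g and label in labels
            ]
        elif "groupCount" in step:
            key = step["groupCount"]["key"]
            return dict(Counter(VERTICES[g].get(key) for g in current))
    return current


counts = run_query(QUERY)
```

The point of the sketch is that `groupCount` has to read a property from every vertex reached by the `out` step, so its cost grows with the number of `hasInstance` edges.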
prismofeverything commented 7 years ago

Yes, this is vexing. Something about groupCount in Janus in general is slow. I have spent too much time on this issue for now, but I think a solution within Janus is possible. We can use Elasticsearch for this in the immediate term, but I don't think that is a final solution.

I built a test using the Mongo aggregation pipeline to do this, and using the exact same data these queries (for gender and tumor_status) returned in ~200 ms. Woefully, the query for which samples have a given mutation took about 150000 ms. I think it could work, but the information you would need to filter out all the edges is not available until you actually get all the way to the Gene vertex. Pulling the gene symbol back towards the sample, maybe all the way to variantInBiosample, could fix that. Does distributing the data more widely throughout the graph like this compromise the "purity" of the graph? Ideally we could query things like group counts on properties multiple hops and millions of edges away. In practice, we may have to compromise there.
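
To make the aggregation idea concrete, here is a minimal sketch of a group count expressed as `$match` and `$group` stages. It is not the pipeline from the test described above. The document layout (`label`, `gender`, `tumor_status` fields), the `group_count_pipeline` helper, and the in-memory `emulate` function are assumptions for illustration. With a real collection the same stage list would be passed to pymongo's `collection.aggregate(pipeline)`.

```python
from collections import Counter


def group_count_pipeline(label, key):
    """Build two stages: keep documents with this label, then count per value of `key`."""
    return [
        {"$match": {"label": label}},
        {"$group": {"_id": "$" + key, "count": {"$sum": 1}}},
    ]


def emulate(pipeline, docs):
    """In-memory stand-in for exactly the two stages built above.

    Supports only equality $match and a counting $group; it is not a
    general aggregation engine.
    """
    match = pipeline[0]["$match"]
    group = pipeline[1]["$group"]
    field = group["_id"].lstrip("$")
    kept = [d for d in docs if all(d.get(k) == v for k, v in match.items())]
    tally = Counter(d.get(field) for d in kept)
    return [{"_id": value, "count": n} for value, n in tally.items()]


# Hypothetical documents; the field names are assumed, not taken from the real data.
DOCS = [
    {"label": "Individual", "gender": "female", "tumor_status": "with tumor"},
    {"label": "Individual", "gender": "male", "tumor_status": "tumor free"},
    {"label": "Individual", "gender": "female", "tumor_status": "with tumor"},
    {"label": "Biosample", "gender": "female", "tumor_status": "with tumor"},
]

pipeline = group_count_pipeline("Individual", "gender")
result = {row["_id"]: row["count"] for row in emulate(pipeline, DOCS)}
```

This works in a single pass because the grouped property sits on the documents being counted. The mutation query is harder for the reason given above: the filtering value sits on the Gene vertex, several hops away from the sample.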

The other option, besides Janus or Mongo aggregation pipelines, is to use Kyle's Arachne: https://github.com/bmeg/arachne. I tried installing it to load in our data and test the same queries, but ran into this issue: https://github.com/bmeg/arachne/issues/2 @kellrott

Postponing until I can devote the time I really need to this one.

bwalsh commented 7 years ago

@prismofeverything @kellrott Hey guys. Can we take a few minutes tomorrow to review https://github.com/bmeg/bmeg-proxy/pull/34?

kellrott commented 7 years ago

Issue redefined in https://github.com/bmeg/bmeg-proxy/issues/36