newgene closed 2 years ago
Tested on our dev instance with Redis enabled (~4 GB memory total); it fails at the step of setting the cache:
Some further detail: the execution time of the query does not appear to be a factor in whether the query hangs the server.
This query took the server 1,008,276ms, but in my testing concurrent queries were not significantly slowed:
Meanwhile, this query took 343,514ms (less overall time), but concurrent queries were significantly slowed:
Further comparison of response times for concurrent queries between these two shows that responsiveness to concurrent queries degrades while the server is transforming API responses for the first query. It's almost definitely something in api-response-transform causing this. I'm taking a look for anything that might be causing synchronous execution there now.
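For illustration, here is a minimal sketch of why long synchronous transformation work would slow concurrent queries on a single-threaded Node.js server, and one common way to mitigate it: process records in chunks and yield to the event loop between chunks. The names `RawRecord`, `transformOne`, and `transformInChunks` are hypothetical and are not taken from api-response-transform; the chunk size is an arbitrary assumption.

```typescript
// Hypothetical record shapes; NOT the real api-response-transform types.
type RawRecord = { id: string; score: number };
type Transformed = { id: string; normalized: number };

// Stand-in for CPU-bound, synchronous per-record transformation work.
function transformOne(record: RawRecord): Transformed {
  return { id: record.id, normalized: record.score / 100 };
}

// Resolves on a later event-loop turn, giving pending I/O callbacks
// (for example, other incoming queries) a chance to run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Transforms records in fixed-size chunks, yielding after each chunk
// instead of holding the event loop for the whole input.
async function transformInChunks(
  records: RawRecord[],
  chunkSize: number = 500, // assumed value; tune by measurement
): Promise<Transformed[]> {
  const out: Transformed[] = [];
  for (let i = 0; i < records.length; i += chunkSize) {
    for (const record of records.slice(i, i + chunkSize)) {
      out.push(transformOne(record));
    }
    await yieldToEventLoop();
  }
  return out;
}
```

This trades a little total throughput for responsiveness: the large query takes slightly longer, but other requests are no longer starved while it is being transformed. Whether this is the right fix depends on where the synchronous work actually sits, which had not yet been tracked down at this point in the thread.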
@newgene @andrewsu @tokebe I believe the queries here are not the type of query that we should expect from Translator, and we may not need to support them. I suggest a discussion over who is sending these queries and what they are trying to do.
They are likely using too many IDs in the query (and the predicate restriction, while reasonable, may not return answers, or the answers you expect, since we may not have data connected with those predicates).
Discussions in Translator have been happening around limits on this, like the number of IDs that can be in a query under 1 node...
I think the issue highlighted by these queries is still well worth attempting to track down and fix. It appears that any query which results in a large quantity of response transformation can cause server response time to increase.
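As a rough sketch of what such a limit could look like on the server side, the check below flags query-graph nodes that carry more IDs than a cap. It assumes a TRAPI-style query graph in which each node has an optional `ids` array; `MAX_IDS_PER_NODE`, its value, and `findOversizedNodes` are hypothetical, since the thread does not state an agreed number or an implementation.

```typescript
// Minimal subset of a TRAPI-style query graph; only the fields used here.
interface QNode {
  ids?: string[];
  categories?: string[];
}
interface QueryGraph {
  nodes: { [key: string]: QNode };
}

// Hypothetical cap; no actual limit is given in this thread.
const MAX_IDS_PER_NODE = 100;

// Returns the keys of all nodes whose ID count exceeds the limit.
function findOversizedNodes(
  queryGraph: QueryGraph,
  limit: number = MAX_IDS_PER_NODE,
): string[] {
  return Object.keys(queryGraph.nodes).filter((key) => {
    const ids = queryGraph.nodes[key].ids;
    return ids !== undefined && ids.length > limit;
  });
}
```

A server could call this before execution and reject the query with a descriptive error when the returned list is non-empty, rather than accepting work it is unlikely to finish.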
Another thing worth noting is that I never saw a query cause significant memory usage issues on my local -- I suspect that hard crashes are being caused by high CPU usage and processes becoming unresponsive, though memory usage is rather high for some queries, likely due to excessive object cloning. I haven't looked into the dev or prod instances, however, so I could be entirely off the mark there.
@newgene I think this issue has been adequately addressed/split into more specific performance issues, recommending we close it.
This is the query captured from the log:
When tested locally, this query can return results in 1044855ms, with many log msgs like below: