Open tokebe opened 5 months ago
It's also worth noting that theoretical maximum speed (total time for all queries / 32 subqueries at a time) doesn't give an accurate picture of actual total subquery time. I'll look more into this as well.
Just posting some rankings, the count is the number of times the node was referenced as an edge's subject or object. I've truncated it to just 100+ references:
MONDO:0019609 | MONDO:0005377 | MONDO:0016575 |
---|---|---|
I'll create a more useful tabulation later -- the first of each is obviously just the query ID, but it appears that we have some fairly highly-used nodes, which mostly appear to come from phenotypes in Template-2, which generally makes sense. I'll probably need to compare the info so far to a query which doesn't usually time out, though I suspect the answer will generally come down to "It doesn't hit Template 2".
As it stands, BTE is able to handle many simultaneous queries without any failures or going over time. However, some queries are prone to failure by timeout and may affect assessments of BTE's concurrency capability. These failures seem to be due to subquery time, as a profile of some suspect queries shows a majority of system time spent waiting, with hard computation contributing little to overall execution time. These should be investigated further so we can identify good ways to avoid timeout scenarios.
I have 3 example queries which have a tendency to time out. I've captured them by running with no time limit on my local, mirroring CI's environment for smartapi specs/etc. All example queries so far are from Drug -> Treats -> Disease.
MONDO:0019609
This query occasionally does complete within 5 minutes, but appears to fail frequently, especially if there's any other pressure on the server.
Template 1: 59 Queries, 8 results Template 2: 347 Queries, 4137 results
Overall:
MONDO:0005377
Template 1: 60 queries, 420 results Template 2: 211 queries, 4829 results
Overall:
MONDO:0016575
Template 1: 61 queries, 192 results Template 2: 240 queries, 4460
Overall:
Notes
For each table, I rounded seconds to the nearest millisecond. It's also worth noting that time spent waiting is not representative of overall time contribution to execution, as sub-queries can happen simultaneously. However, a high ratio of time spent waiting to times called heavily implies significant contribution to overall query time.
More analysis remains to be done, however we can see that Monarch/Biolink is the top contributor to overall time. A single query execution in BTE can make 32 simultaneous subqueries, meaning an average minimum time contribution of ~23.7 seconds by Monarch/Biolink, with the other major-looking players being much smaller by comparison (~6.65 from SemmedDB, and ~4.24 from Text Mining Targeted).
It's also worth noting that time taken includes both query time and response transformation time, so it's possible that specific queries take a long time to transform, rather than the API taking a long time to respond, however previous infromation from Sentry makes this appear unlikely.
Next up I'll see if I can identify any promiscuous nodes causing these large numbers of queries.