Closed by parkerabercrombie 8 months ago.
Additional data point: the sequencing server is now returning a 504 to Hasura for expandAllActivities (the /command-expansion/expand-all-activity-instances route on the sequencing server). There's no confirmation that expansion has run successfully in the UI, although we're able to retrieve a sequence with one activity after 4.5 minutes.
I see this in the Hasura logs from this morning. Not sure if it's related to the 504 Carter reported or not.
08:36:02
{
"detail": {
"http_info": {
"content_encoding": null,
"http_version": "HTTP/1.1",
"ip": "100.67.28.190",
"method": "POST",
"status": 200,
"url": "/v1/graphql"
},
"operation": {
"error": {
"code": "unexpected",
"error": "database query error",
"internal": {
"arguments": [
"(Oid 114,Just (\"{\\\"x-hasura-role\\\":\\\"admin\\\"}\",Binary))",
"(Oid 23,Just (\"\\NUL\\NULI&\",Binary))"
],
"error": {
"description": null,
"exec_status": "FatalError",
"hint": "You might need to increase max_locks_per_transaction.",
"message": "out of shared memory",
"status_code": "53200"
},
"prepared": true,
"statement": "SELECT coalesce(json_agg(\"root\" ), '[]' ) AS \"root\" FROM (SELECT row_to_json((SELECT \"_e\" FROM (SELECT \"_root.base\".\"type\" AS \"type\", \"_root.base\".\"duration\" AS \"duration\", \"_root.ar.root.profile_segments\".\"profile_segments\" AS \"profile_segments\", \"_root.base\".\"name\" AS \"name\" ) AS \"_e\" ) ) AS \"root\" FROM (SELECT * FROM \"public\".\"profile\" WHERE ((\"public\".\"profile\".\"dataset_id\") = (($2)::integer)) ) AS \"_root.base\" LEFT OUTER JOIN LATERAL (SELECT coalesce(json_agg(\"profile_segments\" ), '[]' ) AS \"profile_segments\" FROM (SELECT row_to_json((SELECT \"_e\" FROM (SELECT \"_root.ar.root.profile_segments.base\".\"start_offset\" AS \"start_offset\", \"_root.ar.root.profile_segments.base\".\"dynamics\" AS \"dynamics\", \"_root.ar.root.profile_segments.base\".\"is_gap\" AS \"is_gap\" ) AS \"_e\" ) ) AS \"profile_segments\" FROM (SELECT * FROM \"public\".\"profile_segment\" WHERE (((\"_root.base\".\"id\") = (\"profile_id\")) AND ((\"_root.base\".\"dataset_id\") = (\"dataset_id\"))) ) AS \"_root.ar.root.profile_segments.base\" ) AS \"_root.ar.root.profile_segments\" ) AS \"_root.ar.root.profile_segments\" ON ('true') ) AS \"_root\" "
},
"path": "$"
},
"query": {
"query": "query{\n profile(where: {dataset_id: {_eq: 18726}}){\n type\n duration\n profile_segments {\n start_offset\n dynamics\n is_gap\n }\n name\n }\n}\n"
},
"request_id": "d42042e2-9d1c-4af8-8c00-a0eb3906296f",
"request_mode": "error",
"response_size": 1666,
"user_vars": {
"x-hasura-role": "admin"
}
},
"request_id": "d42042e2-9d1c-4af8-8c00-a0eb3906296f"
},
"level": "error",
"timestamp": "2024-03-15T15:36:02.387+0000",
"type": "http-log"
}
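For readability, this is the query from the log's "query" field, unescaped:

query {
  profile(where: {dataset_id: {_eq: 18726}}) {
    type
    duration
    profile_segments {
      start_offset
      dynamics
      is_gap
    }
    name
  }
}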
Thanks for the report - @goetzrrGit and @Mythicaeda are looking into this to see if we can prevent the 504 error and/or improve performance.
Here's an update on our progress today: @goetzrrGit and @Mythicaeda have been working on this for most of the day and were able to replicate the slow query mentioned at the top of the ticket (GetExpansionSequenceSeqJson):
The comment from @cartermak above (“sequencing server returning a 504 to Hasura for expandAllActivities … although we’re able to retrieve a sequence with one activity after 4.5 minutes.”) is a separate issue, unrelated to the originally reported slow query.
Thank you for the work and the update, @dandelany! @parkerabercrombie, let me know whether/how I can support.
Thanks @dandelany! We can try changing the configuration on that server on Monday.
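The "out of shared memory" error in the log above (SQLSTATE 53200, with the hint about max_locks_per_transaction) means PostgreSQL's shared lock table filled up while executing that statement. If that is the setting we end up changing, here is a minimal sketch, assuming direct superuser access to the database; 256 is only an example value (the PostgreSQL default is 64), and the parameter takes effect only after a full server restart:

-- Check the current value
SHOW max_locks_per_transaction;

-- Raise it (writes to postgresql.auto.conf; editing postgresql.conf works too)
ALTER SYSTEM SET max_locks_per_transaction = 256;

-- A full restart of the PostgreSQL server is then required;
-- pg_reload_conf() is not sufficient for this parameter.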
Checked for duplicates
No - I haven't checked
Is this a regression?
No - This is a new bug
Version
2.4.0
Describe the bug
Observed on our test venue. Opening a sequence in the UI (running command expansion and then clicking "open" on the sequence) takes a very long time to show the resulting sequence. The network call to fetch the sequence takes ~1 min to return.
The slow call is the GetExpansionSequenceSeqJson query. Running this query directly in the Hasura console was also slow.
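For reference, one way to see where the time goes for a query like this is to take the SQL that Hasura generates for it and run it under EXPLAIN ANALYZE in psql. A simplified, hypothetical sketch, based on the generated profile/profile_segment statement captured in the Hasura log posted above (which may or may not be the same query that is slow here); it counts segments instead of aggregating them to JSON, just to locate the expensive part of the plan:

-- Hypothetical diagnostic query, not part of the application
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.name, p.type, p.duration, count(ps.*) AS segment_count
FROM public.profile AS p
LEFT JOIN public.profile_segment AS ps
  ON ps.profile_id = p.id AND ps.dataset_id = p.dataset_id
WHERE p.dataset_id = 18726
GROUP BY p.id, p.name, p.type, p.duration;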
Reproduction
We have only observed this on our dev cloud venue. We were not able to reproduce it locally. It may be related to the amount of data on the venue.
Logs
No response
System Info
Severity
Major