NASA-AMMOS / aerie

A software framework for modeling spacecraft.
https://nasa-ammos.github.io/aerie-docs/
MIT License
73 stars 19 forks source link

Very slow to load sequences in UI #1360

Closed parkerabercrombie closed 8 months ago

parkerabercrombie commented 8 months ago

Checked for duplicates

No - I haven't checked

Is this a regression?

No - This is a new bug

Version

2.4.0

Describe the bug

Observed on our test venue. Opening a sequence in the UI (running command expansion and then clicking "open" on the sequence) takes a very long time to show the resulting sequence. The network call to fetch the sequence takes ~1 min to return.

This is the query that is slow. Running this query directly in the hasura console was also slow.

#graphql
    query GetExpansionSequenceSeqJson($seqId: String!, $simulationDatasetId: Int!) {
      getSequenceSeqJson(seqId: $seqId, simulationDatasetId: $simulationDatasetId) {
        errors {
          location {
            column
            line
          }
          message
          stack
        }
        seqJson
        status
      }
    }

Reproduction

We have only observed this on our dev cloud venue. We were not able to reproduce it locally. It maybe related to the amount of data on the venue.

Logs

No response

System Info

Chrome, hosted on AWS

Severity

Major

cartermak commented 8 months ago

Additional data point: the sequencing server is now returning a 504 to Hasura for expandAllActivities (the /command-expansion/expand-all-activity-instances route on the sequencing server). There's no confirmation that expansion has run successfully in the UI, although we're able to retrieve a sequence with one activity after 4.5 minutes.

parkerabercrombie commented 8 months ago

I see this in the hasura logs from this morning. Not sure if related to the 504 Carter reported or not.

 08:36:02   
{
  "detail": {
    "http_info": {
      "content_encoding": null,
      "http_version": "HTTP/1.1",
      "ip": "100.67.28.190",
      "method": "POST",
      "status": 200,
      "url": "/v1/graphql"
    },
    "operation": {
      "error": {
        "code": "unexpected",
        "error": "database query error",
        "internal": {
          "arguments": [
            "(Oid 114,Just (\"{\\\"x-hasura-role\\\":\\\"admin\\\"}\",Binary))",
            "(Oid 23,Just (\"\\NUL\\NULI&\",Binary))"
          ],
          "error": {
            "description": null,
            "exec_status": "FatalError",
            "hint": "You might need to increase max_locks_per_transaction.",
            "message": "out of shared memory",
            "status_code": "53200"
          },
          "prepared": true,
          "statement": "SELECT  coalesce(json_agg(\"root\" ), '[]' ) AS \"root\" FROM  (SELECT  row_to_json((SELECT  \"_e\"  FROM  (SELECT  \"_root.base\".\"type\" AS \"type\", \"_root.base\".\"duration\" AS \"duration\", \"_root.ar.root.profile_segments\".\"profile_segments\" AS \"profile_segments\", \"_root.base\".\"name\" AS \"name\"       ) AS \"_e\"      ) ) AS \"root\" FROM  (SELECT  *  FROM \"public\".\"profile\"  WHERE ((\"public\".\"profile\".\"dataset_id\") = (($2)::integer))     ) AS \"_root.base\" LEFT OUTER JOIN LATERAL (SELECT  coalesce(json_agg(\"profile_segments\" ), '[]' ) AS \"profile_segments\" FROM  (SELECT  row_to_json((SELECT  \"_e\"  FROM  (SELECT  \"_root.ar.root.profile_segments.base\".\"start_offset\" AS \"start_offset\", \"_root.ar.root.profile_segments.base\".\"dynamics\" AS \"dynamics\", \"_root.ar.root.profile_segments.base\".\"is_gap\" AS \"is_gap\"       ) AS \"_e\"      ) ) AS \"profile_segments\" FROM  (SELECT  *  FROM \"public\".\"profile_segment\"  WHERE (((\"_root.base\".\"id\") = (\"profile_id\")) AND ((\"_root.base\".\"dataset_id\") = (\"dataset_id\")))     ) AS \"_root.ar.root.profile_segments.base\"      ) AS \"_root.ar.root.profile_segments\"      ) AS \"_root.ar.root.profile_segments\" ON ('true')      ) AS \"_root\"      "
        },
        "path": "$"
      },
      "query": {
        "query": "query{\n  profile(where: {dataset_id: {_eq: 18726}}){\n    type\n    duration\n    profile_segments {\n      start_offset\n      dynamics\n      is_gap\n    }\n    name\n  }\n}\n"
      },
      "request_id": "d42042e2-9d1c-4af8-8c00-a0eb3906296f",
      "request_mode": "error",
      "response_size": 1666,
      "user_vars": {
        "x-hasura-role": "admin"
      }
    },
    "request_id": "d42042e2-9d1c-4af8-8c00-a0eb3906296f"
  },
  "level": "error",
  "timestamp": "2024-03-15T15:36:02.387+0000",
  "type": "http-log"
}
dandelany commented 8 months ago

Thanks for the report - @goetzrrGit and @Mythicaeda are looking into this to see if we can prevent the 504 error and/or improve performance.

dandelany commented 8 months ago

Here's an update on our progress today... @goetzrrGit and @Mythicaeda have been working on this for most of the day today and were able to replicate the slow query mentioned at the top of the ticket (GetExpansionSequenceSeqJson):

The comment from @cartermak above (“sequencing server returning a 504 to Hasura for expandAllActivities … although we’re able to retrieve a sequence with one activity after 4.5 minutes.“) is a separate issue which is unrelated to the originally reported slow query

cartermak commented 8 months ago

Thank you for the work and the update @dandelany ! @parkerabercrombie , let me know whether/how I can support.

parkerabercrombie commented 8 months ago

Thanks @dandelany ! We can try changing the configuration on that server on Monday.