dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.7k stars 1.48k forks

Unexpected GraphQL error, unavailable sensor page #15102

Open TemaDobryyR opened 1 year ago

TemaDobryyR commented 1 year ago

Dagster version

1.3.13

What's the issue?

Some code locations can't load their sensors and fail with this error:

Operation name: SingleSensorQuery

Message: (psycopg2.errors.QueryCanceled) canceling statement due to statement timeout

[SQL: SELECT job_ticks.id, job_ticks.tick_body 
FROM job_ticks 
WHERE job_ticks.selector_id = %(selector_id_1)s OR job_ticks.selector_id IS NULL AND job_ticks.job_origin_id = %(job_origin_id_1)s ORDER BY job_ticks.timestamp DESC 
 LIMIT %(param_1)s]
[parameters: {'selector_id_1': '748fba06f7c12ef234e280dc39991c4acbb2867a', 'job_origin_id_1': '06700de6ca743de3769686243848671067735d0e', 'param_1': 1}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Path: ["sensorOrError","sensorState","ticks"]

Locations: [{"line":23,"column":9}]

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 521, in execute_field
    result = resolve_fn(source, info, **args)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/schema/instigation.py", line 686, in resolve_ticks
    statuses=statuses,
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 649, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/instance/__init__.py", line 2415, in get_ticks
    origin_id, selector_id, before=before, after=after, limit=limit, statuses=statuses
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/schedules/sql_schedule_storage.py", line 380, in get_ticks
    rows = self.execute(query)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/schedules/sql_schedule_storage.py", line 68, in execute
    result_proxy = conn.execute(query)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1415, in execute
    execution_options or NO_OPTIONS,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 484, in _execute_on_connection
    self, distilled_params, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1645, in _execute_clauseelement
    cache_hit=cache_hit,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
    dialect, context, statement, parameters
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1985, in _exec_single_context
    e, str_statement, effective_parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1966, in _exec_single_context
    cursor, str_statement, effective_parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)

The above exception was the direct cause of the following exception:

Message: psycopg2.errors.QueryCanceled: canceling statement due to statement timeout

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1966, in _exec_single_context
    cursor, str_statement, effective_parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)

But I can still turn sensors on and off from their jobs.
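For anyone triaging this, here is a minimal sketch of the query shape from the error above, run against an in-memory SQLite table. Everything in it is an assumption made for illustration: the table is a simplified stand-in for `job_ticks` (the real one lives in PostgreSQL and has more columns), the index `idx_demo_selector_ts` is hypothetical, and SQLite's planner is not PostgreSQL's. It only shows how the `WHERE` clause is grouped and how a query plan can be inspected; it does not establish why the statement times out here.

```python
import sqlite3

# Hypothetical, simplified stand-in for Dagster's job_ticks table.
SCHEMA = """
CREATE TABLE job_ticks (
    id INTEGER PRIMARY KEY,
    job_origin_id TEXT,
    selector_id TEXT,
    timestamp REAL,
    tick_body TEXT
)
"""

# Same shape as the failing statement. AND binds tighter than OR, so this is
# selector_id = ? OR (selector_id IS NULL AND job_origin_id = ?).
LATEST_TICK = """
SELECT job_ticks.id, job_ticks.tick_body
FROM job_ticks
WHERE job_ticks.selector_id = :selector_id
   OR job_ticks.selector_id IS NULL AND job_ticks.job_origin_id = :job_origin_id
ORDER BY job_ticks.timestamp DESC
LIMIT :limit
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the demo table and load a few made-up ticks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        (1, "origin-a", "sel-a", 100.0, "tick 1"),
        (2, "origin-a", None, 200.0, "tick 2 (row without selector_id)"),
        (3, "origin-b", "sel-b", 300.0, "tick 3 (different sensor)"),
        (4, "origin-a", "sel-a", 250.0, "tick 4"),
    ]
    conn.executemany("INSERT INTO job_ticks VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def latest_ticks(conn, selector_id, job_origin_id, limit=1):
    """Run the query and return (id, tick_body) rows, newest first."""
    params = {"selector_id": selector_id, "job_origin_id": job_origin_id, "limit": limit}
    return conn.execute(LATEST_TICK, params).fetchall()


def query_plan(conn, selector_id, job_origin_id, limit=1):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    params = {"selector_id": selector_id, "job_origin_id": job_origin_id, "limit": limit}
    cursor = conn.execute("EXPLAIN QUERY PLAN " + LATEST_TICK, params)
    return [row[-1] for row in cursor]


def add_demo_index(conn):
    """Add a hypothetical index so the plan can be compared before and after."""
    conn.execute("CREATE INDEX idx_demo_selector_ts ON job_ticks (selector_id, timestamp)")
```

Calling `query_plan(conn, "sel-a", "origin-a")` before and after `add_demo_index(conn)` shows whether the planner changes strategy. On the real database the equivalent check would be running the statement under PostgreSQL's `EXPLAIN` with the parameters from the error message.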

What did you expect to happen?

No response

How to reproduce?

No response

Deployment type

Dagster Helm chart

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

stefkauff commented 1 year ago

Is this confirmed? We're seeing the same on the Schedules and Jobs overview pages in Dagster 1.3.14.

If not, I'll create a new bug report.

Jobs: https://dagster-host/overview/jobs

Object { message: "(psycopg2.errors.QueryCanceled) canceling statement due to statement timeout\n\n
[SQL: SELECT runs.id, runs.run_body, runs.status, runs.create_timestamp, runs.update_timestamp, runs.start_time, runs.end_time \n
FROM runs \nWHERE runs.pipeline_name = %(pipeline_name_1)s AND runs.run_id IN (SELECT run_tags.run_id \n
FROM run_tags \nWHERE run_tags.key = %(key_1)s AND run_tags.value = %(value_1)s) ORDER BY runs.id DESC \n 
LIMIT %(param_1)s]\n[parameters: {'pipeline_name_1': 'pip1', 'key_1': '.dagster/repository', 'value_1': '__repository__@pip1', 'param_1': 5}]\n
(Background on this error at: https://sqlalche.me/e/20/e3q8)", locations: […], path: […], … }

Schedules: https://dagster-host/overview/schedules

(psycopg2.errors.QueryCanceled) canceling statement due to statement timeout\n\n
[SQL: SELECT job_ticks.id, job_ticks.tick_body \nFROM job_ticks \n
WHERE job_ticks.selector_id = %(selector_id_1)s OR job_ticks.selector_id IS NULL 
AND job_ticks.job_origin_id = %(job_origin_id_1)s ORDER BY job_ticks.timestamp DESC \n 
LIMIT %(param_1)s]\n[parameters: 
{'selector_id_1': 'fd4687f6971ae96c9094dfdc5e393b89b4c579b4', 'job_origin_id_1':
 '1a9c89b0fef1047f632ee3d72c27bb3ffbfcb6e1', 'param_1': 1}]\n
(Background on this error at: https://sqlalche.me/e/20/e3q8)
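In case it helps with reproducing these outside the UI, below is a rough sketch of a helper that pulls the statement and its parameters back out of an error string like the ones pasted above, including the literal `\n` sequences that browser dumps leave in. The function name `parse_sqlalchemy_error` is made up for this example, and the regular expression assumes the `[SQL: ...]` / `[parameters: ...]` layout seen in this thread; other SQLAlchemy versions or truncated messages may not match.

```python
import ast
import re


def parse_sqlalchemy_error(message: str):
    """Split an error string into (sql, parameters).

    Hypothetical helper: it relies on the "[SQL: ...]" and "[parameters: ...]"
    sections appearing in that order, as in the messages in this thread.
    """
    # Browser dumps show newlines as the two characters "\" and "n"; undo that.
    text = message.replace("\\n", "\n")
    match = re.search(
        r"\[SQL: (.*?)\]\s*\[parameters:\s*(\{.*?\})\]",
        text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no [SQL: ...] / [parameters: ...] section found")
    # Collapse all whitespace runs so the statement fits on one line.
    sql = " ".join(match.group(1).split())
    # The parameters are printed as a Python dict literal.
    params = ast.literal_eval(match.group(2))
    return sql, params
```

The statement keeps its `%(name)s` placeholders, which is the parameter style psycopg2 accepts, so the returned pair could be passed to `cursor.execute("EXPLAIN " + sql, params)` on a psycopg2 cursor to look at the plan without running the slow query itself.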