dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.14k stars 1.4k forks

"WebSocket connection failed" errors with dagit in python 3.11. 3.10 is ok #13890

Open chriscomeau79 opened 1 year ago

chriscomeau79 commented 1 year ago

Dagster version

1.3.1

What's the issue?

When running dagit in python 3.11, I noticed the progress updates in the UI were choppy (every 5-10 seconds). Checked the Chrome dev console and I see messages like this every few seconds:

WebSocket connection to 'ws://127.0.0.1:3000/graphql' failed: 
e.connect @ client.ts:557

client.ts:173 WebSocket connection to 'ws://127.0.0.1:3000/graphql' failed: WebSocket is closed before the connection is established.
e.close @ client.ts:173

This still happens even if the "disable WebSockets" toggle is enabled.

I tried a few older dagit+dagster versions and the behavior is the same, as long as it's on Python 3.11. Versions tried: 1.3.1, 1.0.0, 0.15.9, 0.14.20.

I switched Dagit to run in a Python 3.10 env and now the run/op progress is updating smoothly.

What did you expect to happen?

No response

How to reproduce?

pip install dagster and dagit in a python 3.11 environment, then run 'dagster dev'
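To check the handshake outside the browser, here is a minimal sketch using only the standard library. It assumes the server is listening on 127.0.0.1:3000 and that the WebSocket endpoint is /graphql, as in the console messages above. The helper names (`build_upgrade_request`, `status_line`, `probe`) are made up for this example, and it sends no subprotocol header, which the real client may include.

```python
import base64
import os
import socket


def build_upgrade_request(host: str, port: int, path: str = "/graphql") -> bytes:
    """Build a bare HTTP/1.1 WebSocket upgrade request (RFC 6455 opening handshake)."""
    # The key is 16 random bytes, base64-encoded, as the RFC requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_line(response: bytes) -> str:
    """Return the first line of an HTTP response."""
    return response.split(b"\r\n", 1)[0].decode("latin-1")


def probe(host: str = "127.0.0.1", port: int = 3000,
          path: str = "/graphql", timeout: float = 5.0) -> str:
    """Send the upgrade request and return the server's status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_upgrade_request(host, port, path))
        return status_line(sock.recv(4096))
```

Calling `print(probe())` while 'dagster dev' is running should print a status line containing 101 if the server accepts the upgrade. Any other status, an empty string, or an exception suggests the handshake is being refused on the server side rather than in the browser.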

Deployment type

Local

Deployment details

Running inside a conda env with dagster and dagit installed with pip. Same when running 'dagster dev' or 'dagit' locally on Windows 10/11, or running as a service on Ubuntu 22.04

Additional information

I'm working around this by running dagit and dagster-daemon in a Python 3.10 env, with the code locations in separate Python 3.11 envs.
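A rough sketch of how that split could be set up, assuming the code locations are loaded from a workspace file whose `python_file` entries carry an `executable_path` pointing at each location's own interpreter. The paths, location names, and the `workspace_yaml` helper are placeholders for illustration, not my actual setup.

```python
import json


def workspace_yaml(locations: list[dict]) -> str:
    """Render workspace file text with one python_file entry per code location.

    Values are emitted through json.dumps so that Windows paths and other
    special characters end up as valid double-quoted YAML scalars.
    """
    lines = ["load_from:"]
    for loc in locations:
        lines += [
            "  - python_file:",
            f"      relative_path: {json.dumps(loc['relative_path'])}",
            f"      location_name: {json.dumps(loc['location_name'])}",
            # Interpreter of the separate env that runs this code location.
            f"      executable_path: {json.dumps(loc['executable_path'])}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical example: one code location in its own Python 3.11 env.
example = workspace_yaml([
    {
        "relative_path": "repos/prod.py",
        "location_name": "prod",
        "executable_path": "/opt/envs/py311/bin/python",
    }
])
```

The webserver and daemon are then started from the Python 3.10 env, and only the code locations run under 3.11.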

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

tacastillo commented 1 year ago

Hi! Thanks for reporting this. Let me take it back to the team to investigate.

clayheaton commented 11 months ago

I'm still seeing this problem in 1.4.16

EtienneT commented 9 months ago

The problem is still here in 1.5.11 with python 3.11. It seems to make the assets page pretty slow to return data about the assets.

Any workaround? Even disabling WebSockets in the user settings doesn't stop those errors.

metinsenturk commented 7 months ago

I am having the same issue. In my case, Dagit looks like it's working totally fine, and I see no visible errors in the daemon processes, the Dagit code locations, the daemon health pages, etc. I am able to browse the dashboard, see runs, run details, etc.

The last log I have from the dagster webserver process is below.

2024-02-08 03:57:22 -0500 - dagster-webserver - INFO - Serving dagster-webserver on http://0.0.0.0:3000 in process 26048
...
2024-02-08 10:17:14 -0500 - dagster.code_server - INFO - Shutting down Dagster code server for package etl2.repos.prod on port 64784 in process 30904
WARNING:  Invalid HTTP request received.

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\program files\python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\program files\python311\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\program files\python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\program files\python311\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\program files\python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\program files\python311\Lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

WARNING:  Invalid HTTP request received.

The following error message occurs when I try launching a job from the Dagit UI.

dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNAVAILABLE

  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\utils.py", line 125, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\utils.py", line 56, in _fn
    result = fn(self, graphene_info, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\schema\roots\mutation.py", line 301, in mutate
    return create_execution_params_and_launch_pipeline_exec(graphene_info, executionParams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\schema\roots\mutation.py", line 279, in create_execution_params_and_launch_pipeline_exec
    return launch_pipeline_execution(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\execution\launch_execution.py", line 33, in launch_pipeline_execution
    return _launch_pipeline_execution(graphene_info, execution_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\execution\launch_execution.py", line 67, in _launch_pipeline_execution
    run = do_launch(graphene_info, execution_params, is_reexecuted)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\execution\launch_execution.py", line 50, in do_launch
    dagster_run = create_valid_pipeline_run(graphene_info, external_job, execution_params)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\execution\run_lifecycle.py", line 70, in create_valid_pipeline_run
    external_execution_plan = get_external_execution_plan_or_raise(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster_graphql\implementation\external.py", line 100, in get_external_execution_plan_or_raise
    return graphene_info.context.get_external_execution_plan(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_core\workspace\context.py", line 244, in get_external_execution_plan
    ).get_external_execution_plan(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_core\host_representation\code_location.py", line 748, in get_external_execution_plan
    execution_plan_snapshot_or_error = sync_get_external_execution_plan_grpc(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_api\snapshot_execution_plan.py", line 52, in sync_get_external_execution_plan_grpc
    api_client.execution_plan_snapshot(
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_grpc\client.py", line 231, in execution_plan_snapshot
    res = self._query(
          ^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_grpc\client.py", line 167, in _query
    self._raise_grpc_exception(
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_grpc\client.py", line 150, in _raise_grpc_exception
    raise DagsterUserCodeUnreachableError(

The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:62745: socket is null"
    debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:62745: socket is null {created_time:"2024-02-08T19:02:15.431885+00:00", grpc_status:14}"
>

  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_grpc\client.py", line 165, in _query
    return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\dagster\_grpc\client.py", line 140, in _get_response
    return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\grpc\_channel.py", line 1161, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Environments\Dagster\Lib\site-packages\grpc\_channel.py", line 1004, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python version: 3.11.3
Dagster version: 1.5.6
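The gRPC error above shows the connection being attempted against the IPv6 loopback address ([::1]) on the code server's port. A small standard-library sketch like the following could help tell whether that port is reachable over IPv4, IPv6, or neither. The helper names are invented for this example, and the port has to be whatever the code server is currently using, since it changes between runs.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable or
        # unsupported address families (e.g. IPv6 disabled).
        return False


def check_loopback(port: int) -> dict[str, bool]:
    """Try the IPv4 loopback, the IPv6 loopback, and the 'localhost' name."""
    return {host: can_connect(host, port) for host in ("127.0.0.1", "::1", "localhost")}
```

If `check_loopback(port)` reports True for 127.0.0.1 but False for ::1, the server is probably only listening on IPv4. If all three are False, the code server process is likely gone, which would be consistent with the "Shutting down Dagster code server" line in the log.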

zero-stroke commented 1 month ago

Anyone figure out a fix for this?