dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Dagit UserWarning: Error loading repository location, grpc._channel._InactiveRpcError: #4289

Open pollloq opened 3 years ago

pollloq commented 3 years ago

Summary

Hello Dagster team.

I just installed the latest version of dagster and dagit (0.11.13). While working through the quick start (https://docs.dagster.io/getting-started#quick-start), I got an error message in my command prompt about a gRPC channel that failed to connect. Running "dagster pipeline execute -f hello_world.py" works fine, but running "dagit -f hello_world.py" produces the gRPC error message. Any thoughts on how to solve this issue?

Reproduction

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1623674795.225000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3009,"referenced_errors":[{"created":"@1623674795.225000000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
>

gibsondan commented 3 years ago

Hi @pollloq, sorry for the trouble here. A couple of follow-up questions:

Thanks!

pollloq commented 3 years ago

Hi @gibsondan, many thanks for your quick reply.

Stack Trace:
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\cli\workspace\workspace.py", line 179, in _load_location
    location = self.create_location_from_origin(origin)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\cli\workspace\workspace.py", line 134, in create_location_from_origin
    grpc_server_registry=self._grpc_server_registry,
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\host_representation\repository_location.py", line 504, in __init__
    self._container_image = self._reload_current_image()
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\host_representation\repository_location.py", line 558, in _reload_current_image
    return self.client.get_current_image().current_image
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\grpc\client.py", line 366, in get_current_image
    res = self._query("GetCurrentImage", api_pb2.Empty)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\grpc\client.py", line 89, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\grpc\_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)

  location_name=location_name, error_string=error.to_string()
c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\execution\compute_logs.py:42: UserWarning: WARNING: Compute log capture is disabled for the current environment. Set the environment variable PYTHONLEGACYWINDOWSSTDIO to enable.
  warnings.warn(WIN_PY36_COMPUTE_LOG_DISABLED_MSG)
Loading repository...
Serving on http://127.0.0.1:3000 in process 17344

I am wondering if it is related to the fact that I am working inside a corporate network, behind proxies and a VPN connection?

gibsondan commented 3 years ago

Is it possible to paste the full output of the dagit command from when it starts running until it throws the error? There may be a clue earlier in the output. It does seem possible that the failure is due to network restrictions though - in order to operate, dagit needs to be able to connect to a gRPC server running in a subprocess on the same machine via localhost.
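To make the localhost requirement above easier to check, here is a minimal Python sketch (standard library only, not part of dagster). It assumes that two things are worth looking at: whether a plain TCP connection over 127.0.0.1 works at all, and which proxy-related environment variables are set. The function names `loopback_reachable` and `proxy_settings` are hypothetical helpers made up for this example. Note that a raw socket does not go through any proxy configuration, so a passing loopback check does not prove that the gRPC client can connect.

```python
import os
import socket

# Proxy-related variables to report (checked in lower and upper case).
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy")


def loopback_reachable(host="127.0.0.1", timeout=2.0):
    """Return True if a TCP client can connect to a listener on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False


def proxy_settings(environ=None):
    """Return the proxy-related variables that are set in `environ`."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            if key in environ:
                found[key] = environ[key]
    return found


if __name__ == "__main__":
    print("loopback reachable:", loopback_reachable())
    print("proxy settings:", proxy_settings())
```

If the loopback check passes but proxy variables are set without a no_proxy entry for localhost, the proxy configuration is a plausible suspect.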

pollloq commented 3 years ago

OK, understood. Below is the full output of the dagit command "dagit -f hello_world.py":

c:\miniconda3\envs\pipeline\lib\site-packages\dagster\cli\workspace\workspace.py:184: UserWarning: Error loading repository location hello_world.py:grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1623684036.397000000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3009,"referenced_errors":[{"created":"@1623684036.397000000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"

Stack Trace:
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\cli\workspace\workspace.py", line 179, in _load_location
    location = self.create_location_from_origin(origin)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\cli\workspace\workspace.py", line 134, in create_location_from_origin
    grpc_server_registry=self._grpc_server_registry,
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\host_representation\repository_location.py", line 504, in __init__
    self._container_image = self._reload_current_image()
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\host_representation\repository_location.py", line 558, in _reload_current_image
    return self.client.get_current_image().current_image
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\grpc\client.py", line 366, in get_current_image
    res = self._query("GetCurrentImage", api_pb2.Empty)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\dagster\grpc\client.py", line 89, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\grpc\_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "c:\miniconda3\envs\pipeline\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)

  location_name=location_name, error_string=error.to_string()
c:\miniconda3\envs\pipeline\lib\site-packages\dagster\core\execution\compute_logs.py:42: UserWarning: WARNING: Compute log capture is disabled for the current environment. Set the environment variable PYTHONLEGACYWINDOWSSTDIO to enable.
  warnings.warn(WIN_PY36_COMPUTE_LOG_DISABLED_MSG)
Loading repository...
Serving on http://127.0.0.1:3000 in process 17492

pollloq commented 3 years ago

The need to connect to a gRPC server on the same machine via localhost must be specific to the dagit process, I guess. Although no pipeline was triggered by the command "dagit -f hello_world.py", I can still open http://127.0.0.1:3000 and access the dagit UI, which shows no repositories, an error status, etc. I can also view the previous runs that I did with the dagster CLI command.

gibsondan commented 3 years ago

Yeah, running a pipeline directly via dagster pipeline execute doesn't create a server, so that all makes sense.

pybokeh commented 3 years ago

Hi @pollloq! I think I had a similar problem: I saw a similar stack trace and similar dagit issues, which were discussed on Slack.

TL;DR - the problem was resolved by setting the no_proxy environment variable, because my company's proxy server was not set up to support the gRPC protocol.

set no_proxy=localhost,127.0.0.1,0.0.0.0

I'm not sure whether all 3 "local" host mappings needed to be excluded, but that worked for us at my company. If you've created a separate gRPC server, then you would use its IP address instead.
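For anyone who would rather apply the same workaround from a launcher script than from the shell, here is a rough Python sketch. It is only an illustration under stated assumptions: `merged_no_proxy` and `env_with_no_proxy` are hypothetical helper names (not dagster or gRPC APIs), and the host list is simply the one from the command above. The helpers add the local hosts to whatever no_proxy value is already set, without discarding existing entries.

```python
import os

# Hosts taken from the `set no_proxy=...` command above.
LOCAL_HOSTS = ("localhost", "127.0.0.1", "0.0.0.0")


def merged_no_proxy(current, extra=LOCAL_HOSTS):
    """Append `extra` hosts to a comma-separated no_proxy value, skipping duplicates."""
    entries = [e.strip() for e in (current or "").split(",") if e.strip()]
    for host in extra:
        if host not in entries:
            entries.append(host)
    return ",".join(entries)


def env_with_no_proxy(environ=None):
    """Return a copy of the environment with the local hosts excluded from proxying."""
    env = dict(os.environ if environ is None else environ)
    # Update whichever spelling is already present; default to lower case.
    keys = [k for k in env if k.lower() == "no_proxy"] or ["no_proxy"]
    for key in keys:
        env[key] = merged_no_proxy(env.get(key))
    return env
```

The returned mapping can be passed to a child process, for example `subprocess.run(["dagit", "-f", "hello_world.py"], env=env_with_no_proxy())`, so that the setting applies only to that dagit invocation.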

pollloq commented 3 years ago

Hi @pybokeh! Unfortunately, the suggestions you kindly shared are not working for me.

pybokeh commented 3 years ago

Hi @pollloq. Yes, I was using a Windows machine. I set the no_proxy environment variable from the command line. In case you weren't aware: if you set the no_proxy environment variable through the Windows GUI instead, you have to reboot your machine for the environment variable to take effect.

I did not set up my own gRPC server, so I did not have to issue special commands.

archydeberker commented 2 years ago

I also saw this error upon triggering a job via API on a newly launched server on Google Cloud Run.

I manually re-ran the job without issue, so I wondered whether it was due to latency somewhere in the system: was I attempting to launch the job before the server was truly running?


  grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1637011295.740060951","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3158,"referenced_errors":[{"created":"@1637011295.740059226","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":147,"grpc_status":14}]}"
>

  File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 359, in start_run
    res = self._query(
  File "/usr/local/lib/python3.9/site-packages/dagster/grpc/client.py", line 110, in _query
    response = getattr(stub, method)(request_type(**kwargs), timeout=timeout)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
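
One way to test the startup-latency theory is to wait until the server's port accepts TCP connections before launching the job. The sketch below is a standard-library illustration only: `wait_for_port` is a hypothetical helper, and the host and port to pass in depend on the deployment and are not given in this thread. An accepted TCP connection also does not guarantee that the gRPC server has finished loading user code, so treat this as a rough readiness check.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until `host:port` accepts a TCP connection or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would launch the job only if `wait_for_port(...)` returns True, and otherwise log that the server never became reachable.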

mrdavidlaing commented 2 years ago

FWIW, if your user code takes a long time to load, you might need to bump up the startupProbe.initialDelaySeconds - eg:

dagster-user-deployments:
  deployments:
    - name: "my-large-repository"
      startupProbe:
        enabled: true
        initialDelaySeconds: 30