flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[BUG] Missing configuration and misconfigured port #3236

Open mficek opened 1 year ago

mficek commented 1 year ago

Describe the bug

When I submit a new workflow to a demo cluster with pyflyte run --remote core/flyte_basics/hello_world.py my_wf, it fails because no configuration is provided. Running flytectl config init creates the config file ~/.flyte/config.yaml, but the endpoint port written there is 30081 instead of 30080, which is the port the demo cluster uses. I had to edit the config file by hand so that it reads endpoint: dns:///localhost:30080. After that, everything worked.
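To check whether a config file has this mismatch, the endpoint line can be read without any Flyte tooling. The sketch below is a minimal example, assuming the config contains a line such as "endpoint: dns:///localhost:30081" as described above; the helper names are hypothetical and not part of flytectl or flytekit, and only the literal host "localhost" is treated as the demo cluster.

```python
import re
from typing import Optional, Tuple

# Port the demo cluster listens on, according to this report.
DEMO_PORT = 30080

# Matches a line like "  endpoint: dns:///localhost:30081" (dns:/// prefix optional).
_ENDPOINT_RE = re.compile(
    r"^\s*endpoint:\s*(?:dns:///)?(?P<host>[^:\s]+):(?P<port>\d+)\s*$",
    re.MULTILINE,
)


def parse_endpoint(config_text: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from the first endpoint line, or None if there is none."""
    match = _ENDPOINT_RE.search(config_text)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def points_at_demo(config_text: str, demo_port: int = DEMO_PORT) -> bool:
    """True only if the endpoint is literally localhost on the demo port."""
    return parse_endpoint(config_text) == ("localhost", demo_port)
```

Usage: points_at_demo((Path.home() / ".flyte" / "config.yaml").read_text()) returns False for the file generated in step 8 of the report.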

Expected behavior

I'd expect that following all the steps of userguide_setup from top to bottom would lead to a successful --remote execution, but it didn't.

Additional context to reproduce

  1. create virtual env
  2. git clone https://github.com/flyteorg/flytesnacks
  3. cd flytesnacks/cookbook
  4. pip install -r core/requirements.txt
  5. install flytectl
  6. flytectl demo start
  7. pyflyte run --remote core/flyte_basics/hello_world.py my_wf <-- this fails
  8. flytectl config init
  9. edit ~/.flyte/config.yaml and change the endpoint port from 30081 to 30080
  10. pyflyte run --remote core/flyte_basics/hello_world.py my_wf <-- now it works

Steps 8 and 9 are missing in the documentation.
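Step 9 can also be scripted instead of edited by hand. This is a minimal Python sketch, assuming the endpoint line has the form shown above (endpoint: dns:///localhost:30081); rewrite_endpoint_port and fix_config_file are hypothetical helper names, not part of flytectl or flytekit. The file helper keeps a .bak copy before overwriting.

```python
import re
import shutil
from pathlib import Path
from typing import Tuple


def rewrite_endpoint_port(
    text: str, old_port: int = 30081, new_port: int = 30080
) -> Tuple[str, int]:
    """Swap old_port for new_port on localhost endpoint lines.

    Returns the new text and the number of lines that were changed.
    (?!\\d) stops 30081 from matching the start of a longer number.
    """
    pattern = re.compile(
        rf"^(\s*endpoint:\s*(?:dns:///)?localhost:){old_port}(?!\d)",
        re.MULTILINE,
    )
    return pattern.subn(rf"\g<1>{new_port}", text)


def fix_config_file(path: Path, old_port: int = 30081, new_port: int = 30080) -> bool:
    """Apply rewrite_endpoint_port to a file, backing it up first.

    Returns False (and leaves the file untouched) if nothing matched.
    """
    new_text, count = rewrite_endpoint_port(path.read_text(), old_port, new_port)
    if count == 0:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(new_text)
    return True
```

Usage: fix_config_file(Path.home() / ".flyte" / "config.yaml") after step 8. Running it a second time returns False because no line still points at 30081.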

Screenshots

Result of step 7 in Additional context to reproduce:

pyflyte run --remote cookbook/core/flyte_basics/hello_world.py my_wf
{"asctime": "2023-01-13 13:48:16,864", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1673614096.864140274\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3260,\"referenced_errors\":[{\"created\":\"@1673614096.864139984\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":167,\"grpc_status\":14}]}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2023-01-13 13:48:17,064", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1673614097.064544493\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3260,\"referenced_errors\":[{\"created\":\"@1673614097.064543902\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":167,\"grpc_status\":14}]}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/michal/miniconda3/envs/flyte/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 539, in _run
    remote_entity = remote.register_script(
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/remote/remote.py", line 596, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 113, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/michal/miniconda3/envs/flyte/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1673614097.465232045","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1673614097.465231304","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
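StatusCode.UNAVAILABLE with "failed to connect to all addresses" means the gRPC client could not open a connection to the configured endpoint at all. A quick way to see which of the two ports in question is actually reachable is a plain TCP probe. This is a minimal standard-library sketch with hypothetical helper names; an open port only shows that something is listening there, not that it is the Flyte gRPC endpoint.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable hosts.
        return False


def probe_demo_ports(
    host: str = "localhost", ports: Iterable[int] = (30080, 30081)
) -> Dict[int, bool]:
    """Map each candidate port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Usage: with the demo cluster from step 6 running, probe_demo_ports() shows whether 30080, 30081, or both accept connections.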

mficek commented 1 year ago

Related issues: #3129, #2884, #2503

eapolinario commented 1 year ago

@mficek , can you double check which version of flytekit you're running? We recently fixed this in https://github.com/flyteorg/flytekit/pull/1384 (which went out in flytekit 1.3.0).
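The installed flytekit version can be checked from Python as well as with pip. This minimal sketch uses importlib.metadata from the standard library; release_tuple, installed_flytekit and has_fix are hypothetical helpers. It ignores pre-release suffixes (so 1.3.0b2 counts as 1.3.0), and, as the reply below shows, being on 1.3.0 did not make the port problem go away, so passing this check is not proof that the config is correct.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Release that the maintainers say contains flytekit PR #1384.
FIX_RELEASE = (1, 3, 0)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '1.3.0' or '1.10'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(g) if g is not None else 0 for g in match.groups())
    return major, minor, patch


def installed_flytekit() -> Optional[str]:
    """Return the installed flytekit version, or None if it is not installed."""
    try:
        return metadata.version("flytekit")
    except metadata.PackageNotFoundError:
        return None


def has_fix(version: str) -> bool:
    """True if version is at or above FIX_RELEASE (pre-release tags ignored)."""
    return release_tuple(version) >= FIX_RELEASE
```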

mficek commented 1 year ago

@eapolinario It's 1.3.0

flytectl demo start
🧑‍🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
delete existing sandbox cluster [y/n]: y
🐋 Going to use Flyte v1.3.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-f69fb09ca189e8bf57e1a6a12db168274f640d15 

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏