dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Compute logs intermittently raise exception during tail_polling on Windows #4383

Open dagsterbot[bot] opened 3 years ago

dagsterbot[bot] commented 3 years ago

Issue from the Dagster Slack

Compute logs intermittently raise exception during tail_polling on Windows

This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1626886529357400?thread_ts=1626886529.357400&cid=C01U954MEER

Conversation excerpt:

U01VAFX0APR: Hi Dagster team! I keep getting this error. When I launch a pipeline from Dagit, I get this error:
```
Traceback (most recent call last):
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\poll_compute_logs.py", line 61, in <module>
    execute_polling(sys.argv[1:])
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\poll_compute_logs.py", line 53, in execute_polling
    with open(ipc_output_file, "w"):
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\s4957336\\AppData\\Local\\Temp\\1\\tmpo_zr47w6\\execute-windows-tail-15bc135d998546caa23116bbff00ceb1'
2021-06-14 02:33:51 - dagster - ERROR - Mexico_CTMS_Corp - a1cb3fdf-19ef-4676-a6b0-b3c7610ec0f9 - 22036 - PIPELINE_FAILURE - Execution of pipeline "Mexico_CTMS_Corp" failed. An exception was thrown during execution.

Exception: Timed out waiting for tail process to start

Stack Trace:
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\api.py", line 762, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\executor\in_process.py", line 38, in execute
    yield from iter(
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\api.py", line 841, in __iter__
    yield from self.iterator(
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\plan\execute_plan.py", line 72, in inner_plan_execution_iterator
    active_execution.verify_complete(pipeline_context, step.key)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\storage\compute_log_manager.py", line 56, in watch
    yield
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\storage\local_compute_log_manager.py", line 51, in _watch_logs
    yield
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\compute_logs.py", line 31, in mirror_stream_to_file
    yield pids
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\compute_logs.py", line 75, in tail_to_stream
    yield pids
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\contextlib.py", line 120, in __exit__
    next(self.gen)
  File "C:\Users\s4957336\Anaconda3\envs\borrar\lib\site-packages\dagster\core\execution\compute_logs.py", line 104, in execute_windows_tail
    raise Exception("Timed out waiting for tail process to start")
```
I am not sure why Dagster is looking in this path: `FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\s4957336\\AppData\\Local\\Temp\\1\\tmpo_zr47w6\\execute-windows-tail-15bc135d998546caa23116bbff00ceb1'`. What can I do? I am using Windows 10 (64-bit) in a Conda environment, with Dagster 0.12.2. I have hit the timeout several times, seemingly at random. I configured Postgres as the storage backend in my `dagster.yaml` file.
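The traceback hints at how Windows log capture is wired: `poll_compute_logs.py` runs as a separate process and opens an `ipc_output_file` to signal that it has started, while `execute_windows_tail` in the parent raises "Timed out waiting for tail process to start" when that signal never shows up. The sketch below is a generic sentinel-file handshake built on that reading of the traceback, not Dagster's actual code; `start_child_with_handshake`, `wait_for_sentinel` and `CHILD_SCRIPT` are hypothetical names. Pointing the child at a directory that does not exist reproduces both symptoms at once: the child dies with `FileNotFoundError`, and the parent times out.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical child process: signals readiness by creating the file named in argv[1].
CHILD_SCRIPT = "import sys; open(sys.argv[1], 'w').close()"


def wait_for_sentinel(path: str, timeout: float, interval: float = 0.05) -> bool:
    """Poll until `path` exists or `timeout` seconds pass; return True if it appeared."""
    deadline = time.monotonic() + timeout
    while not os.path.isfile(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def start_child_with_handshake(sentinel: str, timeout: float = 10.0) -> subprocess.Popen:
    """Start the child and block until it creates `sentinel`, else raise TimeoutError."""
    child = subprocess.Popen([sys.executable, "-c", CHILD_SCRIPT, sentinel])
    if not wait_for_sentinel(sentinel, timeout):
        child.kill()  # harmless if the child already exited
        child.wait()
        raise TimeoutError("Timed out waiting for child process to start")
    return child


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ok = start_child_with_handshake(os.path.join(tmp, "ready"))
        ok.wait()
        print("handshake succeeded")
        # A sentinel inside a missing directory can never be created: the child
        # dies with FileNotFoundError and the parent waits out its timeout.
        try:
            start_child_with_handshake(os.path.join(tmp, "missing", "ready"), timeout=1.0)
        except TimeoutError as exc:
            print("handshake failed:", exc)
```

Checking `child.poll()` inside the wait loop would let the parent fail fast when the child exits early, instead of waiting out the full timeout.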
UH3RM70A2: cc <@UM49TQ8EB>
U01VAFX0APR: Thanks <@UH3RM70A2>. I am trying to convince our architect to migrate our workflows to Dagster, but I can't find a solution to these issues on Windows. On our Linux server this error appears only occasionally, but on Windows it happens much more often.
UM49TQ8EB: This looks like an error from capturing the compute logs from your pipeline execution. I think we can do a better job of not raising an Exception here, so that log capture failures don't kill pipeline execution.
UM49TQ8EB: I think I can get something in for the next release (tomorrow), but in the meantime, you can switch the instance configuration in your `dagster.yaml` to use the `NoOpComputeLogManager`. This will cause compute logs (stdout/stderr from your pipeline runs) to be discarded, but it will ensure that this particular error won't interrupt your pipeline execution.

The corresponding `dagster.yaml` config entry for that looks like this:
```yaml
compute_logs:
  module: dagster.core.storage.noop_compute_log_manager
  class: NoOpComputeLogManager
```
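The idea of not letting a log-capture failure kill the run can be sketched as a wrapper around whatever context manager sets up capture. `BestEffortCapture` is a hypothetical name, not Dagster's API and not the fix that was actually shipped. It logs errors raised while entering or exiting the wrapped context manager (the "Timed out waiting for tail process to start" exception above surfaced on exit, through `contextlib`'s `__exit__`), while exceptions raised by the step itself still propagate.

```python
import logging
from typing import ContextManager, List

logger = logging.getLogger(__name__)


class BestEffortCapture:
    """Hypothetical wrapper: use `capture` if possible, but never let a failure
    inside it (on enter or on exit) fail the surrounding work."""

    def __init__(self, capture: ContextManager[object]) -> None:
        self._capture = capture
        self._entered = False
        self.errors: List[Exception] = []  # capture failures that were swallowed

    def __enter__(self) -> "BestEffortCapture":
        try:
            self._capture.__enter__()
            self._entered = True
        except Exception as exc:  # e.g. FileNotFoundError when `tail` is missing
            self.errors.append(exc)
            logger.warning("Log capture could not start; continuing without it", exc_info=True)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if not self._entered:
            return False  # capture never started; let any body exception propagate
        try:
            return bool(self._capture.__exit__(exc_type, exc, tb))
        except Exception as teardown_exc:
            if teardown_exc is exc:
                return False  # the body's own exception, re-raised by the capture
            self.errors.append(teardown_exc)
            logger.warning("Log capture failed during teardown; ignoring", exc_info=True)
            return False
```

Usage would look like `with BestEffortCapture(some_capture_cm): run_step()`, where `some_capture_cm` and `run_step` stand in for the real capture setup and step body.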
UM49TQ8EB: <@U018K0G2Y85> issue Compute logs intermittently raise exception during tail_polling on Windows

Message from the maintainers:

Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.

RMHogervorst commented 2 years ago

I have something similar (I run Dagster on a Raspberry Pi 3B). I get the error

Exception while setting up compute log capture
FileNotFoundError: [Errno 2] No such file or directory: 'tail'

`tail` is installed and I've used it before, so I'm not sure what is happening here.
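The `'tail'` in this message is the name of the executable, not a log file: the traceback shows the capture code calling `subprocess.Popen(tail_cmd, stdout=stream)`, and when `Popen` receives a bare command name it searches the `PATH` of the environment the child is launched with. A minimal sketch, assuming a POSIX system (`try_start_tail` is a hypothetical helper), reproduces the same `FileNotFoundError` by giving the child a `PATH` that does not contain `tail`:

```python
import os
import subprocess
from typing import Optional


def try_start_tail(path_value: str) -> Optional[FileNotFoundError]:
    """Start `tail -n 0 <devnull>` with PATH set to `path_value`.

    Returns None if the process started (it is then waited on), or the
    FileNotFoundError raised when `tail` could not be found on that PATH.
    """
    try:
        proc = subprocess.Popen(
            ["tail", "-n", "0", os.devnull],
            env={"PATH": path_value},  # only this PATH is searched for "tail"
            stdout=subprocess.DEVNULL,
        )
    except FileNotFoundError as exc:
        return exc
    proc.wait()
    return None


if __name__ == "__main__":
    print(try_start_tail("/nonexistent-dir-for-demo"))
```

So `tail` can be installed and work fine in an interactive shell, yet still be invisible to a process started with a narrower `PATH`.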

RMHogervorst commented 2 years ago
```
FileNotFoundError: [Errno 2] No such file or directory: 'tail'

  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/execution/plan/execute_plan.py", line 66, in inner_plan_execution_iterator
    stack.enter_context(
  File "/usr/lib/python3.9/contextlib.py", line 429, in enter_context
    result = _cm_type.__enter__(cm)
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/storage/compute_log_manager.py", line 69, in watch
    with self._watch_logs(pipeline_run, step_key):
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/storage/local_compute_log_manager.py", line 50, in _watch_logs
    with mirror_stream_to_file(sys.stdout, outpath):
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 29, in mirror_stream_to_file
    with tail_to_stream(filepath, stream) as pids:
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 77, in tail_to_stream
    with execute_posix_tail(path, stream) as pids:
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/dagster/dagster_project/venv/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 122, in execute_posix_tail
    tail_process = subprocess.Popen(tail_cmd, stdout=stream)
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
```

RMHogervorst commented 2 years ago

Also on another machine, running Ubuntu 22.04:

Exception while setting up compute log capture
FileNotFoundError: [Errno 2] No such file or directory: 'tail'

Could it be that the log location is not being passed through?

```
FileNotFoundError: [Errno 2] No such file or directory: 'tail'

  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/execution/plan/execute_plan.py", line 64, in inner_plan_execution_iterator
    stack.enter_context(
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 448, in enter_context
    result = _cm_type.__enter__(cm)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/storage/compute_log_manager.py", line 69, in watch
    with self._watch_logs(pipeline_run, step_key):
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/storage/local_compute_log_manager.py", line 51, in _watch_logs
    with mirror_stream_to_file(sys.stdout, outpath):
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 29, in mirror_stream_to_file
    with tail_to_stream(filepath, stream) as pids:
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 77, in tail_to_stream
    with execute_posix_tail(path, stream) as pids:
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/site-packages/dagster/core/execution/compute_logs.py", line 122, in execute_posix_tail
    tail_process = subprocess.Popen(tail_cmd, stdout=stream)
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/riker/.pyenv/versions/3.9.13/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
```

RMHogervorst commented 2 years ago

I think I know what causes this in my case. I run this process as a systemd service, which (intentionally) does not get the full PATH, so it cannot find `tail`. `tail` lives in `/usr/bin/`, so I had to add that directory to the service's environment variables in systemd. (So my case is resolved.)
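To catch this before starting Dagster, one can check whether `tail` resolves on the `PATH` the process will actually see. The helpers below are hypothetical and not part of Dagster: `find_tail` uses `shutil.which`, which performs the same kind of `PATH` search, and `path_with_usr_bin` builds the `PATH=$PATH:/usr/bin` value used as the workaround in this thread.

```python
import os
import shutil
from typing import Optional


def find_tail(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of `tail` resolved against `path`
    (this process's PATH when `path` is None), or None if not found."""
    return shutil.which("tail", path=path)


def path_with_usr_bin(path: str) -> str:
    """Return `path` with /usr/bin appended unless it is already listed,
    mirroring the PATH=$PATH:/usr/bin workaround."""
    entries = [entry for entry in path.split(os.pathsep) if entry]
    if "/usr/bin" not in entries:
        entries.append("/usr/bin")
    return os.pathsep.join(entries)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("tail on current PATH:", find_tail(current))
    print("tail after adding /usr/bin:", find_tail(path_with_usr_bin(current)))
```

Running this from inside the service (rather than from an interactive shell) is what matters, since the shell usually has a fuller `PATH`. Under systemd, the resulting value would go into the service's environment, for example through an `Environment=` line in the unit file.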

alexclaydon commented 1 year ago

Thanks. I'm on macOS (Apple Silicon), running from a Python 3.9.16 venv, and was seeing a similar error. Adding `PATH=$PATH:/usr/bin` to my `.env` file, which direnv picks up, fixed it for me.