apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

ipdb no longer works in task execution thread #31784

Closed wes-devore-mpulse closed 1 year ago

wes-devore-mpulse commented 1 year ago

Apache Airflow version

2.6.1

What happened

We want to use "import ipdb; ipdb.set_trace()" in the task execution thread (i.e., inside the Python callable).

This broke when we upgraded from 2.4.0 to 2.6.1.

Now we get the following stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/application/application.py", line 955, in run
    loop = asyncio.get_event_loop()
  File "/usr/local/lib/python3.10/asyncio/events.py", line 656, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'ThreadPoolExecutor-0_0'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.10/site-packages/airflow/cli/cli_config.py", line 51, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/airflow/utils/cli.py", line 112, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 614, in task_test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/usr/local/lib/python3.10/site-packages/airflow/utils/session.py", line 76, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 1721, in run
    self._run_raw_task(
  File "/usr/local/lib/python3.10/site-packages/airflow/utils/session.py", line 73, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 1407, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode)
  File "/usr/local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 1558, in _execute_task_with_callbacks
    result = self._execute_task(context, task_orig)
  File "/usr/local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 1623, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.10/site-packages/airflow/operators/python.py", line 181, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.10/site-packages/airflow/operators/python.py", line 198, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/factory/legacy_tasks.py", line 32, in get_files
    local_files_path, files = report_object.get_files()
  File "/usr/local/airflow/dags/factory/legacy_tasks.py", line 32, in get_files
    local_files_path, files = report_object.get_files()
  File "/usr/local/lib/python3.10/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
  File "/usr/local/lib/python3.10/bdb.py", line 114, in dispatch_line
    self.user_line(frame)
  File "/usr/local/lib/python3.10/pdb.py", line 253, in user_line
    self.interaction(frame, None)
  File "/home/astro/.local/lib/python3.10/site-packages/IPython/core/debugger.py", line 335, in interaction
    OldPdb.interaction(self, frame, traceback)
  File "/usr/local/lib/python3.10/pdb.py", line 348, in interaction
    self._cmdloop()
  File "/usr/local/lib/python3.10/pdb.py", line 313, in _cmdloop
    self.cmdloop()
  File "/home/astro/.local/lib/python3.10/site-packages/IPython/terminal/debugger.py", line 133, in cmdloop
    ).result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/shortcuts/prompt.py", line 1035, in prompt
    return self.app.run(
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/application/application.py", line 958, in run
    return asyncio.run(coro)
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/application/application.py", line 875, in run_async
    return await _run_async(f)
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/application/application.py", line 732, in _run_async
    self._request_absolute_cursor_position()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/application/application.py", line 1215, in _request_absolute_cursor_position
    self.renderer.request_absolute_cursor_position()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/renderer.py", line 506, in request_absolute_cursor_position
    do_cpr()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/renderer.py", line 495, in do_cpr
    self.output.ask_for_cpr()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/output/vt100.py", line 711, in ask_for_cpr
    self.flush()
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/output/vt100.py", line 704, in flush
    flush_stdout(self.stdout, data)
  File "/usr/local/lib/python3.10/site-packages/prompt_toolkit/output/flush_stdout.py", line 33, in flush_stdout
    stdout.buffer.write(data.encode(stdout.encoding or "utf-8", "replace"))
AttributeError: 'NoneType' object has no attribute 'write'

If you suspect this is an IPython 8.14.0 bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@python.org

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True
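The first exception in the chain comes from asyncio and is handled by prompt_toolkit. `asyncio.get_event_loop()` raises `RuntimeError` in a thread that has no event loop, and prompt_toolkit then falls back to `asyncio.run()` (application.py lines 955 and 958 in the traceback above). The sketch below reproduces that behaviour with only the standard library, assuming CPython 3.10-style asyncio semantics; `probe_worker_thread` is a hypothetical helper name. It suggests the event-loop error is not the real failure; the final `AttributeError` on `stdout.buffer` is.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def probe_worker_thread() -> tuple[str, str]:
    """Mimic prompt_toolkit's loop lookup inside a worker thread (hypothetical helper)."""
    try:
        # In a non-main thread with no loop set, this raises RuntimeError.
        asyncio.get_event_loop()
        lookup = "found a loop"
    except RuntimeError as exc:
        lookup = str(exc)

    async def coro() -> str:
        return "ran via asyncio.run"

    # Fallback path: asyncio.run() creates (and later closes) a fresh loop in this thread.
    return lookup, asyncio.run(coro())


with ThreadPoolExecutor(max_workers=1) as pool:
    lookup, fallback = pool.submit(probe_worker_thread).result()

print(lookup)    # e.g. "There is no current event loop in thread 'ThreadPoolExecutor-0_0'."
print(fallback)  # ran via asyncio.run
```

Because the fallback succeeds here, the event-loop message in the report is most likely noise from the exception chain rather than the cause.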

What you think should happen instead

We should be dropped into the ipdb debugging shell.

How to reproduce

  1. Place the following code in the Python callable of an Airflow task: `import ipdb; ipdb.set_trace()`
  2. Run the task from the Airflow CLI with `airflow tasks test`.
  3. See the stack trace.
  4. Expected: no error; instead we should land in the ipdb debug shell.
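The last frame of the traceback fails on `stdout.buffer.write(...)` with `'NoneType' object has no attribute 'write'`, which means the `stdout` object prompt_toolkit received had `buffer` set to `None` instead of a real binary buffer. A quick way to check what the task's `sys.stdout` looks like just before the breakpoint is a small diagnostic such as the one below. `describe_stream` is a hypothetical helper, not part of Airflow, IPython, or ipdb.

```python
import io
import sys


def describe_stream(stream) -> dict:
    """Summarise the stream attributes prompt_toolkit relies on (hypothetical helper)."""
    buffer = getattr(stream, "buffer", None)
    try:
        is_tty = stream.isatty()
    except (AttributeError, ValueError):
        is_tty = False
    return {
        "type": type(stream).__name__,
        "has_usable_buffer": buffer is not None and hasattr(buffer, "write"),
        "encoding": getattr(stream, "encoding", None),
        "isatty": is_tty,
    }


# Inside the task callable, just before the breakpoint, one could log:
#     print(describe_stream(sys.stdout), file=sys.__stderr__)
# For comparison, a text-only stream with no binary buffer:
print(describe_stream(io.StringIO()))
# {'type': 'StringIO', 'has_usable_buffer': False, 'encoding': None, 'isatty': False}
print(describe_stream(io.TextIOWrapper(io.BytesIO(), encoding="utf-8")))
```

Writing the report to `sys.__stderr__` keeps it out of whatever object has replaced `sys.stdout` during the task.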

Operating System

Debian 11.3

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else

I reported this exact same bug back in 2.3.4; it was fixed, but it has come back. Can we please add tests to make sure debuggers like ipdb don't break in future releases?

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

utkarsharma2 commented 1 year ago

@phanikumv, can you please assign this to me?

utkarsharma2 commented 1 year ago

Replacing "import ipdb; ipdb.set_trace()" with "from IPython.core import debugger; debugger.Pdb().set_trace()" worked for me in Airflow 2.6.1.

With the DAG below:

from __future__ import annotations
from pprint import pprint

import pendulum

from airflow import DAG
from airflow.decorators import task
from IPython.core import debugger

with DAG(
    dag_id="ipdb",
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:

    @task(task_id="print_the_context")
    def print_context(ds=None, **kwargs):
        """Print the Airflow context and ds variable from the context."""
        pprint(kwargs)
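        # IPython's core Pdb reads commands through the stdlib cmd loop instead of
        # the prompt_toolkit-based terminal debugger that ipdb.set_trace() goes through.
        debugger.Pdb().set_trace()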
        return "Whatever you return gets printed in the logs"

    print_context()
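Another workaround one might try while keeping ipdb, assuming the breakage comes from the replaced `sys.stdout` rather than from ipdb itself, is to point the standard streams back at the interpreter's originals (`sys.__stdin__`, `sys.__stdout__`, `sys.__stderr__`) only for the duration of the debugging session. `real_stdio` is a hypothetical helper, not an Airflow or ipdb API, and this has not been verified against Airflow 2.6.1.

```python
import contextlib
import io
import sys
from typing import Iterator


@contextlib.contextmanager
def real_stdio() -> Iterator[None]:
    """Temporarily restore the interpreter's original std streams (hypothetical helper)."""
    saved = (sys.stdin, sys.stdout, sys.stderr)
    sys.stdin, sys.stdout, sys.stderr = sys.__stdin__, sys.__stdout__, sys.__stderr__
    try:
        yield
    finally:
        # Put back whatever was installed before (e.g. a logging redirect).
        sys.stdin, sys.stdout, sys.stderr = saved


# Inside the task callable (untested assumption about what breaks ipdb):
#     with real_stdio():
#         import ipdb; ipdb.set_trace()
#         local_files_path, files = report_object.get_files()
#
# Standalone check that the previous streams come back on exit:
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    with real_stdio():
        inside_is_original = sys.stdout is sys.__stdout__
    print("back in the redirect")
```

The code being stepped through should stay inside the `with` block, since the previous (redirected) streams are reinstated as soon as it exits.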
utkarsharma2 commented 1 year ago

Also, the DAG below fails for me with the same error in both Airflow 2.4.0 and 2.6.1, so it doesn't look like an Airflow issue.

from __future__ import annotations
from pprint import pprint

import pendulum

from airflow import DAG
from airflow.decorators import task
from IPython.core import debugger

with DAG(
    dag_id="ipdb",
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:

    @task(task_id="print_the_context")
    def print_context(ds=None, **kwargs):
        """Print the Airflow context and ds variable from the context."""
        pprint(kwargs)
        import ipdb; ipdb.set_trace()
        # debugger.Pdb().set_trace()
        return "Whatever you return gets printed in the logs"

    print_context()
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

github-actions[bot] commented 1 year ago

This issue has been closed because it has not received a response from the issue author.

jkryanchou commented 2 months ago

I have run into the same issue with ipdb.