Deadlock with `pytest`+`ddtrace` in py 3.11 that `-p no:ddtrace` does not solve

olivierlefloch commented 1 year ago

Summary of problem

When adding Python 3.11 support to an existing codebase that uses ddtrace, we encounter deadlocks at the end of the test suite (after pytest reports its results) in CI. This does not occur on previous versions of Python.

Furthermore, disabling the ddtrace plugin using -p no:ddtrace does not resolve the issue.

Specifying DD_TRACE_ENABLED=false pytest … does work around the issue, however.

Which version of dd-trace-py are you using?

1.6.3

Which version of pip are you using?

22.3.1

Which libraries and their versions are you using?

datadog==0.44.0
[…]
ddtrace==1.6.3 
[…]
gevent==22.10.2
[…]
pytest==7.2.0

How can we reproduce your problem?

I have not (yet?) put together a minimal reproduction case as there seem to potentially be several issues at play here ; happy to help drill down as appropriate.

What is the result that you get?

Github Actions times out after our self imposted timeout.

What is the result that you expected?

Test suite does not deadlock even with the ddtrace plugin enabled, on Python 3.11,
Specifying -p no:ddtrace fully disables the plugin, including starting a global tracer.

Related issues

mabdinur commented 1 year ago

Hey @olivierlefloch,

Thanks for brining this issue to our attention 😄

Specifying -p no:ddtrace fully disables the plugin, including starting a global tracer.

I believe no:ddtrace does not disable the global tracer, it should only stop the ddtrace pytest plugin from being configured. When you set no:ddtrace do you still see test runs in the Datadog CI product?

Test suite does not deadlock even with the ddtrace plugin enabled, on Python 3.11,

I ran the dd-trace-py gevent tests in circleci with the ddtrace pytest plugin, gevent~=22.10.0, and pytest==7.2.0 but I was not able to reproduce the deadlocks (test run). I was wondering if you could supply a repro case to help us debug further. Also I have a few questions that might help us narrow dow the root cause(s):

Did you notice this issue after upgrading to v1.6.3 or were you experiencing this issue with earlier versions of ddtrace?
What platform are you running your tests on? (ex: windows-latest/macos-12/ubuntu-20.04/...)
Aside from pytest, gevent, and ddtrace are there any libraries you suspect are causing these deadlocks? Can you send us which package versions you're using?
When ddtrace is enabled how frequently does your application/tests deadlock? Do these deadlock occur in the same tests/modules?
Where does you application deadlock? Can you pinpoint this issue to an offending module/line/function? Can you share with us any logs in your application that you think are relevant (or a repo case with sensitive business logic redacted)?
Are you using the asm, ddtrace profiler or dynamic instrumentation in your application? If so, can you try setting DD_APPSEC_ENABLED=false, DD_PROFILING_ENABLED=false and/or DD_DYNAMIC_INSTRUMENTATION_ENABLED=false. This will help us isolate this issue to just the tracer.
Does anything in our gevent, gunicorn, or uwsgi docs stand out to you?
- How are you enabling gevent in your application/tests?
- How are you enabling ddtrace patching (ddtrace-run ..., import ddtrace; ddtrace.patch_all(), or import ddtrace; ddtrace.patch(gevent=True)?
- Are you applying ddtrace patching before gevent?
- If you are using ddtrace-run .... are you setting DD_GEVENT_PATCH_ALL=true? Can you try running your tests with ddtrace gevent support disabled? (ex: set the following environment variable DD_TRACE_GEVENT_ENABLED=false or remove import ddtrace; ddtrace.patch(gevent=True) from your application code)?
Do your tests generate traces? Are you able view these traces in the datadog product? If not, are you running the datadog agent?
We made some changes to how gevent

If you're more comfortable sharing this information privately you can contact our support team here and we can continue this conversation there.

rmoneys commented 1 year ago

@mabdinur,

-p no:ddtrace used to block the Pytest plugin, and also the connection errors reported at the end of a test session when there's no agent to send trace data to. Now it appears to have no effect. With that option Pytest no longer omits the plugin from the list that appears at the start of a session:

% pytest -p no:ddtrace
================================================================================================ test session starts =================================================================================================
platform darwin -- Python 3.11.1, pytest-7.2.0, pluggy-1.0.0
rootdir: /Users/blah/blah/blah, configfile: pyproject.toml
plugins: ddtrace-1.6.4, anyio-3.6.2

This worked as expected with Python 3.10.3 and,

ddtrace==0.56.1
pytest==7.0.1
pluggy==1.0.0

I tried downgrading Pytest, but this combination still fails:

ddtrace==1.6.4
pytest==7.0.1
pluggy==1.0.0

It seems ddtrace==1.6.0 is the earliest release that I can install with Python 3.11.1 and that has the same issue as ddtrace==1.6.4: the plugin cannot be blocked with -p no:ddtrace.

mabdinur commented 1 year ago

Hey,

Sorry for the late reply. ddtrace v1.3.0 introduced pytest-bdd plugin (feature). This plugin starts the global tracer and can be disabled using -p no:ddtrace.pytest_bdd.

Try running pytest with the following command and let us know if this disables the tracer: pytest -p no:ddtrace -p no:ddtrace.pytest_bdd ....

rmoneys commented 1 year ago

Awesome. Thanks!

The plugin no longer loads with,

pytest -p no:ddtrace -p no:ddtrace.pytest_bdd ...

I still get a "failed to send traces to Datadog Agent" error for tests that cover functions wrapped with tracer.wrap, but I can silence those with the environment variable, "DD_TRACE_ENABLED=False".

fredrikaverpil commented 1 year ago

I too have seen these "failed to send traces to Datadog Agent" for some time. The only thing that fixes it is to have the environment variable (DD_TRACE_ENABLED=False) set. This is however not something which is clear from Datadog docs and not really expected behavior when running pytest.

matroscoe commented 1 year ago

I can confirm this is still an issue with Python 3.11.4, ddtrace: 1.18.0 pytest 7.4.0 and with DD_TRACE_ENABLED=False set as an environment variable and -p no:ddtrace -p no:ddtrace.pytest_bdd being passed into the adopts under tool.pytest.ini_options in our pyproject.toml

Also an issue in ddtrace 1.18.1 which was just released.

matroscoe commented 1 year ago

This is still an issue with Python 3.11.6, ddtrace: 1.20.3 with pytest 7.4.2. All environment variables are configured in my previous comment from August 22nd 2023.

The specific error we are seeing:

Unable to export profile: ddtrace.profiling.exporter.ExportError: HTTP upload request failed: [Errno 111] Connection refused. Ignoring.

romainkomorndatadog commented 1 year ago

I work on the pytest plugin, so I figured I'd take a look at this (and #4179, at least temporarily), although it looks like the reports here are explicitly not using the plugin, so we can get it resolved. I may need some help getting in reproducing and/or confirming current symptoms.

@matroscoe , when you say "this is still an issue", you're referring to the deadlock, or to the "failed to send traces" log entries (or both)?

matroscoe commented 1 year ago

@romainkomorndatadog I am specifically referring to the issue with the datadog system still being enabled and attempting to send traces with both the -p no:ddtrace -p no:ddtrace.pytest_bdd being adopted.

romainkomorndatadog commented 1 year ago

@matroscoe , since you're not using (or you're disabling) the ddtrace plugin for pytest, I assume you're installing ddtrace as a dependency for the purpose of using it in the application you're testing?

In the pytest case, the ddtrace module is being imported (regardless of whether or not the plugin's activated), and -p no:ddtrace won't prevent the import from happening. We can modify the plugin's behavior to only import ddtrace when the plugin's enabled and the session starts.

I don't have a repro case yet but I suspect that likely won't resolve the issue if ddtrace is being imported somewhere else in the application, so I think there's more investigating work to be done.

matroscoe commented 1 year ago

@romainkomorndatadog to be clear I think you are right. We have included ddtrace as part of our application (should we not be doing this?). We do not need it or want it to run during pytest (to stop the upload attempt from happening). We would be fine with ddtrace being imported and running just not attempting the upload which tends to confuse users and make it look like there is something wrong when there isn't.

Something like a -p no:ddtrace-upload would even work just to disable the upload part. The main reason I am worried about not having ddtrace attempting to run is we do benchmarking in our CI. We want ddtrace to be deployed and so we want out benchmarks to account for the impact of ddtrace being run with our code.

I think at least my companies use case would be satisfied if we could be provided with an option to specifically disable the upload attempt.

ZStriker19 commented 1 year ago

Hi @matroscoe could you please clarify whether or not setting DD_TRACE_ENABLED to false fixes the issue for you? Some users have reported it does, although it seems like you reported it did not? That seems very strange to me, as when that's set, we avoid calling the spanaggregator on span finish. So no traces get sent. In a way it's equivalent to filtering out all traces from ever getting uploaded.

If that's the case, that's what we'd currently recommend. We can add your request for a plugin flag (e.g. -p no:ddtrace-upload) to mimic this behavior as a feature request.

jlucas91 commented 10 months ago

@romainkomorndatadog To chime in here - outside of the logging concerns raised by others - I cannot seem to disable the ddtrace pytest plugin at all using the solutions recommended here.

Passing in -p no:ddtrace -p no:ddtrace.pytest_bdd does not remove ddtrace from the enabled plugins list. Furthermore, in stack traces within my tests, I see ddtrace. Ex:

/usr/local/lib/python3.12/site-packages/ddtrace/internal/module.py:212: in _exec_module
    self.loader.exec_module(module)

This is on Python 3.12.1, ddtrace 2.3.2, and pytest 7.4.3.

I'm surprised to see ddtrace within the runtime of my tests by default. It's not a behavior I opted into or was clear to me when I installed ddtrace for application tracing.

romainkomorndatadog commented 10 months ago

@jlucas91 , I think there are two (maybe three) separate issues here:

First, I did notice today that it looks like we have a problem with Python 3.10+ that I just created #8039 to look into. It looks like that's what you're reporting in your stacktrace?

Next, just to be clear (and I do realize this is splitting hairs a bit), -p no:ddtrace does disable the ddtrace plugin itself. For example:

% pytest --ddtrace -p no:ddtrace
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --ddtrace
  inifile: /Users/romain.komorn/nondd/flask-shallow/pyproject.toml
  rootdir: /Users/romain.komorn/nondd/flask-shallow

shows that the --ddtrace option doesn't get added by the plugin. It still appears in the output of pytest under the plugins (eg: plugins: ddtrace-2.4.0). ddtrace will continue to appear under the list of plugins unless you -p no: all of the available "sub" plugins (ie, as of time of writing -p no:ddtrace -p no:ddtrace.pytest_bdd -p no:ddtrace.pytest_benchmark).

And lastly, the issue that remains is that the plugin file does import ddtrace, which kicks off some bits of ddtrace machinery no matter what. That said, if you import ddtrace anywhere in your application code, you'll end up with the same behavior (eg: some of ddtrace's functionality will turn get activated by import alone).

I think it may be appropriate in this case for us to lazy-load the module (ie: only import ddtrace if the plugin is enabled). I'm also wondering if folding the functionality of the pytest_bdd and pytest_benchmark plugins into the main ddtrace one would make more sense.

romainkomorndatadog commented 6 months ago

Given the recent changes made for this and https://github.com/DataDog/dd-trace-py/issues/8281#issuecomment-2096705267 , I'm going to mark this as closed as well.

Since there have been multiple different (but related) issues (gevent deadlocks, logging, traces being sent, etc) discussed in both this and #8281 , I'd really appreciate it if, if someone is tempted to reopen this issue, they could instead open a new one with a single topic and details.

DataDog / dd-trace-py