DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

ddtrace with FastAPI + OpenAI causes FastAPI container to instantly die with no logs or stack trace #6701

Closed RogerThomas closed 12 months ago

RogerThomas commented 1 year ago

Summary of problem

I have been trying to debug an issue in a FastAPI/OpenAI application. Upon any call to OpenAI (creating embeddings, for example, or hitting the following endpoint):

@router.get("/ask-debug")
def ask_debug():
    from langchain.llms import OpenAI
    llm = OpenAI(model_name="gpt-3.5-turbo-0613")
    answer = llm("What is the capital of Ireland")
    return answer

The container dies instantly around 90-95% of the time. Occasionally it works, but usually the container is killed the second the call to OpenAI is made.

No logs, no stack trace, nothing.

It took me a while to figure this out, but when I disabled ddtrace, i.e. replaced

ENTRYPOINT ["ddtrace-run"]
CMD ["uvicorn", "app.main:app" , "--host", "0.0.0.0", "--port", "80", "--log-level", "trace"]

with

ENTRYPOINT ["uvicorn"]
CMD ["app.main:app" , "--host", "0.0.0.0", "--port", "80", "--log-level", "trace"]

Everything works fine. I can confirm this further: my final solution was to re-enable ddtrace-run, but with this set in my Dockerfile:

ENV DD_PATCH_MODULES=openai:false

So there is something very strange happening within the Datadog tracer when it interacts with OpenAI.

Which version of dd-trace-py are you using?

ddtrace==1.15.1

Which version of pip are you using?

pip 23.2.1

Which libraries and their versions are you using?

`pip freeze` aiohttp==3.8.5 aiosignal==1.3.1 annotated-types==0.5.0 anyio==3.7.1 astroid==2.15.6 async-timeout==4.0.2 attrs==23.1.0 beautifulsoup4==4.12.2 black==23.3.0 blinker==1.6.2 boto3==1.28.2 botocore==1.31.19 Brotli==1.0.9 build==0.10.0 bytecode==0.14.2 cattrs==23.1.2 certifi==2023.7.22 cffi==1.15.1 cfgv==3.3.1 chardet==3.0.4 charset-normalizer==3.2.0 click==8.1.6 ConfigArgParse==1.7 cryptography==41.0.3 dataclasses-json==0.5.14 ddsketch==2.0.4 ddtrace==1.18.0 debugpy==1.6.5 Deprecated==1.2.14 dill==0.3.7 distlib==0.3.7 docopt==0.6.2 envier==0.4.0 et-xmlfile==1.1.0 faiss-cpu==1.7.4 fastapi==0.99.0 filelock==3.12.2 fire==0.5.0 flake8==6.1.0 Flask==2.3.2 Flask-BasicAuth==0.2.0 Flask-Cors==4.0.0 frozenlist==1.4.0 fsspec==2023.6.0 gevent==23.7.0 geventhttpclient==2.0.9 googletrans==4.0.0rc1 greenlet==2.0.2 h11==0.9.0 h2==3.2.0 hpack==3.0.0 hstspreload==2023.1.1 httpcore==0.9.1 httpx==0.13.3 huggingface-hub==0.16.4 hyperframe==5.2.0 identify==2.5.26 idna==2.10 importlib-metadata==6.8.0 iniconfig==2.0.0 isort==5.12.0 itsdangerous==2.1.2 Jinja2==3.1.2 jmespath==1.0.1 joblib==1.3.2 langchain==0.0.264 langsmith==0.0.19 lazy-object-proxy==1.9.0 locust==2.15.1 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 mccabe==0.7.0 mdurl==0.1.2 moto==4.1.12 mpmath==1.3.0 msgpack==1.0.5 multidict==6.0.4 mypy==1.4.1 mypy-extensions==1.0.0 mypy-protobuf==3.4.0 ndg-httpsclient==0.5.1 networkx==3.1 nlpcloud==1.1.44 nltk==3.8.1 nodeenv==1.8.0 numexpr==2.8.5 numpy==1.25.2 openai==0.23.0 openapi-schema-pydantic==1.2.4 openpyxl==3.1.2 opentelemetry-api==1.19.0 orjson==3.9.5 packaging==23.1 pandas==2.0.3 pandas-stubs==2.0.3.230814 pathspec==0.11.2 Pillow==10.0.0 pip-tools==7.2.0 platformdirs==3.10.0 pluggy==1.2.0 pre-commit==3.3.3 protobuf==4.23.4 psutil==5.9.5 pusher==3.3.2 pyasn1==0.5.0 pycodestyle==2.11.0 pycparser==2.21 pycryptodome==3.18.0 pydantic==1.10.12 pydantic_core==2.6.0 pyflakes==3.1.0 Pygments==2.16.1 pylint==2.17.4 pylint-protobuf==0.20.2 PyNaCl==1.5.0 
pyOpenSSL==23.2.0 pyproject_hooks==1.0.0 pytest==7.4.0 pytest-mock==3.11.1 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 pytz==2023.3 PyYAML==6.0.1 pyzmq==25.1.0 redis==4.6.0 regex==2023.8.8 requests==2.31.0 requests-toolbelt==1.0.0 responses==0.23.3 rfc3986==1.5.0 rich==13.5.2 roundrobin==0.0.4 s3transfer==0.6.1 safetensors==0.3.2 scikit-learn==1.3.0 scipy==1.11.2 sentence-transformers==2.2.2 sentencepiece==0.1.99 simpleaichat==0.2.0 six==1.16.0 sniffio==1.3.0 soupsieve==2.4.1 SQLAlchemy==2.0.19 sse-starlette==1.6.1 starlette==0.27.0 stomp.py==8.1.0 sympy==1.12 tenacity==8.2.2 termcolor==2.3.0 threadpoolctl==3.2.0 tiktoken==0.4.0 tokenizers==0.13.3 tomlkit==0.12.1 torch==2.0.1 torchvision==0.15.2 tqdm==4.66.1 transformers==4.31.0 types-protobuf==4.23.0.2 types-pytz==2023.3.0.1 types-PyYAML==6.0.12.11 types-requests==2.31.0.1 types-urllib3==1.26.25.14 typing-inspect==0.9.0 typing_extensions==4.7.1 tzdata==2023.3 urllib3==1.26.16 uvicorn==0.22.0 virtualenv==20.24.2 websocket-client==1.6.1 Werkzeug==2.3.6 wrapt==1.15.0 xmltodict==0.13.0 yarl==1.9.2 zipp==3.16.2 zope.event==5.0 zope.interface==6.0

How can we reproduce your problem?

Create a FastAPI application running under ddtrace-run with openai installed, and implement the endpoint described above.

What is the result that you get?

Container dies instantly with no logs or stack trace

What is the result that you expected?

The container does not die

emmettbutler commented 1 year ago

Thank you for reporting this issue, @RogerThomas.

@Yun-Kim I'm putting this on your radar as a subject matter expert in OpenAI.

aalgor commented 1 year ago

I experience the same issue in which setting DD_PROFILING_ENABLED to "true" and making a call with OpenAI client in FastAPI crashes the server without any logs.

The environment is as follows:

pip 23.2.1
ddtrace 1.18.2
Python 3.11.4

pip freeze
aiohttp==3.8.5
aiosignal==1.3.1
amqp==5.1.1
annotated-types==0.5.0
anthropic==0.3.10
antlr4-python3-runtime==4.9.3
anyio==3.7.1
async-timeout==4.0.3
attrs==23.1.0
awscli==1.29.33
beautifulsoup4==4.11.2
billiard==4.1.0
blobfile==2.0.2
botocore==1.31.33
build==0.10.0
bytecode==0.14.2
CacheControl==0.12.14
cattrs==23.1.2
celery==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
cleo==2.0.1
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.4
contourpy==1.1.0
crashtest==0.4.1
cryptography==41.0.3
cycler==0.11.0
datadog==0.44.0
ddsketch==2.0.4
ddtrace==1.18.2
Deprecated==1.2.14
deprecation==2.1.0
distlib==0.3.7
distro==1.8.0
dnspython==1.16.0
docker==6.1.3
docutils==0.16
dulwich==0.21.5
elasticsearch==7.13.4
envier==0.4.0
et-xmlfile==1.1.0
expiringdict==1.2.2
fastapi==0.100.1
filelock==3.12.2
fonttools==4.42.1
frozenlist==1.4.0
fsspec==2023.6.0
gevent==22.10.2
greenlet==2.0.2
h11==0.14.0
h2==4.1.0
hpack==4.0.0
html5lib==1.1
httpcore==0.16.3
httpx==0.23.3
huggingface-hub==0.16.4
hydra-core==1.3.2
hypercorn==0.14.4
hyperframe==6.0.1
idna==3.4
importlib-metadata==6.8.0
iniconfig==2.0.0
installer==0.7.0
jaraco.classes==3.3.0
jeepney==0.8.0
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
keyring==23.13.1
kiwisolver==1.4.5
kombu==5.3.1
launchdarkly-server-sdk==8.1.5
lockfile==0.12.2
lxml==4.9.3
Markdown==3.4.4
markdownify==0.11.6
matplotlib==3.7.2
more-itertools==10.1.0
msgpack==1.0.5
multidict==6.0.4
nltk==3.8.1
numpy==1.25.2
omegaconf==2.3.0
openai==0.27.9
openpyxl==3.1.2
opentelemetry-api==1.19.0
packaging==23.1
pandas==2.0.3
pandas-stubs==2.0.3.230814
pastel==0.2.1
pexpect==4.8.0
phonenumbers==8.13.19
Pillow==10.0.0
pkginfo==1.9.6
platformdirs==3.10.0
plotly==5.16.1
pluggy==1.2.0
poethepoet==0.13.1
poetry==1.4.2
poetry-core==1.5.2
poetry-plugin-export==1.3.1
priority==2.0.0
prompt-toolkit==3.0.39
protobuf==4.24.1
ptyprocess==0.7.0
py==1.11.0
pyasn1==0.5.0
pybreaker==1.0.2
pycparser==2.21
pycryptodomex==3.18.0
pydantic==2.3.0
pydantic-settings==2.0.3
pydantic_core==2.6.3
pymongo==3.12.0
pymongo-stubs==0.2.0
pyparsing==3.0.9
pyproject_hooks==1.0.0
pyRFC3339==1.1
pytest==7.1.2
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0.1
rabbitmq==0.2.0
rapidfuzz==2.15.1
redis==4.6.0
referencing==0.30.2
regex==2023.8.8
requests==2.31.0
requests-aws4auth==0.9
requests-toolbelt==0.10.1
rfc3986==1.5.0
rpds-py==0.9.2
rsa==4.7.2
s3transfer==0.6.2
safetensors==0.3.3
scikit-learn==1.3.0
scipy==1.11.2
SecretStorage==3.3.3
semver==3.0.1
sentry-sdk==1.29.2
shellingham==1.5.3
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
starlette==0.27.0
tenacity==8.2.1
testcontainers==3.7.1
threadpoolctl==3.2.0
tiktoken==0.2.0
tokenizers==0.13.3
tomli==2.0.1
tomlkit==0.12.1
tqdm==4.66.1
transformers==4.32.0
trove-classifiers==2023.8.7
types-beautifulsoup4==4.12.0.6
types-html5lib==1.1.11.15
types-Markdown==3.4.2.10
types-pyOpenSSL==23.2.0.2
types-pytz==2023.3.0.1
types-redis==4.6.0.5
types-requests==2.31.0.2
types-urllib3==1.26.25.14
typing_extensions==4.7.1
tzdata==2023.3
uritools==4.0.1
urlextract==1.8.0
urllib3==1.26.16
uvicorn==0.20.0
uvloop==0.17.0
vine==5.0.0
virtualenv==20.21.1
wcwidth==0.2.6
webencodings==0.5.1
websocket-client==1.6.2
wrapt==1.15.0
wsproto==1.2.0
xmltodict==0.13.0
yarl==1.9.2
zipp==3.16.2
zope.event==5.0
zope.interface==6.0
Yun-Kim commented 1 year ago

Hi @RogerThomas @aalgor, thanks for reaching out! Trying this with uvicorn + fastapi + openai (though outside of a container), I wasn't able to reproduce any errors. Could you provide a sample Dockerfile to help reproduce the issue? Furthermore, do you have any stdout logs from the --log-level trace container run command?

Just to clarify, @RogerThomas were you running this with profiling enabled, i.e. DD_PROFILING_ENABLED=True?

aalgor commented 1 year ago

Hi @Yun-Kim, I have provided a sample Dockerfile and logs in this support ticket.

sanchda commented 1 year ago

Hi @RogerThomas, @aalgor. I work on the Profiling side of the product.

I have a request for more information which will help us figure out how this defect might relate to Profiling. Posting here rather than through our Support organization, since it'd be valuable if other folks running into this problem could also return with some diagnostic information.

The steps I'm posting are only valid for glibc-based containers (basically, anything not based on alpine in some way). If that's a problem, I can try something else. The purpose of these steps is to reveal the crash site in native code, which in turn will help us diagnose and triage. I'll do my best to link conversations here to support tickets on our side if anything is in flight.

The first thing I need is for your container image to have a copy of libSegFault.so. This is a part of glibc, which unfortunately was dropped in 2.35 (despite the note, sadly there are few easy-to-deploy tools that provide the same information). You usually can get this by forcing a reinstall of the libc or glibc packages.

... That said, the recent-ish Python images are based on a Debian release that postdates the noted removal of libSegFault.so from glibc. Accordingly, I've provided compatible builds for aarch64 and x86_64:

aarch64 x86_64

What I'd like someone to do is:

  1. get libSegFault.so into their container image, making sure it is marked executable if you're downloading it from me.
  2. set LD_PRELOAD to the file's path. I don't know what path the file will have on your machine.
  3. run your reproducing application, posting the backtrace here or sharing the output in your ongoing support escalation with Datadog. I'm currently subscribing to the related escalations, so I'll see these notes and annotate the tickets accordingly.
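Concretely, the steps above might look like the following inside the container. This is a sketch, not verbatim instructions from the thread: the path is a placeholder for wherever you put the file, and the uvicorn invocation mirrors the Dockerfile shown earlier in this issue.

```shell
# Placeholder path: adjust to wherever libSegFault.so was copied in the image.
chmod +x /opt/debug/libSegFault.so

# Preload it so glibc prints a register dump and backtrace on a crash.
export LD_PRELOAD=/opt/debug/libSegFault.so
# Optional: report all fatal signals, not just SIGSEGV.
export SEGFAULT_SIGNALS=all

# Then run the reproducing application as usual.
ddtrace-run uvicorn app.main:app --host 0.0.0.0 --port 80
```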

Thank you!

aalgor commented 1 year ago

Hi @sanchda I updated the support ticket with the full log. The head of the logs is the following:

"*** Segmentation fault"
"Register dump:"
"RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000003"
"RDX: 00007f0b92867030 RSI: 0000000000000003 RDI: 0000000000000080"
"RBP: 00007f0b928174b0 R8 : 0000000000000000 R9 : 0000000000a406e8"
"R10: 00007f0b9287a110 R11: 0000000000000008 R12: 00007f0bddf3ea08"
"R13: 00007f0b63936880 R14: 00007f0bddf3ea50 R15: 0000000000000000"
"RSP: 00007f0bdc575460"
"RIP: 0000000000635d9a EFLAGS: 00010202"
"CS: 0033 FS: 0000 GS: 0000"
"Trap: 0000000e Error: 00000004 OldMask: 00000000 CR2: 00000010"
"FPUCW: 0000037f FPUSW: 00000000 TAG: 00000000"
"RIP: 00000000 RDP: 00000000"
"ST(0) 0000 0000000000000000 ST(1) 0000 0000000000000000"
"ST(2) 0000 0000000000000000 ST(3) 0000 0000000000000000"
"ST(4) 0000 0000000000000000 ST(5) 0000 0000000000000000"
"ST(6) 0000 0000000000000000 ST(7) 0000 0000000000000000"
"mxcsr: 1fa0"
"XMM0: 000000000000000000000000cba490b8 XMM1: 000000000000000000000000cba490b8"
"XMM2: 000000000000000000000000cba490b8 XMM3: 000000000000000000000000cba490b8"
"XMM4: 000000000000000000000000cba490b8 XMM5: 000000000000000000000000cba490b8"
"XMM6: 000000000000000000000000cba490b8 XMM7: 000000000000000000000000cba490b8"
"XMM8: 000000000000000000000000cba490b8 XMM9: 000000000000000000000000cba490b8"
"XMM10: 000000000000000000000000cba490b8 XMM11: 000000000000000000000000cba490b8"
"XMM12: 000000000000000000000000cba490b8 XMM13: 000000000000000000000000cba490b8"
"XMM14: 000000000000000000000000cba490b8 XMM15: 000000000000000000000000cba490b8"
"Backtrace:"
"/usr/bin/python3[0x635d9a]"
"/usr/bin/python3(PyFrame_FastToLocalsWithError+0x2a)[0x4d8e27]"
"/usr/bin/python3[0x4d8e3e]"
"/usr/bin/python3(_PyObject_GenericGetAttrWithDict+0x246)[0x54e5f6]"
"/usr/local/lib/python3.11/dist-packages/ddtrace/profiling/collector/_traceback.cpython-311-x86_64-linux-gnu.so(+0x51c9)[0x7f0bdc9bd1c9]"
"/usr/local/lib/python3.11/dist-packages/ddtrace/profiling/collector/_traceback.cpython-311-x86_64-linux-gnu.so(+0x770c)[0x7f0bdc9bf70c]"
"/usr/bin/python3[0x5515fb]"
"/usr/local/lib/python3.11/dist-packages/ddtrace/profiling/collector/stack.cpython-311-x86_64-linux-gnu.so(+0x26d91)[0x7f0bdc99fd91]"
"/usr/bin/python3(_PyObject_MakeTpCall+0x25c)[0x52ceec]"
"/usr/bin/python3(_PyEval_EvalFrameDefault+0x771)[0x539c51]"
"/usr/bin/python3(_PyFunction_Vectorcall+0x173)[0x5624b3]"
"/usr/local/lib/python3.11/dist-packages/ddtrace/profiling/_threading.cpython-311-x86_64-linux-gnu.so(+0xd4af)[0x7f0bdc9e74af]"
"/usr/bin/python3(_PyObject_MakeTpCall+0x25c)[0x52ceec]"
"/usr/bin/python3(_PyEval_EvalFrameDefault+0x771)[0x539c51]"
"/usr/bin/python3[0x581d37]"
"/usr/bin/python3[0x581778]"
"/usr/bin/python3[0x66fa94]"
"/usr/bin/python3[0x63d948]"
"/lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x7f0bde19ab43]"
"/lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x7f0bde22ca00]"
sanchda commented 1 year ago

@aalgor thank you so much for doing that. This is excellent. I'm going to transition your ticket to Profiling now.

sanchda commented 1 year ago

@aalgor, I seem to be missing a pyproject.toml in the repro. Is this something you sent along or linked to? Sorry for being daft; I'm trying to catch up on things.

aalgor commented 1 year ago

@sanchda I updated the support ticket with the pyproject.toml file

sanchda commented 1 year ago

@aalgor thank you! Trying it out now.

sanchda commented 1 year ago

@aalgor OK, confirming that I can build the base image based on the info you've sent. I'll post again with repro results, but it looks like I shouldn't need much more hand-holding through this one. Thanks again!

sanchda commented 1 year ago

@aalgor OK, confirming that we can reproduce on this end. We're investigating a similar issue which appears to be related to the (native) thread-safety of the Python interpreter when the interpreter state is switched between native threads, but I'm not 100% sure on that. It'll take some time to diagnose this further.

I'll try to propose a timeline ~soonish~ but it might not be today. Will at least post an update tomorrow if I don't have a precise timeline. Apologies for the wait, thank you again for the help!

sanchda commented 1 year ago

@aalgor, I said I'd give you an update yesterday and failed to deliver. Apologies for that--this analysis has been a little bit too interesting.

We're at a point where we're fairly confident that this is some kind of race condition with cpython as it switches interpreter states. Whether or not this is a race condition that we could avoid through some means is as-of-yet unknown. I'm still testing some of the causal properties of the issue.

Unfortunately, I don't have an estimated timeline yet. Will try to update again soon.

(none of the analysis or anything is a secret, and ordinarily I'd post it, but there have been a lot of iterations and much of it has to do with staring at machine code and translating it back to causal state in cpython, so I'm omitting that discussion in the interest of time--can drop some details if they have value)

aalgor commented 1 year ago

@sanchda Thanks for the update, I appreciate it.

sanchda commented 1 year ago

Hi everyone,

I think we've been able to isolate this down to a defect in our bytecode wrapping procedure which is used for our OpenAI integration. Although the crash was happening in Profiling, it was caused by invalid stub frames we were adding. Accordingly, PR #7044 fixes this problem.

I'll leave the issue open until we've provided a suitable release for user testing. If it looks like that might take some time (for whatever reason), I'll publish and host some wheelfiles out-of-band for folks to test with.

Thank you for your assistance and patience, everyone!

P403n1x87 commented 1 year ago

Apologies, this was auto-closed by the linked PR getting merged.

sanchda commented 1 year ago

Hi everyone,

This issue should be fixed in the following ddtrace versions:

@RogerThomas , @aalgor--apologies for the delay. Could either/both of you confirm that the fixed versions above address the defect? Thank you!

RogerThomas commented 1 year ago

Hi @sanchda, amazing, thanks for fixing. Unfortunately, our DD deployment is versioned such that we can't update past 1.15.1 at the moment. I guess I could still try it for a little while and just not get any logs; I will give that a bash. Thanks!

aalgor commented 1 year ago

Hi @sanchda, thank you for the fix, but v1.20.3 fails our system's unit tests with the following error:

openai_client.py:5: in <module>
    import openai
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
/usr/local/lib/python3.11/dist-packages/ddtrace/vendor/wrapt/importer.py:177: in _exec_module
    notify_module_loaded(module)
/usr/local/lib/python3.11/dist-packages/ddtrace/vendor/wrapt/decorators.py:470: in _synchronized
    return wrapped(*args, **kwargs)
/usr/local/lib/python3.11/dist-packages/ddtrace/vendor/wrapt/importer.py:136: in notify_module_loaded
    hook(module)
/usr/local/lib/python3.11/dist-packages/ddtrace/_monkey.py:167: in on_import
    imported_module.patch()
/usr/local/lib/python3.11/dist-packages/ddtrace/contrib/openai/patch.py:160: in patch
    wrap(openai.util.convert_to_openai_object, _patched_convert(openai, integration))
/usr/local/lib/python3.11/dist-packages/ddtrace/internal/wrapping/__init__.py:214: in wrap
    code = wrap_bytecode(wrapper, wrapped)
/usr/local/lib/python3.11/dist-packages/ddtrace/internal/wrapping/__init__.py:127: in wrap_bytecode
    instrs[0:0] = [Instr("MAKE_CELL", CellVar(_), lineno=lineno) for _ in code.co_cellvars]
/usr/local/lib/python3.11/dist-packages/ddtrace/internal/wrapping/__init__.py:127: in <listcomp>
    instrs[0:0] = [Instr("MAKE_CELL", CellVar(_), lineno=lineno) for _ in code.co_cellvars]
/usr/local/lib/python3.11/dist-packages/bytecode/instr.py:534: in __init__
    self._set(name, arg)
/usr/local/lib/python3.11/dist-packages/bytecode/instr.py:737: in _set
    self._check_arg(name, opcode, arg)
/usr/local/lib/python3.11/dist-packages/bytecode/instr.py:801: in _check_arg
    raise TypeError(
E   TypeError: operation MAKE_CELL argument must be CellVar or FreeVar, got CellVar
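For context on the MAKE_CELL instructions appearing in this trace: on CPython, a function whose local variables are captured by an inner closure lists those variables in co_cellvars, and from 3.11 onward its bytecode prelude must create a cell for each one, which is what ddtrace's wrap_bytecode is re-injecting above. A minimal stdlib illustration (this is not ddtrace code, just the underlying interpreter mechanics):

```python
import dis
import sys

def outer():
    x = 1              # x becomes a cell variable: the closure below captures it
    def inner():
        return x
    return inner

# The compiler records captured locals in co_cellvars.
print(outer.__code__.co_cellvars)  # ('x',)

# On CPython 3.11+, the bytecode must materialize those cells up front,
# so a MAKE_CELL instruction appears in the function's prelude.
if sys.version_info >= (3, 11):
    opnames = [instr.opname for instr in dis.get_instructions(outer)]
    print("MAKE_CELL" in opnames)
```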
maxmetzger commented 1 year ago

I am seeing this bug with v1.20.5 and python 3.11. We are not using OpenAI, but we are using FastAPI.

Traceback (most recent call last):
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/3.11.0/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/module.py", line 220, in _exec_module
    self.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/apps/spp_ipa_service/api/component.py", line 8, in <module>
    from ariadne.asgi import GraphQL
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/module.py", line 220, in _exec_module
    self.loader.exec_module(module)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ariadne/__init__.py", line 1, in <module>
    from .enums import (
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/module.py", line 220, in _exec_module
    self.loader.exec_module(module)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ariadne/enums.py", line 18, in <module>
    from graphql.type import GraphQLEnumType, GraphQLNamedType, GraphQLSchema
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/vendor/wrapt/importer.py", line 177, in _exec_module
    notify_module_loaded(module)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/vendor/wrapt/decorators.py", line 470, in _synchronized
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/vendor/wrapt/importer.py", line 136, in notify_module_loaded
    hook(module)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/_monkey.py", line 167, in on_import
    imported_module.patch()
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/contrib/graphql/patch.py", line 81, in patch
    _update_patching(wrap, module_str, func_name, wrapper)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/contrib/graphql/patch.py", line 113, in _update_patching
    operation(func, wrapper)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/wrapping/__init__.py", line 214, in wrap
    code = wrap_bytecode(wrapper, wrapped)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/wrapping/__init__.py", line 127, in wrap_bytecode
    instrs[0:0] = [Instr("MAKE_CELL", CellVar(_), lineno=lineno) for _ in code.co_cellvars]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/ddtrace/internal/wrapping/__init__.py", line 127, in <listcomp>
    instrs[0:0] = [Instr("MAKE_CELL", CellVar(_), lineno=lineno) for _ in code.co_cellvars]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/bytecode/instr.py", line 536, in __init__
    self._set(name, arg)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/bytecode/instr.py", line 745, in _set
    self._check_arg(name, opcode, arg)
  File "/pyenv/versions/our-application/lib/python3.11/site-packages/bytecode/instr.py", line 809, in _check_arg
    raise TypeError(
TypeError: operation MAKE_CELL argument must be CellVar or FreeVar, got CellVar
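An error of the shape "argument must be CellVar or FreeVar, got CellVar" typically means the object being checked and the classes it is checked against come from two different loaded copies of the same module: the class names match, but the class objects do not, so the isinstance check fails. A stdlib-only sketch of that mechanism (the module and class names here are illustrative, not the actual ddtrace/bytecode internals):

```python
import importlib.util
import os
import sys
import tempfile

# A tiny module defining a class named CellVar, written to disk.
tmpdir = tempfile.mkdtemp()
module_path = os.path.join(tmpdir, "instrmod.py")
with open(module_path, "w") as f:
    f.write("class CellVar:\n    pass\n")

def load_as(name):
    """Load the same source file under a given module name."""
    spec = importlib.util.spec_from_file_location(name, module_path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod

installed = load_as("instrmod")          # e.g. the pip-installed copy
vendored = load_as("vendored.instrmod")  # e.g. a vendored copy

obj = vendored.CellVar()
# Same class name, different class objects: the isinstance check fails.
print(type(obj).__name__)                  # CellVar
print(isinstance(obj, installed.CellVar))  # False
```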
mabdinur commented 12 months ago

@maxmetzger Sorry for the confusion, but this issue should be fixed in v1.20.6, v2.0.5, and v2.1.0.

PR: https://github.com/DataDog/dd-trace-py/pull/7151

Feel free to reopen this issue if you're still experiencing this error.

sheluchin commented 9 months ago

I believe we encountered this exact issue using ddtrace==2.5.1.

sanchda commented 9 months ago

I believe we encountered this exact issue using ddtrace==2.5.1.

:wave: Uh oh. Let's dive into this a little bit. Are you using OpenAI with the Datadog OpenAI integration? Which version of python?

I'll follow up with additional diagnostic guidance, depending on your configuration.

sheluchin commented 9 months ago

@sanchda no, we are not using OAI, but we are using FastAPI.

Our observation was that very large requests would eventually cause the server to die without leaving any logs or stacktrace. When we uninstalled ddtrace, the same requests went through successfully without issue.

sanchda commented 9 months ago

@sheluchin

we are not using OAI

Thank you for clarifying; that probably means this is a novel issue. It may or may not be related to FastAPI. The important causal feature of the original issue is that we had incorrect bytecode wrapping for OpenAI--it was a load-independent defect.

I've opened #8228 to track this problem.

Quick question: which version of python are you running?

sheluchin commented 9 months ago

@sanchda

Quick question: which version of python are you running?

Python 3.10.10.