DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

Crash in Celery Worker with DD_DYNAMIC_INSTRUMENTATION_ENABLED=true #7730

Closed thibautd closed 9 months ago

thibautd commented 9 months ago

Summary of problem

When DD_DYNAMIC_INSTRUMENTATION_ENABLED is true, our Celery workers are not able to execute any tasks.

Which version of dd-trace-py are you using?

2.3.0

Which version of pip are you using?

pip 23.3.1, setuptools 68.0.0 and wheel 0.41.3

Which libraries and their versions are you using?

amqp==5.1.1
analytics-python==1.4.0
arabic-reshaper==3.0.0
asgiref==3.5.2
asn1crypto==1.5.1
attrs==22.1.0
automat==22.10.0
backoff==1.10.0
balena-sdk==14.1.0
bcrypt==4.0.1
billiard==3.6.4.0
boto3==1.26.71
botocore==1.29.71
bytecode==0.14.0
cachetools==5.2.0
cattrs==22.2.0
celery==5.2.7
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==2.1.1
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
constantly==15.1.0
convertdate==2.4.0
cryptography==41.0.4
cssselect2==0.7.0
customerio==2.1
cvxopt==1.3.2
cvxpy==1.3.0
datadog==0.47.0
ddsketch==2.0.4
ddtrace==2.3.0
defusedxml==0.7.1
deprecated==1.2.13
dj-database-url==1.0.0
django==4.1.13
django-admin-autocomplete-filter==0.7.1
django-admin-inline-paginator==0.4.0
django-celery-results==2.5.1
django-cors-headers==3.13.0
django-deprecate-fields==0.1.1
django-extensions==3.2.1
django-filter==22.1
django-ipware==4.0.2
django-multiselectfield==0.1.12
django-postgres-extra==2.0.8
django-storages==1.13.1
django-structlog==4.1.1
django-translated-fields==0.12.0
django-xworkflows==1.0.0
djangorestframework==3.14.0
djangorestframework-csv==2.1.1
djangorestframework-dataclasses==1.2.0
djangorestframework-simplejwt==5.2.2
drf-spectacular==0.26.3
ecos==2.0.12
elasticsearch==5.5.3
elementpath==3.0.2
envier==0.4.0
et-xmlfile==1.1.0
google-api-core==2.11.0
google-api-python-client==2.68.0
google-auth==2.16.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.7.1
googleapis-common-protos==1.58.0
graphviz==0.20.1
gspread==5.7.2
gunicorn==21.2.0
heroku3==5.2.0
hijri-converter==2.2.4
holidays==0.19
html5lib==1.1
httplib2==0.21.0
hyperlink==21.0.0
idna==3.4
importlib-metadata==6.0.1
incremental==22.10.0
inflection==0.5.1
jmespath==1.0.1
jsonschema==4.17.3
kombu==5.2.4
korean-lunar-calendar==0.3.1
llvmlite==0.40.1
lxml==4.9.1
mailchimp-transactional==1.0.50
monotonic==1.6
more-itertools==9.0.0
numba==0.57.1
numpy==1.23.5
oauthlib==3.2.2
odoo-client-lib==1.2.2
openpyxl==3.1.1
opentelemetry-api==1.17.0
oscrypto==1.3.0
osqp==0.6.2.post8
packaging==23.1
pandas==1.5.3
paramiko==2.12.0
pillow==10.0.1
pine-client==0.2.0
prompt-toolkit==3.0.33
protobuf==4.21.10
psycopg2-binary==2.9.5
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pycryptodome==3.16.0
pyhanko==0.15.1
pyhanko-certvalidator==0.19.6
pyjwt==2.6.0
pymeeus==0.5.11
pynacl==1.5.0
pyopenssl==23.2.0
pyparsing==3.0.9
pypdf==3.17.0
pypng==0.20220715.0
pyrsistent==0.19.3
pysaml2==7.4.0
python-bidi==0.4.2
python-dateutil==2.8.2
python3-openid==3.2.0
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
pyyaml==6.0
qdldl==0.1.5.post2
qrcode==7.4.2
reportlab==4.0.4
requests==2.31.0
requests-aws4auth==1.1.2
requests-oauthlib==1.3.1
rest-condition==1.0.3
rsa==4.9
s3transfer==0.6.0
scipy==1.10.0
scs==3.2.2
semver==3.0.1
sentry-sdk==1.29.2
service-identity==21.1.0
setproctitle==1.3.2
six==1.16.0
slack-sdk==3.19.5
social-auth-app-django==5.0.0
social-auth-core==4.3.0
sqlparse==0.4.4
stripe==5.0.0
structlog==22.3.0
structlog-sentry==2.0.0
svglib==1.4.1
tinycss2==1.2.1
twisted==23.10.0
typing-extensions==4.7.1
tzdata==2022.7
tzlocal==4.2
unicodecsv==0.14.1
uritemplate==4.1.1
uritools==4.0.1
urllib3==1.26.18
vine==5.0.0
wcwidth==0.2.5
webencodings==0.5.1
wrapt==1.15.0
xhtml2pdf==0.2.9
xmlschema==2.1.1
xmltodict==0.13.0
xworkflows==1.1.0
zipp==3.15.0
zope-interface==6.0

How can we reproduce your problem?

I don't know :-(

What is the result that you get?

Here is the exception we get in our workers:

{"event": "/app/.heroku/python/lib/python3.11/site-packages/celery/app/trace.py:660: RuntimeWarning: Exception raised outside body: TypeError(\"cannot create weak reference to '_cffi_backend.Lib' object\"):
Traceback (most recent call last):
  File \"/app/.heroku/python/lib/python3.11/site-packages/celery/app/trace.py\", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/sentry_sdk/integrations/celery.py\", line 275, in _inner
    reraise(*exc_info)
  File \"/app/.heroku/python/lib/python3.11/site-packages/sentry_sdk/_compat.py\", line 60, in reraise
    raise value
  File \"/app/.heroku/python/lib/python3.11/site-packages/sentry_sdk/integrations/celery.py\", line 270, in _inner
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/celery/app/trace.py\", line 734, in __protected_call__
    return self.run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/contextlib.py\", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File \"/app/apps/ondemand/tasks.py\", line 77, in create_ondemand_label_printings
    create_zpl_prints()
  File \"/app/apps/ondemand/operations.py\", line 422, in create_zpl_prints
    for zpl_printer, schedules in get_printers_schedules_to_execute().items():
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/apps/ondemand/queries.py\", line 274, in get_printers_schedules_to_execute
    for schedule in LabelPrintingSchedule.objects.select_for_update().all():
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/models/query.py\", line 394, in __iter__
    self._fetch_all()
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/models/query.py\", line 1867, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/models/query.py\", line 83, in __iter__
    db = queryset.db
         ^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/models/query.py\", line 1759, in db
    return self._db or router.db_for_write(self.model, **self._hints)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/utils.py\", line 220, in _route_db
    for router in self.routers:
                  ^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/utils/functional.py\", line 57, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/db/utils.py\", line 211, in routers
    router = import_string(r)()
             ^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/utils/module_loading.py\", line 30, in import_string
    return cached_import(module_path, class_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/django/utils/module_loading.py\", line 15, in cached_import
    module = import_module(module_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/importlib/__init__.py\", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"<frozen importlib._bootstrap>\", line 1204, in _gcd_import
  File \"<frozen importlib._bootstrap>\", line 1176, in _find_and_load
  File \"<frozen importlib._bootstrap>\", line 1126, in _find_and_load_unlocked
  File \"<frozen importlib._bootstrap>\", line 241, in _call_with_frames_removed
  File \"<frozen importlib._bootstrap>\", line 1204, in _gcd_import
  File \"<frozen importlib._bootstrap>\", line 1176, in _find_and_load
  File \"<frozen importlib._bootstrap>\", line 1147, in _find_and_load_unlocked
  File \"<frozen importlib._bootstrap>\", line 690, in _load_unlocked
  File \"/app/.heroku/python/lib/python3.11/site-packages/ddtrace/internal/module.py\", line 214, in _exec_module
    callback(module)
  File \"/app/.heroku/python/lib/python3.11/site-packages/ddtrace/internal/module.py\", line 398, in after_import
    self._origin_map[path] = module
    ^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/ddtrace/internal/module.py\", line 383, in _origin_map
    self._om = modules_with_origin(sys.modules.values())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/site-packages/ddtrace/internal/module.py\", line 374, in modules_with_origin
    result = wvdict({str(origin(m)): m for m in modules})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/weakref.py\", line 119, in __init__
    self.update(other, **kw)
  File \"/app/.heroku/python/lib/python3.11/weakref.py\", line 297, in update
    d[key] = KeyedRef(o, self._remove, key)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/app/.heroku/python/lib/python3.11/weakref.py\", line 348, in __new__
    self = ref.__new__(type, ob, callback)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot create weak reference to '_cffi_backend.Lib' object

P403n1x87 commented 9 months ago

@thibautd Thanks for reporting this. It looks like one of your dependencies registers a module that is not an ordinary ModuleType object and that does not support weak references. We track the lifetime of modules in case they are ever unloaded and later reloaded, and we use weak references so that we don't hold on to stale module objects and cause a memory leak. I'm looking into a way of handling non-ModuleType objects whilst keeping the memory safeguards in place. I'll hopefully have a fix for this soon 🤞
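
For readers who want to see the failure mode in isolation, here is a minimal sketch that does not need cffi or ddtrace. `FakeLib`, `supports_weakref` and `build_origin_map` are hypothetical names made up for this illustration. `FakeLib` only stands in for an object like `_cffi_backend.Lib` that cannot be weakly referenced, and the filtering shown is one possible way to guard a weak-value mapping, not necessarily the fix that went into ddtrace.

```python
import weakref
from types import ModuleType


class FakeLib:
    """Hypothetical stand-in for an object that cannot be weakly referenced.

    Declaring __slots__ without '__weakref__' removes weak reference support,
    which mimics the behaviour reported for '_cffi_backend.Lib'.
    """

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


def supports_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


def build_origin_map(modules):
    """Hypothetical helper: map module names to modules, held weakly.

    Objects that are not ordinary modules, or that do not support weak
    references, are skipped instead of raising TypeError.
    """
    result = weakref.WeakValueDictionary()
    for m in modules:
        if isinstance(m, ModuleType) and supports_weakref(m):
            result[m.__name__] = m
    return result


lib = FakeLib("fake_lib")
mod = ModuleType("real_mod")

# Passing the non-weakref-able object straight in fails like in the traceback.
try:
    weakref.WeakValueDictionary({"fake_lib": lib})
except TypeError as exc:
    print("unguarded mapping failed:", exc)

# The guarded version keeps only the ordinary module.
origin_map = build_origin_map([mod, lib])
print(sorted(origin_map.keys()))
```

The trade-off in this sketch is that skipped objects are simply not tracked, so nothing is held strongly and the memory safeguard stays in place.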