DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

OverflowError: (34, 'Numerical result out of range') in dd-trace 2.1.5 #7516

Closed: kzap closed this issue 11 months ago

kzap commented 11 months ago

Summary of problem

After upgrading from 2.1.4 to 2.1.5, we suddenly found our application crashing at certain times because of this new error, which we had never seen before. The OverflowError seems to come from the appsec code (https://github.com/search?q=repo%3ADataDog%2Fdd-trace-py%20PyExc_OverflowError&type=code), and there were changes to this code in the new version.

We do not use SimpleJSON in our code, so we are not sure how we are being affected, or whether this change is the cause.

Which version of dd-trace-py are you using?

2.1.5

Which version of pip are you using?

```
# pip --version
pip 23.3.1 from /venv/lib/python3.11/site-packages/pip (python 3.11)
```

Which libraries and their versions are you using?

`pip freeze`

```
aiohttp==3.8.6
aiosignal==1.3.1
amqp==5.1.1
apispec==6.3.0
apispec-webframeworks==0.5.2
async-timeout==4.0.3
attrs==23.1.0
authy==2.2.6
bcrypt==4.0.1
beautifulsoup4==4.12.2
billiard==4.1.0
bleach==6.1.0
bleach-allowlist==1.0.3
blinker==1.6.3
botcloning @ PRIVATE
boto3==1.28.79
botocore==1.31.79
bytecode==0.15.1
cachetools==5.3.2
cattrs==23.1.2
celery==5.3.4
certifi==2023.7.22
cffi==1.16.0
charon-sdk @ PRIVATE
charset-normalizer==3.3.1
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
contextlib2==21.6.0
cryptography==41.0.5
cssselect==1.2.0
cssutils==2.9.0
datadog==0.47.0
ddsketch==2.0.4
ddtrace==2.1.5
defusedxml==0.7.1
Deprecated==1.2.14
dnspython==1.16.0
elastic-site-search==2.1.1
envier==0.4.0
expiringdict==1.2.2
feedfinder2==0.0.4
feedparser==6.0.10
filelock==3.13.1
Flask==3.0.0
Flask-Cors==4.0.0
Flask-JWT-Extended==4.5.3
Flask-Limiter==3.5.0
Flask-Pydantic==0.11.0
frozenlist==1.4.0
genghis @ PRIVATE
gevent==23.9.1
google-api-core==2.12.0
google-auth==2.23.4
google-cloud-core==2.3.3
google-cloud-translate==3.12.1
googleapis-common-protos==1.61.0
greenlet==3.0.1
grpcio==1.59.2
grpcio-status==1.59.2
gunicorn==21.2.0
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.0
itsdangerous==2.1.2
jieba3k==0.35.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
kombu==5.3.2
langcodes==3.3.0
launchdarkly-server-sdk==9.0.1
limits==3.6.0
lxml==4.9.3
Markdown==3.5.1
markdown-it-py==3.0.0
markdownify==0.11.6
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-dataclass==8.6.0
marshmallow-mongoengine==0.31.2
mdurl==0.1.2
mongoengine==0.21.0
multidict==6.0.4
mypy-extensions==1.0.0
ndg-httpsclient==0.5.1
newspaper3k==0.2.8
nltk==3.8.1
numpy==1.26.1
oauthlib==3.2.2
openai==0.28.1
opensearch-py==2.3.2
opentelemetry-api==1.20.0
ordered-set==4.1.0
packaging==23.2
phonenumbers==8.13.24
Pillow==10.1.0
pluggy==0.13.1
premailer==3.10.0
prompt-toolkit==3.0.39
proto-plus==1.22.3
protobuf==4.24.4
psycopg2-binary==2.9.9
pusher==3.3.2
pyap==0.3.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybreaker==1.0.2
pycparser==2.21
pycryptodome==3.19.0
pydantic==1.10.13
Pygments==2.16.1
PyJWT==2.8.0
pymongo==3.12.0
PyNaCl==1.5.0
pyOpenSSL==23.3.0
pyRFC3339==1.1
python-dateutil==2.8.2
python-dotenv==1.0.0
python-http-client==3.3.7
python-json-logger==2.0.7
python-redis-lock==4.0.0
pytz==2023.3.post1
PyYAML==6.0.1
rapidfuzz==3.5.2
redis==5.0.1
regex==2023.10.3
requests==2.31.0
requests-file==1.5.1
requests-oauthlib @ PRIVATE
rich==13.6.0
rsa==4.9
s3transfer==0.7.0
schema==0.7.5
scipy==1.11.3
semver==3.0.2
sendgrid==6.10.0
sentry-sdk==1.34.0
sgmllib3k==1.0.0
six==1.16.0
smooch==5.20.0
soupsieve==2.5
SQLAlchemy==1.3.3
starkbank-ecdsa==2.2.0
tenacity==8.2.3
tiktoken==0.5.1
tinysegmenter==0.3
tld==0.13
tldextract==5.0.1
tqdm==4.66.1
twilio==6.35.1
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
unicodecsv==0.14.1
Unidecode==1.3.7
urllib3==2.0.7
validators==0.22.0
vine==5.0.0
wcwidth==0.2.9
webencodings==0.5.1
Werkzeug==3.0.1
wrapt==1.15.0
XlsxWriter==3.1.9
xmltodict==0.13.0
yarl==1.9.2
zenpy==2.0.41
zipp==3.17.0
zope.event==5.0
zope.interface==6.1
```

How can we reproduce your problem?

It does not happen every time, but occasionally a pod on some of our instances starts up and crashes with this error.

What is the result that you get?

Our Python 3.11 Gunicorn/Flask application crashes with the following:

OverflowError: (34, 'Numerical result out of range')

and

Exception ignored in forksafe hook <bound method Tracer._child_after_fork of <ddtrace.tracer.Tracer object at 0x7f36a9970210>>
[Screenshot 2023-11-07 at 7:36:34 PM]

What is the result that you expected?

Don't crash our application.

iherasymenko commented 11 months ago

Rolling back to 2.1.4 didn't help; the issue still happened afterward.

juanjux commented 11 months ago

This does not seem to be related to AppSec. Do you have IAST enabled? If not, the code snippet, which comes from pybind11 (a library we use for IAST), should not even be loaded. It is certainly not related to the traceback code.

Could you try disabling IAST (if previously enabled) and AppSec?
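To confirm which of these products are actually switched on in a given pod, a quick look at the environment can help. The sketch below assumes the features are controlled only through the `DD_APPSEC_ENABLED` and `DD_IAST_ENABLED` environment variables and treats `1`/`true` (case-insensitive) as enabled, which is an approximation of ddtrace's boolean parsing rather than its actual code; `is_truthy` and `enabled_features` are hypothetical helpers, not ddtrace APIs.

```python
import os
from typing import Dict, Mapping, Optional

# Env vars discussed in this thread; assumed here to be the only switches.
FEATURE_FLAGS = ("DD_APPSEC_ENABLED", "DD_IAST_ENABLED")


def is_truthy(value: Optional[str]) -> bool:
    """Approximate boolean parsing: unset counts as disabled (assumption)."""
    return value is not None and value.strip().lower() in ("1", "true")


def enabled_features(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which of the flags look enabled in the given environment."""
    return {name: is_truthy(env.get(name)) for name in FEATURE_FLAGS}


if __name__ == "__main__":
    # Run inside the affected pod to see what the process actually inherits.
    for name, enabled in enabled_features().items():
        print(f"{name}: {'enabled' if enabled else 'disabled'}")
```

Passing an explicit mapping instead of `os.environ` makes it easy to check a pod spec or `.env` file without starting the application.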

mabdinur commented 11 months ago

Hi @kzap,

Are you setting `DD_TRACE_WRITER_INTERVAL_SECONDS` or manually overriding `ddtrace.internal.writer.writer.AgentWriter.RETRY_ATTEMPTS`?

Since `interval=1` and `RETRY_ATTEMPTS=3` by default, it's unclear how this expression could overflow: `0.618 * self.interval / (1.618 ** self.RETRY_ATTEMPTS) / 2`.
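A standalone evaluation of that expression shows why the defaults look safe. In CPython, float `**` raises `OverflowError` when the result exceeds the float range (on Linux the message is `(34, 'Numerical result out of range')`, the same text as in the report), whereas plain float multiplication and division overflow to `inf` without raising. The sketch below is a hypothetical copy of the quoted formula, not ddtrace's writer code; `retry_backoff` and `max_safe_attempts` are illustrative names.

```python
import math
import sys

# Defaults quoted above: interval=1, RETRY_ATTEMPTS=3.
DEFAULT_INTERVAL = 1.0
DEFAULT_RETRY_ATTEMPTS = 3


def retry_backoff(interval: float = DEFAULT_INTERVAL,
                  retry_attempts: int = DEFAULT_RETRY_ATTEMPTS) -> float:
    """Evaluate 0.618 * interval / (1.618 ** retry_attempts) / 2.

    Only the ** step can raise OverflowError here.
    """
    return 0.618 * interval / (1.618 ** retry_attempts) / 2


def max_safe_attempts() -> int:
    """Largest exponent n for which 1.618 ** n stays a finite float."""
    return int(math.log(sys.float_info.max) / math.log(1.618))


if __name__ == "__main__":
    print(retry_backoff())       # default wait, in seconds
    print(max_safe_attempts())   # exponent ceiling before ** overflows
```

With the defaults the wait is about 0.073 seconds, and the exponent would have to exceed roughly 1475 before `1.618 ** RETRY_ATTEMPTS` overflows, which is why an unmodified configuration should not hit this path.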

Can you provide a reproduction?

iherasymenko commented 11 months ago

@mabdinur No, we don't touch `DD_TRACE_WRITER_INTERVAL_SECONDS` or `ddtrace.internal.writer.writer.AgentWriter.RETRY_ATTEMPTS` in any way.

@juanjux No, `DD_IAST_ENABLED` is not enabled either.

The issue happened intermittently and only a few times. We don't have steps to reproduce it.

cc: @kzap

mabdinur commented 11 months ago

From the information shared in this issue it's not clear what could be going wrong.

@kzap, can you open a support ticket here: https://help.datadoghq.com/hc/en-us/requests/new? Our support team will be able to provide next steps.