DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/
Other
519 stars 407 forks source link

span data (maybe from celery) filling encoder buffer #5799

Closed Priya997 closed 10 months ago

Priya997 commented 1 year ago

Summary of problem

Something breaks internally in ddtrace when the string size is larger than the default buffer size which is 8MB.

Which version of dd-trace-py are you using?

1.11.0

Which version of pip are you using?

pip 23.0.1

Which libraries and their versions are you using?

alabaster==0.7.13
amqp==5.1.1
analytics-python==1.4.0
asana==2.0.0
asgiref==3.6.0
astroid==2.15.2
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.2.0
auth0-python==4.1.0
autopep8==2.0.2
Babel==2.12.1
backoff==1.10.0
bandit==1.7.5
bcrypt==4.0.1
beautifulsoup4==4.12.1
billiard==3.6.4.0
black==23.3.0
bleach==6.0.0
boltons==21.0.0
boto3==1.26.108
botocore==1.29.108
boxsdk==3.7.0
bracex==2.3.post1
bytecode==0.14.1
cached-property==1.5.2
cachetools==5.3.0
cairocffi==1.5.0
CairoSVG==2.7.0
cattrs==22.2.0
cbor2==5.4.6
celery==5.2.7
celery-singleton==0.3.1
certifi==2022.12.7
cffi==1.15.1
cfgv==3.3.1
chardet==5.1.0
charset-normalizer==3.1.0
cleo==2.0.1
click==8.1.3
click-didyoumean==0.3.0
click-option-group==0.5.5
click-plugins==1.1.1
click-repl==0.2.0
colorama==0.4.6
colorful==0.5.5
commonmark==0.9.1
confluent-kafka==2.1.0
contextlib2==21.6.0
contourpy==1.0.7
coreapi==2.3.3
coreschema==0.0.4
cotyledon==1.7.3
coverage==7.2.3
crashtest==0.4.1
cryptography==38.0.4
cssselect2==0.7.0
cycler==0.11.0
dataclasses-json==0.5.7
datadog==0.45.0
DateTimeRange==1.2.0
ddsketch==2.0.4
ddtrace==1.11.1
defusedxml==0.7.1
Deprecated==1.2.13
deprecation==2.1.0
dill==0.3.6
distlib==0.3.6
dj-database-url==1.3.0
Django==3.2.9
django-anymail==9.1
django-appconf==1.0.5
django-auditlog==2.2.2
django-celery-email==3.0.0
django-cors-headers==3.14.0
django-dirtyfields==1.9.2
django-dotenv==1.4.2
django-enumchoicefield==3.0.1
django-extensions==3.2.1
django-fernet-fields==0.6
django-hosts==5.2
django-linear-migrations==2.7.0
django-localflavor==3.1
django-log-request-id==2.1.0
django-mock-queries==2.1.7
django-pgtrigger==4.6.0
django-pgviews==0.5.7
django-pgviews-redux==0.9.2
django-phonenumber-field==7.0.2
django-prettyjson==0.4.1
django-redis==5.2.0
django-rich==1.5.0
django-scim2==0.19.0
django-silk==5.0.3
django-storages==1.13.2
django-stubs==1.8.0
django-stubs-ext==0.8.0
djangorestframework==3.14.0
djangorestframework-simplejwt==5.2.2
djangorestframework-stubs==1.4.0
dnspython==2.2.1
docutils==0.18.1
drf-nested-routers==0.93.4
drf-orjson-renderer==1.7.1
drf-spectacular==0.26.1
drf-typed-views==0.3.0
dry-rest-permissions==0.1.10
envier==0.4.0
et-xmlfile==1.1.0
eventbusk @ git+https://github.com/Airbase/eventbusk.git@38991561aa27a2d2bc3e779cacb03a35848ca330
execnet==1.9.0
expiringdict==1.2.2
face==22.0.0
Faker==15.3.4
fakeredis==2.10.3
fcm-django==0.3.10
filelock==3.11.0
filestack-python==3.5.0
flake8==6.0.0
flake8-bandit==4.1.1
fonttools==4.39.3
freezegun==1.2.2
frozenlist==1.3.3
fuzzywuzzy==0.18.0
gevent==22.10.2
gitdb==4.0.10
GitPython==3.1.31
glom==22.1.0
google-auth==2.17.2
gprof2dot==2022.7.29
graphviz==0.20.1
greenlet==2.0.2
gunicorn==20.1.0
hellosign-python-sdk==4.0.0
hiredis==2.2.2
html2text==2020.1.16
html5lib==1.1
hypothesis==6.71.0
icdiff==2.0.6
identify==2.5.22
idna==3.4
imagesize==1.4.1
importlib-metadata==6.0.1
inflection==0.5.1
iniconfig==2.0.0
isodate==0.6.1
isort==5.12.0
itypes==1.2.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
jsonschema==4.17.3
kiwisolver==1.4.4
kombu==5.2.4
launchdarkly-server-sdk==7.5.1
lazy-object-proxy==1.9.0
Levenshtein==0.20.9
lob==4.5.4
lxml==4.9.2
markdown-it-py==2.2.0
MarkupSafe==2.1.2
marshmallow==3.19.0
marshmallow-enum==1.5.1
matplotlib==3.6.2
mbstrdecoder==1.1.2
mccabe==0.7.0
mdit-py-plugins==0.3.5
mdurl==0.1.2
MergeHRISClient @ git+https://github.com/Airbase/merge-hris-python.git@b5bc9e8dde0aa4caeb0bfb85d2bc106a4feb4218
mock==5.0.1
model-bakery==1.11.0
monotonic==1.6
multidict==6.0.4
munch==2.5.0
mypy==0.991
mypy-extensions==1.0.0
myst-parser==1.0.0
nodeenv==1.7.0
numpy==1.24.2
oauthlib==3.2.2
openpyxl==3.1.2
opensearch-dsl==2.1.0
opensearch-py==2.2.0
opentelemetry-api==1.17.0
orjson==3.8.10
packaging==23.1
pandas==2.0.0
parameterized==0.9.0
paramiko==3.1.0
pathspec==0.11.1
pbr==5.11.1
pdfkit==1.0.0
peewee==3.16.0
phonenumberslite==8.13.9
pikepdf==6.2.4
Pillow==9.5.0
plaid-python==7.4.0
platformdirs==3.2.0
pluggy==1.0.0
pprintpp==0.4.0
pre-commit==3.2.2
prettyprinter==0.18.0
prompt-toolkit==3.0.38
protobuf==4.22.3
psutil==5.9.4
psycopg2-binary==2.9.6
pubnub==7.1.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.10.0
pycparser==2.21
pycryptodomex==3.17
pydantic==1.9.0
pydash==5.1.2
pyfcm==1.4.9
pyflakes==3.0.1
Pygments==2.15.0
PyJWT==2.6.0
pylint==2.17.2
pylint-celery==0.3
pylint-django==2.5.3
pylint-plugin-utils==0.7
PyNaCl==1.5.0
pyOpenSSL==22.1.0
pyparsing==3.0.9
pypd==1.1.0
pyphen==0.14.0
pyRFC3339==1.1
pyrsistent==0.19.3
pysftp==0.2.9
pytest==7.2.2
pytest-cov==4.0.0
pytest-django==4.5.2
pytest-icdiff==0.6
pytest-instafail==0.5.0
pytest-picked==0.4.6
pytest-schema==0.1.1
pytest-socket==0.6.0
pytest-split==0.8.1
pytest-sugar==0.9.7
pytest-timeout==2.1.0
pytest-xdist==3.2.1
python-dateutil==2.8.2
python-decouple==3.8
python-json-logger==2.0.7
python-Levenshtein==0.20.9
python-lsp-jsonrpc==1.0.0
python-magic==0.4.27
python-quickbooks==0.7.4
python-stdnum==1.18
pytz==2023.3
pyxero==0.9.2
PyYAML==6.0
rapidfuzz==2.15.1
rauth==0.7.3
recommonmark==0.7.1
redis==4.4.0
requests==2.28.2
requests-file==1.5.1
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
responses==0.23.1
rest-framework-simplejwt==0.0.2
returns==0.19.0
rich==12.6.0
rsa==4.9
ruamel.yaml==0.17.21
ruff==0.0.261
s3transfer==0.6.0
schema==0.7.5
scikit-learn==1.2.2
scim2-filter-parser==0.5.0
scipy==1.10.1
semgrep==1.17.1
semver==2.13.0
sentry-sdk==1.19.1
setproctitle==1.3.2
setuptools-scm==7.1.0
shellcheck-py==0.9.0.2
simplejson==3.19.1
six==1.16.0
sklearn==0.0
slackclient==2.9.4
sly==0.5
smmap==5.0.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.4
Sphinx==6.1.3
sphinx-rtd-theme==1.2.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
spyne==2.14.0
sqlparse==0.4.3
standardjson==0.3.1
stevedore==5.0.0
tabulate==0.9.0
taskipy==1.10.4
tblib==1.7.0
tenacity==8.2.2
termcolor==2.2.0
threadpoolctl==3.1.0
tinycss2==1.1.1
toml==0.10.2
tomli==2.0.1
tomlkit==0.11.7
tqdm==4.64.1
trafaret==2.0.2
typepy==1.3.0
types-bleach==6.0.0.2
types-click==7.1.8
types-cryptography==3.3.23.2
types-DateTimeRange==2.0.0.5
types-Markdown==3.4.2.7
types-paramiko==3.0.0.7
types-pyOpenSSL==23.1.0.2
types-python-dateutil==2.8.19.12
types-pytz==2023.3.0.0
types-PyYAML==6.0.12.9
types-redis==4.5.4.1
types-requests==2.28.11.17
types-simplejson==3.18.0.3
types-tabulate==0.9.0.2
types-toml==0.10.8.6
types-urllib3==1.26.25.10
typing-inspect==0.8.0
typing_extensions==4.5.0
tzdata==2023.3
ujson==5.7.0
uritemplate==4.1.1
urllib3==1.26.15
veryfi==3.3.0
vine==5.0.0
virtualenv==20.21.0
Wand==0.6.11
wcmatch==8.4.1
wcwidth==0.2.6
WeasyPrint==52.5
webencodings==0.5.1
whitenoise==6.4.0
workos==1.22.0
wrapt==1.15.0
XlsxWriter==3.0.9
xmltodict==0.13.0
yarl==1.8.2
zeep==4.2.1
zipp==3.15.0
zope.event==4.6
zope.interface==6.0

How can we reproduce your problem?

There is not yet a clear pattern of what is producing this, but my gut feeling is that ddtrace tries to serialize a very long SQL query or string and it causes something to break internally.

What is the result that you get?

File "ddtrace/internal/_encoding.pyx", line 693, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 694, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 700, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 697, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 482, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 486, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 509, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 490, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 477, in ddtrace.internal._encoding.MsgpackEncoderBase._pack_trace
  File "ddtrace/internal/_encoding.pyx", line 751, in ddtrace.internal._encoding.MsgpackEncoderV05.pack_span
  File "ddtrace/internal/_encoding.pyx", line 703, in ddtrace.internal._encoding.MsgpackEncoderV05._pack_string
  File "ddtrace/internal/_encoding.pyx", line 184, in ddtrace.internal._encoding.StringTable._index
  File "ddtrace/internal/_encoding.pyx", line 253, in ddtrace.internal._encoding.MsgpackStringTable.insert
ValueError: Cannot insert '636215c4-1f09-4911-a9fc-21fc733bf06f': string table is full (current size: 8388610, max size: 8388608).

What is the result that you expected?

Investigation I did

The root cause according to me is:

The following code snippet is from cdef insert() which is in internals/_encoding.pyx

if len(string) > self.max_size:
            string = "<dropped string of length %d because it's too long (max allowed length %d)>" % (
                len(string), self.max_size
            )

if self.pk.length + len(string) > self.max_size:
    raise ValueError(
        "Cannot insert '%s': string table is full (current size: %d, max size: %d)." % (
            string, self.pk.length, self.max_size
        )
    )

According to my assumption, the length of the string is less than the max size in the first if condition, so the string is not dropped. And in the second if condition when we add the existing string length with self.pk.length which is 6. We exceed the max size and end up throwing the value error.

self.pk.length = MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE and DEF MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE = 6

Priya997 commented 1 year ago

@P403n1x87 Raised a PR here

emmettbutler commented 1 year ago

@Priya997 thanks for the issue report! Let's see if we can figure out how to reproduce it reliably.

I think this is happening because no individual span attribute string is larger than the maximum size of the string table, but the combined size of all attributes is. That's a tricky case to handle, because we need to provide clear feedback to the user about what's causing the overflow. In this case there isn't one particular string that's causing the overflow - instead it's the combined size of all of them.

What details can you share about the application in which you're seeing this error?

Have you considered adjusting the DD_TRACE_WRITER_BUFFER_SIZE_BYTES environment variable to make the buffer large enough to contain all of the span attributes?

Priya997 commented 1 year ago

@emmettbutler Yes, after increasing the DD_TRACE_WRITER_BUFFER_SIZE_BYTES to 10MB. It's not failing in our case anymore.

What details can you share about the application in which you're seeing this error?

On this, we are tracking the trace of celery queues, which is resulting in this error. After a task is executed successfully we call this method trace_after_publish after executing the task successfully.

emmettbutler commented 1 year ago

Thanks @Priya997. I think that since increasing the buffer size worked for you, we can close this issue. Do you agree?

Priya997 commented 1 year ago

@emmettbutler We started facing this again. Can we look for a long-term fix? Increasing the buffer size every time doesn't sound feasible.

Priya997 commented 1 year ago

@emmettbutler Can you please check this?

emmettbutler commented 1 year ago

@Priya997 sure, no problem. What failure mode are you experiencing now? Do you have a traceback you can share?

gbadnmo commented 1 year ago

Piggybacking here as I'm not sure if my issue is a different one.

ValueError: Cannot insert https://graph.facebook.com/v17.0/<id>/users?access_<redacted>&appsecret_proof=<redacted>&payload=<redacted url encoded str 699 chars long>...
  File "ddtrace/internal/_encoding.pyx", line 481, in ddtrace.internal._encoding.MsgpackEncoderBase._pack_trace
  File "ddtrace/internal/_encoding.pyx", line 803, in ddtrace.internal._encoding.MsgpackEncoderV05.pack_span
  File "ddtrace/internal/_encoding.pyx", line 743, in ddtrace.internal._encoding.MsgpackEncoderV05._pack_string
  File "ddtrace/internal/_encoding.pyx", line 186, in ddtrace.internal._encoding.StringTable._index
  File "ddtrace/internal/_encoding.pyx", line 255, in ddtrace.internal._encoding.MsgpackStringTable.insert
RuntimeError: failed to pack span: <Span(id=<id>,trace_id=<id>,parent_id=<id>,name=requests.request)>. Exception: Cannot insert 'https://graph.facebook.com/v17.0/<id>/users?access_<redacted>&appsecret_proof=<redacted>&payload=<redacted url encoded str 699 chars long>...
  File "ddtrace/span.py", line 542, in __exit__
    self.finish()
  File "ddtrace/span.py", line 265, in finish
    self._finish_ns(time_ns())
  File "ddtrace/span.py", line 278, in _finish_ns
    cb(self)
  File "ddtrace/tracer.py", line 813, in _on_span_finish
    p.on_span_finish(span)
  File "ddtrace/internal/processor/trace.py", line 249, in on_span_finish
    self._writer.write(spans)
  File "ddtrace/internal/writer/writer.py", line 355, in write
    self._write_with_client(client, spans=spans)
  File "ddtrace/internal/writer/writer.py", line 377, in _write_with_client
    client.encoder.put(spans)
  File "ddtrace/internal/_encoding.pyx", line 733, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 734, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 740, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 737, in ddtrace.internal._encoding.MsgpackEncoderV05.put
  File "ddtrace/internal/_encoding.pyx", line 491, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 495, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 518, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 499, in ddtrace.internal._encoding.MsgpackEncoderBase.put
  File "ddtrace/internal/_encoding.pyx", line 483, in ddtrace.internal._encoding.MsgpackEncoderBase._pack_trace

The payload is obviously very large

emmettbutler commented 1 year ago

@gbadnmo Have you considered adjusting the DD_TRACE_WRITER_BUFFER_SIZE_BYTES environment variable to make the buffer large enough to contain all of the span attributes?

gbadnmo commented 1 year ago

@emmettbutler Trying with double the buffer size. Not sure if it'll work though since the payload basically looks like abuse of query params in the facebook sdk.

emmettbutler commented 10 months ago

I'm closing this due to inactivity. Please reopen, or consider opening a new ticket, if the issue persists.