DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/
Other
553 stars 415 forks source link

ddtrace-run 2.7.2 raise free(): invalid pointer #8651

Closed wadinj closed 8 months ago

wadinj commented 8 months ago

Summary of problem

Which version of dd-trace-py are you using?

2.7.2

Which version of pip are you using?

24.0

`pip freeze` colorama==0.4.6 comm==0.2.0 ConfigArgParse==1.7 contourpy==1.2.0 countryinfo==0.1.2 coverage==7.3.4 crashtest==0.4.1 cryptography==42.0.5 cycler==0.12.1 dataclasses-json==0.6.4 ddsketch==2.0.4 ddtrace==2.7.1 debugpy==1.8.0 decorator==5.1.1 deepdiff==6.7.1 defusedxml==0.7.1 Deprecated==1.2.14 dill==0.3.7 distlib==0.3.8 dparse==0.6.4b0 duckdb==0.10.0 dulwich==0.21.7 envier==0.5.0 et-xmlfile==1.1.0 execnet==2.0.2 executing==2.0.1 fakeredis==2.21.1 fastapi==0.110.0 fastapi-cache2==0.2.1 fastjsonschema==2.19.0 filelock==3.13.1 Flask==3.0.0 Flask-Cors==4.0.0 Flask-Login==0.6.3 fonttools==4.47.0 fqdn==1.5.1 freezegun==1.4.0 frictionless==4.40.8 frozenlist==1.4.1 fsspec==2024.2.0 gevent==23.9.1 geventhttpclient==2.0.11 google_trans_new_tmp==1.2.2 greenlet==3.0.3 h11==0.14.0 httpcore==1.0.2 httpx==0.27.0 identify==2.5.33 idna==3.6 importlib-metadata==6.11.0 iniconfig==2.0.0 installer==0.7.0 ipykernel==6.29.3 ipython==8.22.1 isodate==0.6.1 isoduration==20.11.0 isort==5.13.2 itsdangerous==2.1.2 ixbrl-viewer==1.4.18 jaraco.classes==3.3.1 jedi==0.19.1 jeepney==0.8.0 Jinja2==3.1.3 jmespath==1.0.1 json5==0.9.14 jsonpointer==2.4 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 jupyter-events==0.9.0 jupyter-lsp==2.2.2 jupyter_client==8.6.0 jupyter_core==5.5.1 jupyter_server==2.12.1 jupyter_server_terminals==0.5.0 jupyterlab==4.1.1 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.2 keyring==24.3.1 kiwisolver==1.4.5 kombu==5.3.4 locust==2.24.0 lxml==4.9.4 lz4==4.3.3 Mako==1.3.0 mando==0.7.1 markdown-it-py==3.0.0 marko==2.0.2 MarkupSafe==2.1.3 marshmallow==3.20.1 matplotlib==3.8.2 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 mistune==3.0.2 more-itertools==10.2.0 msgpack==1.0.7 multidict==6.0.4 multimethod==1.10 mypy==1.8.0 mypy-extensions==1.0.0 nbclient==0.9.0 nbconvert==7.13.1 nbformat==5.9.2 nest-asyncio==1.5.8 nodeenv==1.8.0 notebook==7.1.1 notebook_shim==0.2.3 numpy==1.26.2 openpyxl==3.1.0 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-instrumentation-boto==0.41b0 opentelemetry-instrumentation-botocore==0.41b0 opentelemetry-instrumentation-celery==0.41b0 opentelemetry-instrumentation-dbapi==0.41b0 opentelemetry-instrumentation-fastapi==0.41b0 opentelemetry-instrumentation-psycopg2==0.41b0 opentelemetry-instrumentation-redis==0.41b0 opentelemetry-instrumentation-sqlalchemy==0.41b0 opentelemetry-propagator-aws-xray==1.0.1 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 ordered-set==4.1.0 overrides==7.4.0 packaging==23.1 pandas==2.2.1 pandera==0.18.0 pandocfilters==1.5.0 parso==0.8.3 pathspec==0.12.1 pbr==6.0.0 pendulum==3.0.0 petl==1.7.14 pexpect==4.9.0 pillow==10.2.0 Pint==0.22 pkginfo==1.10.0 platformdirs==4.1.0 pluggy==1.3.0 poetry==1.8.1 poetry-core==1.9.0 poetry-plugin-export==1.6.0 polars==0.20.14 pprofile==2.1.0 pre-commit==3.6.2 prometheus-client==0.19.0 prompt-toolkit==3.0.43 protobuf==4.25.1 psutil==5.9.8 psycopg2-binary==2.9.9 ptyprocess==0.7.0 pure-eval==0.2.2 py-expression-eval==0.3.14 pyarrow==15.0.0 pycountry==22.3.5 pycparser==2.21 pydantic==1.10.14 pydantic-to-typescript==1.0.10 pyfiglet==1.0.2 Pygments==2.17.2 PyJWT==2.8.0 pylint==3.1.0 pyparsing==3.1.1 pyproject_hooks==1.0.0 pytest==8.0.2 pytest-alembic==0.11.0 pytest-asyncio==0.23.5 pytest-cov==4.1.0 pytest-deadfixtures==2.2.1 pytest-env==1.1.3 pytest-mock==3.12.0 pytest-order==1.2.0 pytest-rerunfailures==13.0 pytest-xdist==3.5.0 python-box==7.1.1 python-dateutil==2.8.2 python-dotenv==1.0.1 python-json-logger==2.0.7 python-multipart==0.0.9 python-slugify==8.0.4 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.2 radon==6.0.1 rapidfuzz==3.6.2 redis==4.6.0 referencing==0.32.0 regex==2023.10.3 requests==2.31.0 requests-mock==1.11.0 requests-toolbelt==1.0.0 rfc3339-validator==0.1.4 rfc3986==2.0.0 rfc3986-validator==0.1.1 rich==13.7.1 roundrobin==0.0.4 rpds-py==0.15.2 ruamel.yaml==0.18.5 ruamel.yaml.clib==0.2.8 s3fs==2024.2.0 s3transfer==0.10.0 safety==3.0.1 safety-schemas==0.0.2 scipy==1.12.0 seaborn==0.13.2 SecretStorage==3.3.3 segment-analytics-python==2.3.2 Send2Trash==1.8.2 shellingham==1.5.4 simpleeval==0.9.13 six==1.16.0 sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve==2.5 SQLAlchemy==2.0.28 sqlalchemy-json==0.7.0 SQLAlchemy-Utils==0.41.1 sqlparse==0.4.4 stack-data==0.6.3 starlette==0.36.3 starlette-context==0.3.6 stevedore==5.1.0 stringcase==1.2.0 structlog==24.1.0 tabulate==0.9.0 tenacity==8.2.3 terminado==0.18.0 text-unidecode==1.3 tinycss2==1.2.1 toml-sort==0.23.1 tomlkit==0.12.3 tornado==6.4 tqdm==4.66.2 traitlets==5.14.0 trove-classifiers==2024.3.3 typeguard==4.1.5 typer==0.9.0 types-aiofiles==23.2.0.20240106 types-filelock==3.2.7 types-pyOpenSSL==23.3.0.0 types-python-dateutil==2.8.19.20240106 types-python-slugify==8.0.2.20240127 types-PyYAML==6.0.12.12 types-redis==4.6.0.20240218 types-requests==2.31.0.20240218 types-setuptools==68.2.0.1 typing-inspect==0.9.0 typing_extensions==4.9.0 tzdata==2023.3 uri-template==1.3.0 urllib3==2.0.7 uvicorn==0.27.1 validators==0.22.0 vine==5.1.0 virtualenv==20.25.0 watchdog==4.0.0 wcwidth==0.2.12 webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 Werkzeug==3.0.1 wrapt==1.16.0 xenon==0.9.1 XlsxWriter==3.2.0 xmltodict==0.13.0 yarl==1.9.4 zipp==3.17.0 zope.event==5.0 zope.interface==6.1

How can we reproduce your problem?

In our case we run a fastapi with ddtrace under uvicorn: poetry run ddtrace-run uvicorn --host 0.0.0.0 --port 15000 --workers 2 api:api

What is the result that you get?

free(): invalid pointer
free(): invalid pointer

The process is unable to start - only log is free(): invalid pointer

What is the result that you expected?

Downgrading to 2.7.1 works again - seems like 2.7.2 raise an invalid pointer

sanchda commented 8 months ago

:wave: Hi @wadinj, thank you for the report!

Investigating now.

sanchda commented 8 months ago

Quick question: is profiling enabled in your configuration?

sanchda commented 8 months ago

Unfortunately, I wasn't able to reproduce this behavior for a simple uvicorn application on 2.7.2 (with/without profiling enabled). Some more diagnostic information will be useful. If you are able to produce and share a coredump, then it might be easiest to just do that (if you'd like to keep your corefile private, reach out to our support team and let them know you're working with me on a github issue--they'll allow you to share secret data in confidence).

It can be difficult to get and propagate corefiles in many configurations. I've found that using libSegFault.so is the easiest way to get insight into this category of error. Unfortunately, the glibc package maintainers have removed this helpful tool, so we need to get it manually.

I have a script here for getting libSegFault.so for your architecture. I threw this together just now, so hopefully it works. This script will extract libSegFault.so to a standard location, such as /tmp/libsegfault_deb/extracted/lib/x86_64-linux-gnu/libSegFault.so (it prints the exact path).

After you have libSegFault.so, you'll want to add the following to your environment:

LD_PRELOAD=<path_to_libSegFault.so>
SEGFAULT_SIGNALS="abrt segv"

When your application throws the invalid free error, it will print a dump to stderr. If you could share this dump (either here, or through our support system), we can work on remediating this issue ASAP. Thank you!

wllmtrng commented 8 months ago

This happens for me too in gcp and cloud run using fastapi. Here's my dockerfile:

# https://github.com/tiangolo/uvicorn-gunicorn-docker
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.10

# Disable python bytecode
ENV PYTHONDONTWRITEBYTECODE 1

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

ENV PYTHONPATH "${PYTHONPATH}:/app"

ENV PORT 80

# Install Poetry
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python && \
    cd /usr/local/bin && \
    ln -s /opt/poetry/bin/poetry && \
    poetry config virtualenvs.create false

# Install private requirements authenticated as the user who is logged in
RUN mkdir -p ~/.ssh && \
    chmod 0600 ~/.ssh && \
    ssh-keyscan github.com > ~/.ssh/known_hosts

#Copy App code to container
COPY . .

# Install app and dependencies in container, mounting your
# Github secret so the container can clone our private
# repositories that hold packages.
RUN --mount=type=ssh poetry install

# DataDog APM dependencies
# pin ddtrace because 2.7.2 caused pointer failures
RUN pip install --target /dd_tracer/python/ ddtrace
COPY --from=datadog/serverless-init:1 /datadog-init /dd_tracer/datadog-init

# Entry point for running serverless datadog agent
ENTRYPOINT ["/dd_tracer/datadog-init"]

# We need to run the app with ddtrace-run to get the APM data.
# Reference to start.sh: https://github.com/tiangolo/uvicorn-gunicorn-docker/blob/master/docker-images/start.sh
CMD ["/dd_tracer/python/bin/ddtrace-run", "/start.sh"]
sanchda commented 8 months ago

Thank you for the report, @wllmtrng

I've been unable to reproduce the issue. Are you able to run your application with the instructions I posted here and report back (or send a note through our support channels) on the output?

Thank you!

sentinel-riley commented 8 months ago

I'm having the same issue with gunicorn gthread workers, I set the profiling config variables mentioned in the 2.7.2 notes, then I get a new error: ddup_init() called 2 times as well as the same free(): invalid pointer error.

sanchda commented 8 months ago

Thank you for the info, @sentinel-riley!

Just to be clear: when you use 2.7.2 without the Profiling configuration mentioned in the release notes, you do not get the free(): invalid pointer error. Is that correct?

sentinel-riley commented 8 months ago

@sanchda No, sorry. With 2.7.2 and the profiling configurations not set, I do get the free(): inavlid pointer error. When I set those configurations, I get the ddup_init() called 2 times error as well.

sanchda commented 8 months ago

@sentinel-riley , gotcha--thank you. So just to repeat it back one more time.

  1. No matter what, you get a free(): invalid pointer error.
  2. If you also use the profiling configuration from the release notes, you get ddup_init() called 2 times in addition to the free() error.

Do I have it right?

sanchda commented 8 months ago

@sentinel-riley, could I ask you to run your application with some additional instrumentation?

Run the script here. This will download libSegFault.so (from glibc) to /tmp/libsegfault_deb/extracted/lib/x86_64-linux-gnu/libSegFault.so

Add the following environment variables to your service:

LD_PRELOAD=/tmp/libsegfault_deb/extracted/lib/x86_64-linux-gnu/libSegFault.so
SEGFAULT_SIGNALS="abrt segv"

When you get the free() error, you'll get a dump to stderr. It'll have a few sections like Register dump: and Backtrace: (I don't care about Memory map:.

If you you're able to add that instrumentation and report back with the info (either here or through Datadog support), I think we can fix this issue in very short order. 🙏

sanchda commented 8 months ago

Hi everyone,

Just a heads up. We've found the cause for this and are working on a fix. Thank you to everyone who assisted in the diagnostics!