DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/
Other
552 stars 414 forks source link

ddtrace-run + gevent + aiohttp/asyncio causes very strange type error in ssl #5989

Closed mmcqd closed 1 year ago

mmcqd commented 1 year ago

Summary of problem

I'm running a gunicorn python server using gevent and ddtrace-run. This is happening in a docker container. When I use the aiohttp library, I get the following type error:

builtins.TypeError: _wrap_bio() argument 'incoming' must be _ssl.MemoryBIO, not _ssl.MemoryBIO

Which version of dd-trace-py are you using?

1.13.3, but I tested and this also occurs on 1.13.4 and 1.9.3. It does not occur on 1.9.2

Which version of pip are you using?

22.3.1

Which libraries and their versions are you using?

`pip freeze` advocate==1.0.0 aiohttp==3.8.4 aiosignal==1.3.1 alembic==1.10.2 alembic-utils==0.7.8 amqp==5.1.1 analytics-python==1.2.9 anyio==3.6.2 appier==1.30.1 argo-workflows==6.4.5 async-generator==1.10 async-timeout==4.0.2 attrs==23.1.0 authzed==0.8.0 backoff==2.2.1 bcrypt==3.2.2 billiard==4.1.0 blinker==1.5 boltons==21.0.0 boto3==1.26.100 botocore==1.29.100 build==0.10.0 bytecode==0.14.2 CacheControl==0.12.11 cachelib==0.9.0 cachetools==5.3.0 cattrs==22.2.0 celery==5.3.0rc1 certifi==2022.12.7 cffi==1.15.1 chardet==4.0.0 charset-normalizer==3.1.0 cleo==2.0.1 click==8.1.3 click-didyoumean==0.3.0 click-plugins==1.1.1 click-repl==0.2.0 clickclick==20.10.2 connexion==2.14.2 crashtest==0.4.1 cryptography==40.0.1 dataclasses-json==0.5.7 ddsketch==2.0.4 ddtrace==1.13.4 deepdiff==6.3.0 Deprecated==1.2.14 distlib==0.3.6 docker==6.0.1 dulwich==0.21.5 envier==0.4.0 filelock==3.12.0 Flask==2.1.3 Flask-Caching==2.0.2 Flask-Continuum==0.1.7 Flask-Executor==0.10.0 Flask-JWT-Extended==4.4.4 Flask-Migrate==3.1.0 Flask-SQLAlchemy==2.5.1 flupy==1.2.0 frozenlist==1.3.3 gevent==22.10.2 git-url-parse==1.2.2 google-api==0.1.12 google-api-core==2.11.0 google-api-python-client==2.82.0 google-auth==2.16.3 google-auth-httplib2==0.1.0 googleapis-common-protos==1.59.0 greenlet==2.0.2 grpcio==1.51.3 grpcio-tools==1.51.3 gunicorn==20.1.0 h11==0.14.0 hashids==1.3.1 html5lib==1.1 httpcore==0.17.0 httplib2==0.22.0 httpx==0.24.0 idna==3.4 importlib-metadata==6.0.1 inflection==0.5.1 installer==0.7.0 isodate==0.6.1 itsdangerous==2.1.2 jaraco.classes==3.2.3 jeepney==0.8.0 Jinja2==3.1.2 jmespath==1.0.1 JSON-log-formatter==0.5.2 jsonschema==3.2.0 keyring==23.13.1 kombu==5.3.0b3 kubernetes==12.0.1 langchain==0.0.144 lockfile==0.12.2 lxml==4.9.2 Mako==1.2.4 markdown-it-py==2.2.0 MarkupSafe==2.1.2 marshmallow==3.19.0 marshmallow-enum==1.5.1 mdurl==0.1.2 mock==4.0.3 more-itertools==9.1.0 msgpack==1.0.5 multidict==6.0.4 mypy-extensions==1.0.0 ndg-httpsclient==0.5.1 netifaces==0.11.0 numexpr==2.8.4 numpy==1.24.2 oauthlib==3.2.2 openai==0.27.2 openapi-schema-pydantic==1.2.4 openapi-schema-validator==0.2.3 openapi-spec-validator==0.4.0 opentelemetry-api==1.18.0 ordered-set==4.1.0 orjson==3.8.8 oso==0.27.0 packaging==23.1 parse==1.19.0 pbr==5.11.1 pexpect==4.8.0 pkginfo==1.9.6 platformdirs==3.5.0 poetry==1.4.2 poetry-core==1.5.2 poetry-plugin-export==1.3.1 prance==0.21.8.0 prometheus-client==0.13.1 prompt-toolkit==3.0.38 protobuf==4.21.12 protoc-gen-validate==0.4.2 # Editable install with no version control (protos==0.2.0) -e /app/protos psycogreen==1.0.2 psycopg2-binary==2.9.5 ptyprocess==0.7.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.21 pydantic==1.10.7 Pygments==2.14.0 PyJWT==2.6.0 PyNaCl==1.5.0 pyOpenSSL==23.1.0 pyparsing==3.0.9 pyproject_hooks==1.0.0 pyrsistent==0.19.3 python-dateutil==2.8.2 python-dotenv==0.19.2 python-slugify==8.0.1 python3-saml==1.15.0 pytz==2023.2 PyYAML==6.0 rapidfuzz==2.15.1 redis==4.5.3 remote-pdb==2.1.0 requests==2.28.2 requests-file==1.5.1 requests-oauthlib==1.3.1 requests-toolbelt==0.10.1 rich==13.3.3 rsa==4.9 ruamel.yaml==0.17.21 s3transfer==0.6.0 SecretStorage==3.3.3 semantic-version==2.10.0 # Editable install with no version control (semgrep-app==0.1.0) -e /app semver==2.13.0 sentry-sdk==1.17.0 shellingham==1.5.0.post1 simple-salesforce==1.12.3 six==1.16.0 slack-bolt==1.17.0 slack-sdk==3.20.2 sniffio==1.3.0 SQLAlchemy==1.3.24 SQLAlchemy-Continuum==1.3.14 SQLAlchemy-Utils==0.40.0 stripe==2.76.0 structlog==22.3.0 structlog-sentry==2.0.1 swagger-ui-bundle==0.0.9 tenacity==8.2.2 text-unidecode==1.3 tomlkit==0.11.8 tqdm==4.65.0 trove-classifiers==2023.5.24 typing-inspect==0.8.0 typing_extensions==4.5.0 tzdata==2023.3 uritemplate==4.1.1 urllib3==1.26.15 validate-email==1.3 vine==5.0.0 virtualenv==20.21.1 wcwidth==0.2.6 webencodings==0.5.1 websocket-client==1.5.1 Werkzeug==2.2.3 wrapt==1.15.0 xmlsec==1.3.13 xmltodict==0.13.0 yarl==1.8.2 zeep==4.2.1 zipp==3.15.0 zope.event==4.6 zope.interface==6.0

How can we reproduce your problem?

Haven't created a minimal reproduction yet, hopefully what I've provided at the end of this issue is enough to get some insight. I'll work on getting a minimal repro.

What is the result that you get?

This stack trace, when trying to use aiohttp:

builtins.TypeError: _wrap_bio() argument 'incoming' must be _ssl.MemoryBIO, not _ssl.MemoryBIO
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ddtrace/contrib/aiohttp/patch.py", line 54, in connect
    result = await self.__wrapped__.connect(req, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 540, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 901, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
    transport, protocol = await self._create_connection_transport(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1136, in _create_connection_transport
    transport = self._make_ssl_transport(
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 71, in _make_ssl_transport
    ssl_protocol = sslproto.SSLProtocol(
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/sslproto.py", line 326, in __init__
    self._sslobj = self._sslcontext.wrap_bio(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 531, in wrap_bio
    return self.sslobject_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/ssl.py", line 870, in _create
    sslobj = context._wrap_bio(
             ^^^^^^^^^^^^^^^^^^
TypeError: _wrap_bio() argument 'incoming' must be _ssl.MemoryBIO, not _ssl.MemoryBIO

What is the result that you expected?

No type error

What I think might be happening

I know gevent patches the ssl module, and after doing some investigating with pdb, I determined that the _ssl in the type of incoming from the original, unpatched ssl, while the _ssl in the expected type is from the gevent patched ssl module. incoming is created here in asyncio/sslproto.py. It seems like ddtrace/bootstrap/sitecustomize.py loads the asyncio module, which loads sslproto, which loads the pre-gevent ssl module into memory. Then in sitecustomize, asyncio is explicitly not unloaded, since it's in the KEEP_MODULES list. So then later uses of asyncio fail to account for the gevent monkeypatching. I tried removing asyncio from KEEP_MODULES, and the type error went away, but presumably it's there for a reason. If my understanding is correct then I don't know why it wouldn't be affecting everyone, so I suspect it's wrong in some way.

P403n1x87 commented 1 year ago

@mmcqd your analysis is indeed correct. And you're also right that asyncio is in KEEP_MODULES for a reason: during our testing of the module cloning approach for better supporting frameworks like gevent we discovered issues caused by unloading this module. We've already done some work to ensure that we do not force-load asyncio during the initialisation of ddtrace, but I think we still have some dependencies that do force-load it for us. We will investigate how we can avoid that, so that we can safely remove asyncio from KEEP_MODULES.

jbergler commented 1 year ago

Thanks @P403n1x87 for the quick response.

One of the questions we didn't get very far with was why it broke between ddtrace 1.9.2 and 1.9.3 or what we can do since downgrading ddtrace seems to breaks instrumentation for celery.

Do you have any ideas or recommendations for possible workarounds in the meantime while you investigate?

P403n1x87 commented 1 year ago

@jbergler one thing to try in the meantime is the downgrade to the latest 1.8 patch release with the environment variable DD_GEVENT_PATCH_ALL=1

mmcqd commented 1 year ago

@P403n1x87 Unfortunately I don't think we can downgrade to 1.8. Downgrading to 1.9.2 broke our datadog integration with celery. Presumably 1.8 will do the same. We also had DD_GEVENT_PATCH_ALL=1 during this testing.

emmettbutler commented 1 year ago

@mmcqd can you replicate this issue on ddtrace 1.18?

jbergler commented 1 year ago

@emmettbutler thanks for the fixes. @mmcqd and I have been doing a bit more testing and things look pretty good with these changes. Feel free to close the issue.