DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

dd-trace exceptions polluting stdout even though trace agent is healthy #3380

Closed rexledesma closed 2 years ago

rexledesma commented 2 years ago

I followed https://docs.datadoghq.com/agent/kubernetes/apm/?tab=ipport to set up DD_AGENT_HOST. The Datadog agent is deployed through Helm. Inspecting the agent with the agent status command shows that everything is fine.

However, the application emits exceptions every so often indicating that traces failed to upload, even though the traces do show up in Datadog.
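For anyone trying to narrow this down from inside the application pod, here is a minimal sketch of a reachability check using only the standard library. It assumes the agent listens on port 8126 (the port in the logged URL below) and reads the host from DD_AGENT_HOST; the function name probe_agent is made up for illustration. It only confirms that the port accepts TCP connections. The traceback in this report times out while waiting for the HTTP response, so a successful probe does not rule out a slow agent.

```python
import os
import socket
import time
from typing import Optional


def probe_agent(host: str, port: int = 8126, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection fails.

    Hypothetical helper: port 8126 is taken from the logged agent URL.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


if __name__ == "__main__":
    # Fall back to localhost when DD_AGENT_HOST is not set (an assumption).
    agent_host = os.environ.get("DD_AGENT_HOST", "localhost")
    elapsed = probe_agent(agent_host)
    if elapsed is None:
        print(f"could not connect to {agent_host}:8126")
    else:
        print(f"connected to {agent_host}:8126 in {elapsed:.3f}s")
```

Running this a few times around the moments when the errors appear may help tell a network problem apart from an agent that accepts connections but responds slowly.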

Datadog Helm values (following https://docs.datadoghq.com/agent/kubernetes/?tab=helm):

datadog:
  apiKeyExistingSecret: datadog-api-key-token
  apm:
    enabled: true
  dogstatsd:
    useHostPort: true
  logs:
    containerCollectAll: true
    enabled: true
  networkMonitoring:
    enabled: true
  processAgent:
    enabled: true
    processCollection: true

Which version of dd-trace-py are you using?

# pip freeze | grep ddtrace
ddtrace==0.59.0

Which version of pip are you using?

# pip --version
pip 21.0.1 from /usr/local/lib/python3.8/site-packages/pip (python 3.8)

Which version of the libraries are you using?

# pip freeze
alembic==1.6.5
amqp==5.0.9
aniso8601==7.0.0
anyio==3.5.0
asgiref==3.5.0
attrs==21.4.0
Authlib==0.15.5
bleach==4.1.0
boto3==1.21.13
botocore==1.24.13
cachetools==5.0.0
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.12
click==8.0.4
colorama==0.4.4
coloredlogs==14.0
croniter==1.3.4
cryptography==36.0.1
datadog==0.44.0
ddtrace==0.59.0
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.13
docstring-parser==0.13
entrypoints==0.4
gevent==21.12.0
gevent-websocket==0.10.1
google-auth==2.6.0
gql==2.0.0
graphene==2.1.9
graphql-core==2.3.2
graphql-relay==2.0.1
greenlet==1.1.2
grpcio==1.44.0
grpcio-health-checking==1.43.0
h11==0.12.0
httpcore==0.14.7
httptools==0.3.0
httpx==0.22.0
humanfriendly==10.0
idna==3.3
importlib-resources==5.4.0
ipython-genutils==0.2.0
isodate==0.6.1
itsdangerous==2.1.0
Jinja2==2.11.3
jmespath==0.10.0
jsonschema==4.4.0
jupyter-core==4.9.2
kombu==5.2.3
kubernetes==23.3.0
lxml==4.6.5
Mako==1.1.6
MarkupSafe==2.0.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.1.3
oauthlib==3.2.0
packaging==21.3
pandocfilters==1.5.0
passlib==1.7.4
pendulum==2.1.2
pep562==1.1
promise==2.3
prompt-toolkit==3.0.28
protobuf==3.19.4
psycopg2-binary==2.9.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
Pygments==2.11.2
pyparsing==3.0.7
pyrsistent==0.18.1
python-dateutil==2.8.2
python-dotenv==0.19.2
python-editor==1.0.4
python-multipart==0.0.5
python3-saml==1.14.0
pytz==2021.3
pytzdata==2020.1
PyYAML==6.0
questionary==1.10.0
redis==4.1.4
requests==2.27.1
requests-oauthlib==1.3.1
rfc3986==1.5.0
rsa==4.8
Rx==1.6.1
s3transfer==0.5.2
shellingham==1.4.0
six==1.16.0
sniffio==1.2.0
SQLAlchemy==1.4.31
starlette==0.18.0
structlog==21.5.0
tabulate==0.8.9
tenacity==8.0.1
testpath==0.6.0
toposort==1.7
tqdm==4.63.0
traitlets==5.1.1
typer==0.4.0
typing-compat==0.1.0
typing-extensions==4.1.1
urllib3==1.26.8
uvicorn==0.17.5
uvloop==0.16.0
validators==0.18.2
vine==5.0.0
watchdog==2.1.6
watchgod==0.7
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.1
websockets==10.2
wrapt==1.13.3
xmlsec==1.3.12
zipp==3.7.0
zope.event==4.5.0
zope.interface==5.4.0

How can we reproduce your problem?

What is the result that you get?

{"event": "failed to send traces to Datadog Agent at http://10.0.2.215:8126/v0.4/traces", "logger": "ddtrace.internal.writer", "level": "error", "timestamp": "2022-03-05T01:01:29.034335Z", "exception": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.8/site-packages/tenacity/__init__.py\", line 407, in __call__\n    result = fn(*args, **kwargs)\n  File \"/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py\", line 412, in _send_payload\n    response = self._put(payload, headers)\n  File \"/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py\", line 372, in _put\n    resp = compat.get_connection_response(conn)\n  File \"/usr/local/lib/python3.8/site-packages/ddtrace/internal/compat.py\", line 222, in get_connection_response\n    return conn.getresponse()\n  File \"/usr/local/lib/python3.8/http/client.py\", line 1347, in getresponse\n    response.begin()\n  File \"/usr/local/lib/python3.8/http/client.py\", line 307, in begin\n    version, status, reason = self._read_status()\n  File \"/usr/local/lib/python3.8/http/client.py\", line 268, in _read_status\n    line = str(self.fp.readline(_MAXLINE + 1), \"iso-8859-1\")\n  File \"/usr/local/lib/python3.8/socket.py\", line 669, in readinto\n    return self._sock.recv_into(b)\nsocket.timeout: timed out\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py\", line 525, in flush_queue\n    self._retry_upload(self._send_payload, encoded, n_traces)\n  File \"/usr/local/lib/python3.8/site-packages/tenacity/__init__.py\", line 404, in __call__\n    do = self.iter(retry_state=retry_state)\n  File \"/usr/local/lib/python3.8/site-packages/tenacity/__init__.py\", line 361, in iter\n    raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7f008aca7460 state=finished raised timeout>]"}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py", line 412, in _send_payload
    response = self._put(payload, headers)
  File "/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py", line 372, in _put
    resp = compat.get_connection_response(conn)
  File "/usr/local/lib/python3.8/site-packages/ddtrace/internal/compat.py", line 222, in get_connection_response
    return conn.getresponse()
  File "/usr/local/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ddtrace/internal/writer.py", line 525, in flush_queue
    self._retry_upload(self._send_payload, encoded, n_traces)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 361, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f008aca7460 state=finished raised timeout>]

What is the result that you expected?

Exceptions should not be printed to stdout.
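As a possible stopgap (not something suggested by the maintainers in this thread), the standard logging module can filter these records out. The sketch below assumes the records are emitted on the logger named ddtrace.internal.writer, as the "logger" field in the output above indicates, and that the message starts with "failed to send traces"; the class name DropTraceSendFailures is made up for illustration. Other ddtrace versions may use a different logger name, in which case the filter would need to be attached elsewhere.

```python
import logging


class DropTraceSendFailures(logging.Filter):
    """Drop only the 'failed to send traces' records; pass everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return "failed to send traces" not in record.getMessage()


# Logger name taken from the "logger" field of the log line in this report.
writer_logger = logging.getLogger("ddtrace.internal.writer")
writer_logger.addFilter(DropTraceSendFailures())
```

This hides the symptom only. Traces that really do fail to upload would also go unreported, so it is worth pairing with some other signal, such as the agent's own health metrics.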

brettlangdon commented 2 years ago

Hey @rexledesma, thanks for reaching out, and I am sorry you are experiencing this issue.

Would you be willing to open this as an issue with support@datadoghq.com ?

In order to help investigate we will want to take a look into your account (for example, we can get some trace agent health metrics).

Kyle-Verhoog commented 2 years ago

@rexledesma I'm going to close out the issue since we haven't heard anything in a while. If you're still seeing the issue please reopen the issue 🙂

elina-israyelyan commented 11 months ago

I still have this issue on version 1.6.3.