DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

ddtrace.profiling.exporter.http.UploadFailed #1355

Closed youssefNM closed 2 years ago

youssefNM commented 4 years ago

Which version of dd-trace-py are you using?

ddtrace[profile]==0.36.1

Which version of the libraries are you using?

ach==0.2
alembic==1.0.5
analytics-python==1.2.9
awscli==1.16.263
beautifulsoup4==4.8.1
boto3==1.9.187
celery==4.4.0rc4
celery-once==3.0.1
datadog==0.31.0
dateparser==0.7.2
ddtrace[profile]==0.36.1
dj-database-url==0.5.0
docutils===0.15.2
eventlet==0.25.1
Flask==1.1.1
flask-cors==3.0.8
flask-marshmallow==0.10.1
Flask-Migrate==2.5.2
flask-restplus==0.13.0
Flask-Script==2.0.6
Flask-SQLAlchemy==2.4.1
flower==0.9.3
fuzzysearch==0.6.2
fuzzywuzzy==0.17.0
google-search-results==1.7.1
gunicorn==20.0.4
intuit-oauth==1.2.2
iso3166==1.0
jinja2==2.10.3
marshmallow==3.2.1
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.20.0
pdfminer-six==20191110
phonenumbers==8.10.22
pillow==6.2.1
plaid-python==3.4.0
psycopg2==2.8.4
pycrypto===2.6.1
pysftp==0.2.9
python-dateutil==2.8.1
python-gnupg==0.4.5
python-json-logger==0.1.11
python-Levenshtein==0.12.0
python-quickbooks==0.8.1
pyxero==0.9.1
pytz==2019.3
requests==2.22.0
sendgrid==6.1.0
sentry-sdk[flask]==0.14.3
simplejson==3.17.0
smartystreets-python-sdk==4.3.0
sqlalchemy-citext==1.3.post0
sqlalchemy-utils==0.35.0
twilio==6.33.1
urllib3[secure]
weasyprint==50
xlsxwriter==1.2.5
pytest-xdist==1.30.0
pdf2image==1.10.0
coverage==5.0b1
coverage-badge==1.0.1
firebase-admin==3.2.0
jsonschema==3.2.0
openpyxl==3.0.2
retrying==1.3.3
werkzeug==0.16.1
requests-cache==0.5.2
zeep==3.4.0
pyotp==2.3.0
python-memcached==1.59
flask-caching==1.8.0
zipcodes==1.1.0
gevent==1.4.0
simple-salesforce==1.0.0

How can we reproduce your problem?

from ddtrace.profiling import Profiler

prof = Profiler()
prof.start()

What is the result that you get?

I get this error occasionally. It seems the profiler failed to export some events, though not all of them, since I can see new events from my app on the profiling page.

Unable to export 23758 events: ddtrace.profiling.exporter.http.UploadFailed: Unable to upload: ddtrace.profiling.exporter.http.RequestFailed: Error status code received from endpoint: 502: b'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body bgcolor="white">\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n'

jd commented 4 years ago

Thank you @youssefNM for the report. As this does not seem to be a problem with the profiler code itself but seems to be related to the infrastructure, could you open a support ticket https://www.datadoghq.com/support/ ?

We'd probably need more info, such as where the requests come from and some timestamps, to see how/if they match on our side.

Thank you!

Anto59290 commented 4 years ago

I had the same issue while running a test in an AWS Cloud9 instance (should be easy for you to reproduce). Is this because the server is unreachable (port closed?)?

Tell me if you need more specific data.

jd commented 4 years ago

@Anto59290 what do you mean by the server is unreachable? Our servers are reachable, but they might not be from Cloud9 — I wouldn't know about that. Feel free to open a support ticket if you need help setting this up.

Anto59290 commented 4 years ago

Our servers are reachable, but they might not be from Cloud9 — I wouldn't know about that.

That's what I meant, yes.

Feel free to open a support ticket if you need help setting this up.

IMHO, it is a bit of a shame that the documentation does not provide more details about this: which port should be open, etc. I know that for other Datadog features more details are given on port/network configuration. I am in a trial period with only a few days left, and I know that it takes a while to get a proper answer from support, so I think I'll just give up on this one ;)

Thanks for your quick answer by the way.

jd commented 4 years ago

Port 443 to intake.profile.datadoghq.com or intake.profile.datadoghq.eu depending on your site.
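
For anyone wanting to verify this from their own environment, here is a minimal sketch of a TCP reachability check against the two hosts named above. The helper name `can_reach_intake` and the `INTAKE_HOSTS` mapping are illustrative, not part of ddtrace. It only confirms that port 443 is reachable; it says nothing about whether the API key or site is correct.

```python
import socket

# Hosts named in this thread; pick the one matching your Datadog site.
INTAKE_HOSTS = {
    "datadoghq.com": "intake.profile.datadoghq.com",
    "datadoghq.eu": "intake.profile.datadoghq.eu",
}


def can_reach_intake(site, port=443, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to the profiling intake succeeds.

    `connect` is injectable so the check can be exercised without a network.
    """
    host = INTAKE_HOSTS[site]
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
    conn.close()
    return True
```

Calling `can_reach_intake("datadoghq.eu")` from the affected machine should return `True` if outbound traffic to port 443 is allowed.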

kjagiello commented 4 years ago

Port 443 to intake.profile.datadoghq.com or intake.profile.datadoghq.eu depending on your site.

I've sent a ticket about this already, but it seems that the Python setup docs for profiling are missing any mention of DD_SITE. Took me some time to realise that I forgot to set it to the EU host.

Anto59290 commented 4 years ago

Thanks @kjagiello, that was indeed my issue, which led to the "Unable to export" error (with a 403 error status, not a 502 like in the initial issue, my bad). If you simply follow the documentation at https://docs.datadoghq.com/fr/tracing/profiling/?tab=python and your account is configured in the EU (that is my case), it will not work. After running export DD_SITE=datadoghq.eu, everything went smoothly. The Java documentation does state this, by the way:

Note: With dd-java-agent.jar library versions 0.48+, if your organization is on Datadog EU site, add -Ddd.site=datadoghq.eu or set DD_SITE=datadoghq.eu as environment variable.
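
If exporting the variable in the shell is not convenient, the same thing can be sketched in Python. This assumes the profiler reads DD_SITE from the process environment when it is created, which is not confirmed in this thread (only the shell `export` was). The helper name `ensure_dd_site` is hypothetical.

```python
import os


def ensure_dd_site(site="datadoghq.eu", environ=os.environ):
    """Set DD_SITE unless it is already defined; return the value in effect."""
    return environ.setdefault("DD_SITE", site)


# Intended usage (assumption: must run before the profiler is created):
#   ensure_dd_site()
#   from ddtrace.profiling import Profiler
#   Profiler().start()
```

Using `setdefault` means an explicit DD_SITE from the deployment environment still wins over the hard-coded fallback.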

youssefNM commented 4 years ago

This is still hitting us, and we have had no useful response from the support team, @jd.

{"asctime": "2020-05-13 02:01:35,672", "levelname": "ERROR", "name": "ddtrace.profiling.scheduler", "filename": "scheduler.py", "lineno": 42, "dd.trace_id": 0, "dd.span_id": 0, "message": "Unable to export 18026 events: ddtrace.profiling.exporter.http.UploadFailed: Unable to upload: urllib.error.HTTPError: HTTP Error 502: Bad Gateway\n\n", "dd.version": "", "dd.env": "", "dd.service": ""}

It seems to me that the Datadog API is rejecting our profiling events, maybe due to a rate limit (like the one you have for APM traces).

jd commented 4 years ago

@youssefNM I'll check but I don't think our intake returns a 502 error like that in any case.

Do you have anything between the intake endpoint and your application by any chance? A proxy?

youssefNM commented 4 years ago

No, no proxy, @jd. Our application communicates with the intake endpoint directly. I got a response from the support team about this:

[Screenshot: Screen Shot 2020-05-20 at 00 05 18]

Having that retry logic can help with this issue.
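
To illustrate what such retry logic could look like, here is a rough sketch using exponential backoff on 5xx responses. This is not the ddtrace implementation: `upload_with_retry` and the `upload` callable are hypothetical, and the only assumption taken from the log above is that failures surface as `urllib.error.HTTPError`.

```python
import time
import urllib.error


def upload_with_retry(upload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `upload()` and retry on HTTP 5xx errors with exponential backoff.

    `upload` is a hypothetical zero-argument callable that raises
    urllib.error.HTTPError when the endpoint returns an error status.
    """
    for attempt in range(attempts):
        try:
            return upload()
        except urllib.error.HTTPError as exc:
            # Retry only server-side errors (e.g. 502); re-raise client errors
            # such as 403 immediately, and give up after the last attempt.
            if exc.code < 500 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a transient 502 is retried after 1 s and then 2 s, while a 403 (for example from a wrong DD_SITE) fails right away, since retrying would not help.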

raphaelauv commented 4 years ago

I had the same problem

1) This configuration was missing in my datadog-agent:

DD_PROFILING_ENABLED: "true"

2) The service name was missing when running ddtrace-run:

DD_PROFILING_ENABLED=true DD_SERVICE=my_dd_service_name ddtrace-run myprogram

Kyle-Verhoog commented 2 years ago

Going to close this out due to age, @youssefNM please re-open if the issue is still occurring! Retry logic has been implemented since this issue was opened.