Open akuzkohqs opened 2 years ago
Hi @akuzkohqs . This doesn't seem like a DNS error, which would yield an error like: Failed to establish a new connection: [Errno -2] Name or service not known'
.
In this case it seems like a connection is made, but it is "too slow" so the read timeout expires. Feels like a networking issue.
What is the exact check you've ran with curl
? Can you please post the output of curl -sSL -vv profiler.granulate.io
.
By the way, per the log you have submitted, some profiling data was sent:
| 2022-03-29T16:57:06.890+03:00 | [13:57:06] Successfully uploaded profiling data to the server
if it's from this /tmp/gprofiler -cu --token ${GP_PAT} --service-name SERVICE_NAME --disable-pidns-check --perf-mode disabled --no-java --no-python --no-php --no-ruby --nodejs-mode none &
run then it might as well be empty - here you have disabled all profilers, so nothing collects any stacks :sweat_smile:
@Jongy thank you for your attention on this issue. Let me to answer your questions:
curl -v https://profiler.granulate.io || true
and it works all the time. Output will be at the bottom.So, to conclude. Do we need to expect anything but CPU/Memory utilization sent and visualized in gProfiler dashboard when we have --perf-mode disabled (or --perf-mode none)? What could we do in gProfiler settings/options to configure (if needed) timeouts and/or other things?
curl output:
# curl -sSL -vv https://profiler.granulate.io
* Rebuilt URL to: https://profiler.granulate.io/
* Trying 35.170.220.67...
* TCP_NODELAY set
* Connected to profiler.granulate.io (35.170.220.67) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=profiler.granulate.io
* start date: Jan 5 00:00:00 2022 GMT
* expire date: Feb 3 23:59:59 2023 GMT
* subjectAltName: host "profiler.granulate.io" matched cert's "profiler.granulate.io"
* issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x563a305caea0)
> GET / HTTP/1.1
> Host: profiler.granulate.io
> User-Agent: curl/7.52.1
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Wed, 30 Mar 2022 15:04:12 GMT
< content-type: text/html
< content-length: 2862
< server: nginx/1.18.0
< last-modified: Tue, 29 Mar 2022 16:51:17 GMT
< vary: Accept-Encoding
< etag: "62433905-b2e"
< cache-control: no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0
< strict-transport-security: max-age=31536000; includeSubDomains
< accept-ranges: bytes
<
<!doctype html><html lang="en"><head><meta charset="utf-8"/><link rel="shortcut icon" href="/favicon.ico"/><meta name="viewport" content="minimum-scale=1,initial-scale=1,width=device-width,shrink-to-fit=no"/><meta name="theme-color" content="#000000"/><link rel="manifest" href="/manifest.json"/><title>gProfiler</title><script src="/env-config.js"></script><script>!function(e,t,a,n,g){e[n]=e[n]||[],e[n].push({"gtm.start":(new Date).getTime(),event:"gtm.js"});var m=t.getElementsByTagName(a)[0],r=t.createElement(a);r.async=!0,r.src="https://www.googletagmanager.com/gtm.js?id=GTM-TNXVMCR",m.parentNode.insertBefore(r,m)}(window,document,"script","dataLayer")</script><link href="/static/css/2.ef749903.chunk.css" rel="stylesheet"><link href="/static/css/main.123bc548.chunk.css" rel="stylesheet"></head><body style="background-color:#f0f2f5"><noscript>You need to enable JavaScript to run this app.</noscript><noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-TNXVMCR" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript><div id="root"></div><script src="https://www.google.com/recaptcha/api.js?onload=onloadCallback&render=explicit" async defer="defer"></script><script>!function(e){function r(r){for(var n,l,f=r[0],i=r[1],p=r[2],c=0,s=[];c<f.length;c++)l=f[c],Object.prototype.hasOwnProperty.call(o,l)&&o[l]&&s.push(o[l][0]),o[l]=0;for(n in i)Object.prototype.hasOwnProperty.call(i,n)&&(e[n]=i[n]);for(a&&a(r);s.length;)s.shift()();return u.push.apply(u,p||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,f=1;f<t.length;f++){var i=t[f];0!==o[i]&&(n=!1)}n&&(u.splice(r--,1),e=l(l.s=t[0]))}return e}var n={},o={1:0},u=[];function l(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,l),t.l=!0,t.exports}l.m=e,l.c=n,l.d=function(e,r,t){l.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},l.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Ob* Curl_http_done: called premature == 0
* Connection #0 to host profiler.granulate.io left intact
ject.defineProperty(e,"__esModule",{value:!0})},l.t=function(e,r){if(1&r&&(e=l(e)),8&r)return e;if(4&r&&"object"==typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(l.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)l.d(t,n,function(r){return e[r]}.bind(null,n));return t},l.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return l.d(r,"a",r),r},l.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},l.p="/";var f=this.webpackJsonpgprofiler=this.webpackJsonpgprofiler||[],i=f.push.bind(f);f.push=r,f=f.slice();for(var p=0;p<f.length;p++)r(f[p]);var a=i;t()}([])</script><script src="/static/js/2.6c3cd4ec.chunk.js"></script><script src="/static/js/main.d26e6baa.chunk.js"></script></body></html>
Thanks for the details.
"Successfully uploaded profiling data to the server" is confusing. It always followed with "Failed sending logs to server" with errors from above.
These are actually 2 separate steps - and 2 different endpoints: one endpoint for the profiles (which reports success) and another separate endpoint for the logs, which reports a failure. While being unexpected, I do not deem it related to the issue of no profiling data being displayed in the UI (as I said - these are 2 different endpoints)
So, to conclude. Do we need to expect anything but CPU/Memory utilization sent and visualized in gProfiler dashboard when we have --perf-mode disabled (or --perf-mode none)? What could we do in gProfiler settings/options to configure (if needed) timeouts and/or other things?
Yes, definitely - you should be seeing data even with --perf-mode=none
, it just means that we don't run perf
, but we still run py-spy, async-profiler etc.
What I meant in my comment is that if you run --no-perf
and --no-java --no-python --no-php --no-ruby
, then, well, no profilers are left :sweat_smile: and in that case you will most likely get no data because no profiler actually ran. This should be an error configuration and I will fix the commandline interface to raise if all profilers are disabled.
So the issue that remains is that you're seeing "Successfully uploaded profiling data" but not seeing any data on the server. Just making sure - if running on Fargate with --perf-mode=none
, then other profilers for e.g Java/Python must be active, and actually identify processes (otherwise no profiling data is collected, because perf is responsible for collecting data from the entire system).
Could you please share a full log from a profiling session (i.e, run the profiling without -c
for a single session)? You can post it here (possibly redacted if you prefer not to share all log lines), or you can also send it to me privately at yonatan dot goldschmidt at granulate dot io.
I want to ensure that some profilers have actually been run and correlate that with the "Successfully uploaded data" message.
@Jongy ...things are changing fast.. Our team gave up with gProfiler ;( I was able to test things before got "Server error: Invalid API key" as they disabled account.
Btw, thank you for implementing fool-proof protection when all profilers are disabled.
What I found is that Java profiler was giving us a problem like that with /dev/kmsg error (it won't work in Fargate):
[06:40:00] Initialized JavaProfiler (frequency: 11hz, duration: 60s)
[06:40:00] Failed to start kernel messages listener. Profilee error monitoring not available. (Do you have permission to read /dev/kmsg?)
Traceback (most recent call last):
File "gprofiler/kernel_messages.py", line 30, in get_kernel_messages_provider
File "granulate_utils/linux/devkmsg.py", line 25, in __init__
FileNotFoundError: [Errno 2] No such file or directory: '/dev/kmsg'
[06:40:00] Initialized PythonEbpfProfiler (frequency: 11hz, duration: 60s)
[06:40:00] Python eBPF profiler initialization failed
[06:40:00] Initialized PySpyProfiler (frequency: 11hz, duration: 60s)
[06:40:00] Initialized RbSpyProfiler (frequency: 11hz, duration: 60s)
[06:40:00] Could not find a Docker daemon or CRI-compatible daemon, profiling data will not include the container names. If you do have a containers runtime and it's not supported, please open a new issue here: https://github.com/Granulate/gprofiler/issues/new
[06:40:00] gProfiler initialized and ready to start profiling
[06:40:00] Failed to enable proc_events listener for exited Java processes
Traceback (most recent call last):
File "granulate_utils/linux/proc_events.py", line 211, in start
PermissionError: [Errno 1] Operation not permitted
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "granulate_utils/linux/ns.py", line 225, in _switch_and_run
File "granulate_utils/linux/proc_events.py", line 249, in _start_listener
File "granulate_utils/linux/proc_events.py", line 216, in start
PermissionError: This process doesn't have permissions to bind/connect to the process events connector
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "gprofiler/profilers/java.py", line 848, in start
File "granulate_utils/linux/proc_events.py", line 261, in wrapper
File "granulate_utils/linux/ns.py", line 237, in run_in_ns
Exception: run_in_ns execution failed
[06:40:00] Successfully uploaded profiling data to the server
[06:40:00] Stopping ...
Then I've tested with "--disable-pidns-check --perf-mode none --no-java"
[06:40:15] Initialized PythonEbpfProfiler (frequency: 11hz, duration: 60s)
[06:40:15] Python eBPF profiler initialization failed
[06:40:15] Initialized PySpyProfiler (frequency: 11hz, duration: 60s)
[06:40:15] Initialized RbSpyProfiler (frequency: 11hz, duration: 60s)
[06:40:15] Could not find a Docker daemon or CRI-compatible daemon, profiling data will not include the container names. If you do have a containers runtime and it's not supported, please open a new issue here: https://github.com/Granulate/gprofiler/issues/new
[06:40:15] gProfiler initialized and ready to start profiling
[06:40:15] Successfully uploaded profiling data to the server
As you may see no errors. But at the same time still zero samples as well. I was waiting for hour or so.
Hope it helps you to debug it in the future. I highlight that we do have Node.js process in same Docker container and this is what we wished to profile initially.
Let me know if I may help you further while I haven't gProfiler account anymore ;(
I'm sorry to hear that @akuzkohqs :(
About the /dev/kmsg
& proc_events
error - these are again false positives, sorry for the confusion (and I will make sure it is addressed) - these are two optional features and the lack of them does not prevent gProfiler from, well, doing its profiling. I will make sure to amend the log messages such that they clarify that profiling is not harmed even if they are present.
Judging by your output logs, I think that my explanation here:
So the issue that remains is that you're seeing "Successfully uploaded profiling data" but not seeing any data on the server. Just making sure - if running on Fargate with --perf-mode=none, then other profilers for e.g Java/Python must be active, and actually identify processes (otherwise no profiling data is collected, because perf is responsible for collecting data from the entire system).
is what happens. There are no profilers actually running (because perf
is disabled, and the different runtime profilers do not identify target processes). So gProfiler really uploads empty samples. Which is yet another error condition that we should at least log on, noted this as well.
If it's NodeJS that you want to profile over Fargate - that is sadly not supported yet (as Node is profiled with perf
, and perf
is not available on Fargate)
You can actually continue helping me debug even with the account gone, if you'd like :smiley_cat:
-o /tmp/output
instead of -u
) and inspect the output .col
file - is it empty or does it contain some samples?--no-perf
) and to enable Node perf --nodejs-mode=perf
, and to enable NodeJS profiling flags on your app: https://github.com/Granulate/gprofiler#nodejs-profiling-options.
We have installed gProfiler in Docker containers in both, local (Docker engine, run in EC2 container with unrestricted outgoing connectivity) and remote (AWS Fargate with SYS_PTRACE enabled).
We saw no single metric uploaded in gProfiler dashboard yet.
Docker container base:
node:13.7.0
Additional commands:
Startup made from Docker entrypoint:
We also tried simplified command without any differences, same errors, etc.:
The problem is that while service listed in gProfiler there are massive errors like this:
We have tested network connectivity and we are able to open profiler.granulate.io using curl and other tools from inside the Docker container(s). Certificate is valid as well.
We may provide additional details if requested.
Thank you for your attention.