What's system load on a server? What's the memory consumption?
top doesn't seem to indicate any kind of throttling. I believe userspace CPU load is somewhat higher during the query, but still very low.
Memory (free -h output):

me@graphite-dev:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            30Gi       3.1Gi       628Mi       1.0Mi        27Gi        27Gi
Swap:             0B          0B          0B
This is our dev host.
If possible, please provide the profiler's output (see https://blog.golang.org/pprof for instructions on how to do that; the debug handlers can be enabled in carbonapi's config on a separate port). Go supports different types of profiles. It's always best to provide a CPU profile, a memory profile and a lock profile, but that depends on the symptoms.
Two CPU profiles captured during the request are attached:
Debug logs could be helpful as well
Setting log-level to debug has an immense impact on performance, so for now I've set it to the "info" level. The only 2 relevant log entries for this request (grepped a\.\*b\*\.c\.d\.\*) are:
"level":"ERROR","timestamp":"2024-02-18T13:51:21.454Z","logger":"access","message":"request failed","data":{"handler":"render","carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","url":"/render","peer_ip":"10.42.136.247","host":"{our graphite dev host}:4040","format":"json","use_cache":true,"targets":["a.*b*.c.d.*"],"cache_timeout":180,"runtime":59.89556336,"http_code":503,"reason":"timeout while fetching Response","from":1708263920,"until":1708264222,"from_raw":"1708263920","until_raw":"1708264222","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Grafana-Org-Id":"1","X-Panel-Id":"1"}}}
and an entry some tens of MB long, which starts like this:
{"level":"WARN","timestamp":"2024-02-18T13:52:40.013Z","logger":"zipper","message":"errors occurred while getting results","type":"protoV2Group","name":"http://127.0.0.1:8080","type":"fetch","request":"&MultiFetchRequest{Metrics:[]FetchRequest{FetchRequest{Name:
{{ removed for brevity }}
,"carbonapi_uuid":"d1118526-3647-4c52-99ea-49d19b549f74","errors":"max tries exceeded","errorsVerbose":"max tries exceeded\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:25\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: error while fetching Response\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:34\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"
Query that causes problems
Grafana queries the metrics; it's a render query. The query target is a.*b*.c.d.*.
According to go-carbon logs, there are ~65.5K files (i.e. metrics) returned.
Resolution: I think it's irrelevant.
Number of datapoints: also irrelevant, to my mind.
Additional context
The carbonapi log reports a 503 status code. The nginx reverse proxy logs 499, which means the client closed the connection. Either way, 1 minute feels too long.
Problem description
Query a.*b*.c.d.* times out after a minute.

carbonapi's version: v0.16.0
Did this happen before: irrelevant (I didn't have an earlier version).
carbonapi's config: will provide if relevant.
Backend software and config: go-carbon 0.17.3. Config will be provided if needed.
carbonapi performance metrics: "Please enable expvars in the config and provide its output" - I believe I have the required expvars config; IDK what output this question refers to.
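For reference, the expvar section I mean looks roughly like this (key names taken from the shipped carbonapi example config; treat them as an assumption for v0.16.0):

```yaml
# expose expvars (and optionally the pprof debug handlers) on a separate port
expvar:
  enabled: true
  # empty string reuses the main listen address; set e.g. "127.0.0.1:7070"
  # to serve the debug handlers on their own port
  listen: ""
  # also register the net/http/pprof handlers under /debug/pprof/
  pprofEnabled: true
```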