blind-oracle closed this issue 5 years ago
Can you show what your config looks like?
Usually this happens when max requests per backend is set to 0: bookingcom's fork treats 0 as "unlimited", but here, as of now, it's treated as a literal 0 (which should probably be fixed, since allowing 0 requests per backend makes no sense).
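Illustration only: a minimal sketch of the two interpretations, with hypothetical names rather than carbonapi's actual code.

```go
package main

import "fmt"

// normalizeLimit shows the bookingcom-fork interpretation: a configured
// value of 0 means "unlimited", expressed here as a negative sentinel.
// Treating 0 literally would instead deny every request.
func normalizeLimit(configured int) int {
	if configured == 0 {
		return -1 // hypothetical sentinel for "no limit"
	}
	return configured
}

// allowed reports whether another request may start given the limit.
func allowed(inFlight, limit int) bool {
	return limit < 0 || inFlight < limit
}

func main() {
	limit := normalizeLimit(0)
	fmt.Println(allowed(10, limit)) // true: 0 normalized to unlimited
	fmt.Println(allowed(10, 0))     // false: a literal 0 blocks everything
}
```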
Sure.
listen: "0.0.0.0:8888"
concurency: 20
cache:
type: "mem"
size_mb: 0
defaultTimeoutSec: 60
cpus: 0
tz: ""
maxBatchSize: 0
graphite:
host: ""
interval: "60s"
prefix: "carbon.api"
pattern: "{prefix}.{fqdn}"
idleConnections: 1000
pidFile: ""
upstreams:
buckets: 10
timeouts:
find: "10s"
render: "50s"
connect: "200ms"
concurrencyLimitPerServer: 1000
keepAliveInterval: "30s"
maxIdleConnsPerHost: 1000
backendsv2:
backends:
-
groupName: "ch"
protocol: "carbonapi_v2_pb"
lbMethod: "rr"
maxTries: 3
maxBatchSize: 0
keepAliveInterval: "30s"
concurrencyLimit: 1000
maxIdleConnsPerHost: 1000
servers:
- "http://127.0.0.1:9090"
expireDelaySec: 10
logger:
- logger: ""
file: "stderr"
level: "debug"
encoding: "console"
encodingTime: "iso8601"
encodingDuration: "seconds"
- logger: ""
file: "carbonapi.log"
level: "info"
encoding: "json"
> Usually this happens when max requests per backend is set to 0
What parameter is this?
And what's in the logs when carbonapi starts?
And the config from http://localhost:8888/debug/vars after you do a couple of requests?
As for the option, I'll check when I get home.
> And what's in the logs when carbonapi starts?
```
2019-06-22T20:11:48.613+0200 INFO main will parse config as yaml {"config_file": "/opt/carbonapi.yml"}
2019-06-22T20:11:48.617+0200 ERROR functionInit failed to read config file {"function": "aliasByPostgres", "error": "Config File \"config\" Not Found in \"[]\""}
2019-06-22T20:11:48.617+0200 WARN functionInit aliasByPostgres config found but aliasByPostgres is disabled {"function": "aliasByPostgres"}
2019-06-22T20:11:48.617+0200 WARN functionInit no config file specified {"function": "graphiteWeb", "message": "this function requrires config file to work properly"}
2019-06-22T20:11:48.617+0200 INFO main starting carbonapi {"build_version": "0.11.0-490-g5a4f", "config": "{\"ExtrapolateExperiment\":false,\"Logger\":[{\"logger\":\"\",\"file\":\"stderr\",\"level\":\"debug\",\"encoding\":\"console\",\"encoding-time\":\"iso8601\",\"encoding-duration\":\"seconds\",\"sample-tick\":\"\",\"sample-initial\":0,\"sample-thereafter\":0},{\"logger\":\"\",\"file\":\"carbonapi.log\",\"level\":\"info\",\"encoding\":\"json\",\"encoding-time\":\"\",\"encoding-duration\":\"\",\"sample-tick\":\"\",\"sample-initial\":0,\"sample-thereafter\":0}],\"Listen\":\"0.0.0.0:8888\",\"Buckets\":10,\"Concurency\":20,\"Cache\":{\"Type\":\"mem\",\"Size\":0,\"MemcachedServers\":[],\"DefaultTimeoutSec\":60},\"Cpus\":0,\"TimezoneString\":\"\",\"UnicodeRangeTables\":null,\"Graphite\":{\"Pattern\":\"{prefix}.{fqdn}\",\"Host\":\"\",\"Interval\":60000000000,\"Prefix\":\"carbon.api\"},\"IdleConnections\":1000,\"PidFile\":\"\",\"SendGlobsAsIs\":false,\"AlwaysSendGlobsAsIs\":false,\"MaxBatchSize\":0,\"Zipper\":\"\",\"Upstreams\":{\"ConcurrencyLimitPerServer\":1000,\"MaxIdleConnsPerHost\":1000,\"Backends\":null,\"BackendsV2\":{\"Backends\":[{\"GroupName\":\"ch\",\"Protocol\":\"carbonapi_v2_pb\",\"LBMethod\":\"all\",\"Servers\":[\"http://127.0.0.1:9090\"],\"Timeouts\":null,\"ConcurrencyLimit\":1000,\"KeepAliveInterval\":30000000000,\"MaxIdleConnsPerHost\":1000,\"MaxTries\":3,\"MaxBatchSize\":0}],\"MaxIdleConnsPerHost\":0,\"ConcurrencyLimitPerServer\":0,\"Timeouts\":{\"Find\":0,\"Render\":0,\"Connect\":0},\"KeepAliveInterval\":0,\"MaxTries\":0,\"MaxBatchSize\":0},\"MaxBatchSize\":0,\"MaxTries\":0,\"CarbonSearch\":{\"Backend\":\"\",\"Prefix\":\"virt.v1.*\"},\"CarbonSearchV2\":{\"Backends\":null,\"MaxIdleConnsPerHost\":0,\"ConcurrencyLimitPerServer\":0,\"Timeouts\":{\"Find\":0,\"Render\":0,\"Connect\":0},\"KeepAliveInterval\":0,\"MaxTries\":0,\"MaxBatchSize\":0,\"Prefix\":\"\"},\"ExpireDelaySec\":0,\"InternalRoutingCache\":600000000000,\"Timeouts\":{\"Find\":10000000000,\"Render\":50000000000,\"Connect\":200000000},\"KeepAliveInterval\":30000000000},\"ExpireDelaySec\":10,\"GraphiteWeb09Compatibility\":false,\"IgnoreClientTimeout\":false,\"DefaultColors\":null,\"GraphTemplates\":\"\",\"FunctionsConfigs\":{}}"}
2019-06-22T20:11:48.618+0200 DEBUG zipper initializing zipper
2019-06-22T20:11:48.618+0200 DEBUG zipper creating lb group {"name": "ch", "servers": ["http://127.0.0.1:9090"], "type": "all"}
2019-06-22T20:11:48.618+0200 DEBUG zipper created broadcast group {"type": "broadcastGroup", "groupName": "ch", "group_name": "ch", "clients": ["http://127.0.0.1:9090"]}
2019-06-22T20:11:48.618+0200 DEBUG zipper created broadcast group {"type": "broadcastGroup", "groupName": "root", "group_name": "root", "clients": ["ch"]}
2019-06-22T20:11:48.618+0200 DEBUG zipper zipper config {"config": {"ConcurrencyLimitPerServer":1000,"MaxIdleConnsPerHost":1000,"Backends":null,"BackendsV2":{"Backends":[{"GroupName":"ch","Protocol":"carbonapi_v2_pb","LBMethod":"all","Servers":["http://127.0.0.1:9090"],"Timeouts":{"Find":10000000000,"Render":50000000000,"Connect":200000000},"ConcurrencyLimit":1000,"KeepAliveInterval":30000000000,"MaxIdleConnsPerHost":1000,"MaxTries":3,"MaxBatchSize":0}],"MaxIdleConnsPerHost":0,"ConcurrencyLimitPerServer":0,"Timeouts":{"Find":10000000000,"Render":50000000000,"Connect":200000000},"KeepAliveInterval":0,"MaxTries":0,"MaxBatchSize":0},"MaxBatchSize":0,"MaxTries":0,"CarbonSearch":{"Backend":"","Prefix":"virt.v1.*"},"CarbonSearchV2":{"Backends":null,"MaxIdleConnsPerHost":0,"ConcurrencyLimitPerServer":0,"Timeouts":{"Find":0,"Render":0,"Connect":0},"KeepAliveInterval":0,"MaxTries":0,"MaxBatchSize":0,"Prefix":""},"ExpireDelaySec":0,"InternalRoutingCache":600000000000,"Timeouts":{"Find":10000000000,"Render":50000000000,"Connect":200000000},"KeepAliveInterval":30000000000}}
2019-06-22T20:11:48.618+0200 DEBUG zipper doing request {"type": "protoV2Group", "name": "http://127.0.0.1:9090", "function": "prober", "request": ["*"]}
2019-06-22T20:11:48.618+0200 DEBUG zipper picked server {"type": "protoV2Group", "name": "http://127.0.0.1:9090", "action": "query", "server": "http://127.0.0.1:9090"}
2019-06-22T20:11:48.618+0200 DEBUG zipper trying to get slot {"type": "protoV2Group", "name": "http://127.0.0.1:9090", "action": "query", "server": "http://127.0.0.1:9090", "name": "http://127.0.0.1:9090", "uri": "http://127.0.0.1:9090/metrics/find/?format=protobuf&query=%2A"}
2019-06-22T20:11:48.618+0200 DEBUG zipper got slot {"type": "protoV2Group", "name": "http://127.0.0.1:9090", "action": "query", "server": "http://127.0.0.1:9090", "name": "http://127.0.0.1:9090", "uri": "http://127.0.0.1:9090/metrics/find/?format=protobuf&query=%2A"}
2019-06-22T20:11:48.627+0200 DEBUG zipper will return data {"type": "protoV2Group", "name": "http://127.0.0.1:9090", "function": "prober", "tlds": ["SC", "RIEMANN", "carbon"]}
```
> And the config from http://localhost:8888/debug/vars after you do a couple of requests?
https://gist.github.com/blind-oracle/07f08e73f2ef28279947c2e64429b97d
Hm...
I can't reproduce your issue with my test setup (I've tried your config as well).
However, I've found a couple of other smaller issues (e.g. maxBatchSize had been hardcoded to 100 since https://github.com/go-graphite/carbonapi/commit/06f88803556e49ce22b85bfebadee0510385eca9), so I've fixed them in current master.
Could you try building from master again and also enable debug logging (level: "debug" in the logging section)?
Also, can you tell me how many metrics your query actually resolves to?
The option I was talking about was concurrencyLimit; in your case it's set to 1000, so it should be fine.
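For context on what that option does: per-server limits like this are typically implemented as one buffered channel per client name acting as a counting semaphore. A minimal sketch, modeled on the select that appears later in this thread (the names limiter/enter/leave are illustrative, not carbonapi's exact code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// limiter caps in-flight requests per client name, using one buffered
// channel per name as a counting semaphore.
type limiter struct {
	m map[string]chan struct{}
}

func newLimiter(names []string, maxConns int) *limiter {
	l := &limiter{m: make(map[string]chan struct{})}
	for _, n := range names {
		l.m[n] = make(chan struct{}, maxConns)
	}
	return l
}

// enter blocks until a slot for name is free or ctx expires. Note that
// if name was never registered, l.m[name] is a nil channel, the send
// can never proceed, and enter can only time out; that is the failure
// mode discussed later in this thread.
func (l *limiter) enter(ctx context.Context, name string) error {
	select {
	case l.m[name] <- struct{}{}:
		return nil
	case <-ctx.Done():
		return errors.New("timeout exceeded")
	}
}

// leave releases a previously acquired slot.
func (l *limiter) leave(name string) { <-l.m[name] }

func main() {
	l := newLimiter([]string{"ch"}, 2)
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	fmt.Println(l.enter(ctx, "ch")) // <nil>: slot acquired
	l.leave("ch")
}
```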
@Civil
> Could you try building from master again
Checked out latest master (0.11.0-504-g98cc). BTW, I'm building with nocairo, if that's relevant.
The result is the same.
> enable debug logging (level: "debug" in the logging section)
It's already set to debug in your example YAML, so I left it as is.
> Also, can you tell me how many metrics your query actually resolves to?
Right now I'm testing with just one metric, with 1440 values over the requested period, so it's not a lot.
The funny thing is that it makes an HTTP call to graphite-clickhouse only on start:
```
GET /metrics/find/?format=protobuf&query=%2A HTTP/1.1
Host: 127.0.0.1:9090
User-Agent: Go-http-client/1.1
X-Ctx-Carbonapi-Uuid: 47c16f97-11f4-4111-ab89-6b87fe806d46
Accept-Encoding: gzip

HTTP/1.1 200 OK
Date: Sun, 23 Jun 2019 09:06:59 GMT
Content-Length: 30
Content-Type: application/octet-stream

.*..
.SC.
.RIEMANN..
.carbon
```
Then, when I request the actual metrics, it hangs on `waiting for slot` and then times out, whereas at start it gets further, to `got slot`.
Also, at start it works with `"type": "protoV2Group", "name": "http://127.0.0.1:9090"`, but an actual request goes through `"type": "broadcastGroup", "groupName": "root"`.
Ah, OK, I got it: for some reason it was set to broadcast instead of the round-robin I specified in the config above. When I changed it to `rr`, it started working 😕
But why? My guess is that with one server `rr` should behave the same as `all`, no?
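For what it's worth, the conceptual difference between the two methods is only fan-out; here is a minimal sketch (hypothetical types, not carbonapi's implementation) of why they should be equivalent with a single server:

```go
package main

import "fmt"

// group is a hypothetical load-balancing group over backend servers.
type group struct {
	servers []string
	next    int
}

// pickRR returns a single server, rotating through the list ("rr").
func (g *group) pickRR() []string {
	s := g.servers[g.next%len(g.servers)]
	g.next++
	return []string{s}
}

// pickAll returns every server; the caller queries all of them and
// merges the responses ("all" / "broadcast").
func (g *group) pickAll() []string {
	return g.servers
}

func main() {
	g := &group{servers: []string{"http://127.0.0.1:9090"}}
	fmt.Println(g.pickRR())  // [http://127.0.0.1:9090]
	fmt.Println(g.pickAll()) // [http://127.0.0.1:9090], identical with one server
}
```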
It shouldn't matter. In fact, my default clickhouse test setup uses `broadcast` mode and my current Prometheus test setup uses `rr`.
So there is something else that I'm missing here, and therefore I still cannot reproduce it.
I've tagged current master as "0.12.0-rc.0", and it also includes the last commit, where I tried to unify all the fetching logic. I'll have a closer look at how the rate limiter works at the moment: if there's a deadlock somewhere, that's the most likely place.
I just double-checked this, to be sure. In the config I posted above, if I change `rr` to `all`, it stops working. Change it back, and it works again. I didn't touch anything else.
In any case, thanks: at least the service is usable now.
Judging by my few tests, it replaces graphite-web better than the bookingcom fork does. At least some of my graphs didn't render quite right in their fork.
If you need help with testing, let me know. If I find the time, I may dig into the problem myself...
P.S.: Oh, Zurich, hello neighbor :)
OK, I'll take a look when I can. Maybe it really is all about `all`; in my config I have `broadcast`, which is supposed to be a synonym.
The functions part is not surprising: I support more of them, and I've tried to fix compatibility bugs wherever I found them.
As for testing, bug reports are always helpful. Since I don't have a production setup where I could use Graphite, bug reports are my best source of bugs.
P.S.: Yep, that's true :)
@Civil
The problem is clear now. During initialization it creates two hierarchical broadcast groups:

```
2019-06-24T10:33:31.467+0200 DEBUG zipper created broadcast group {"type": "broadcastGroup", "groupName": "ch", "group_name": "ch", "clients": ["http://127.0.0.1:9090"], "max_batch_size": 0}
2019-06-24T10:33:31.467+0200 DEBUG zipper created broadcast group {"type": "broadcastGroup", "groupName": "root", "group_name": "root", "clients": ["ch"], "max_batch_size": 0}
```

The upper group, `root`, has a single client: the server group (`ch` in my case). The lower group's clients are the servers themselves (http://127.0.0.1:9090).

On entering the `root` group, the limiter tries to acquire a slot under the server name http://127.0.0.1:9090, even though only `ch` was passed to the limiter when the group was created:

```
2019-06-24T10:33:37.006+0200 DEBUG zipper waiting for slot {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["SC.PROD.{AT,NFT,OT,UFT,VT}.*.*.*[rs]*.network.nic.bond3.bw_out"], "client_name": "http://127.0.0.1:9090", "maxConns": 1000}
```

And it times out on the context while writing to the channel:
```go
select {
// sl.m is a map[string]chan struct{}; for a key s that was never
// registered, sl.m[s] is a nil channel, so this send blocks forever...
case sl.m[s] <- struct{}{}:
	return nil
// ...and the context timeout is the only branch that can ever fire.
case <-ctx.Done():
	return errors.New("timeout exceeded")
}
```
In theory, it should simply panic on `sl.m[s]`, since the key `s` it was given isn't in the map, but maybe there's something about Go I don't know :)
As a workaround I named the group and the server identically (http://127.0.0.1:9090), and everything started working in broadcast mode. I don't have time to dig further right now; maybe you'll get to it faster.
Thanks, I'll take a look. That may well be the cause.
Thanks! As for maps and panicking, I had Go mixed up in my head with Python and its KeyError... So it quite legitimately doesn't panic; it just tries to write into nowhere.
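A self-contained demo of exactly that (plain Go, nothing carbonapi-specific): a missing key in a map of channels yields the zero value, a nil channel, and a send on a nil channel can never proceed, so the select always falls through to the context branch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Only "ch" is registered, mirroring the limiter in question.
	m := map[string]chan struct{}{
		"ch": make(chan struct{}, 1),
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	// This key is absent, so m[s] is the zero value: a nil channel.
	s := "http://127.0.0.1:9090"
	select {
	case m[s] <- struct{}{}: // a send on a nil channel blocks forever
		fmt.Println("got slot")
	case <-ctx.Done():
		fmt.Println("timeout exceeded") // always this branch
	}
}
```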
I think I've found the problem; in theory it should be fixed in master now.
The problem was in the Children() method, which was returning children that weren't its own.
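I don't know the actual code, but roughly, the failure mode can be pictured like this (a hypothetical simplification of the group hierarchy, not carbonapi's implementation): root's Children() must return its direct clients (the ch group), whose names its limiter was built with, not the leaf servers two levels down.

```go
package main

import "fmt"

type client interface {
	Name() string
	Children() []client
}

// server is a leaf; it returns itself so recursive flattening bottoms out.
type server struct{ name string }

func (s server) Name() string       { return s.name }
func (s server) Children() []client { return []client{s} }

type broadcastGroup struct {
	name    string
	clients []client
}

func (g broadcastGroup) Name() string { return g.name }

// Buggy variant: flattens down to the grandchildren (the leaf servers),
// whose names the group's limiter was never initialized with.
func (g broadcastGroup) Children() []client {
	var out []client
	for _, c := range g.clients {
		out = append(out, c.Children()...)
	}
	return out
}

// Fixed variant: report the direct clients the limiter knows about.
func (g broadcastGroup) directChildren() []client { return g.clients }

func main() {
	ch := broadcastGroup{name: "ch", clients: []client{server{"http://127.0.0.1:9090"}}}
	root := broadcastGroup{name: "root", clients: []client{ch}}
	for _, c := range root.Children() {
		fmt.Println(c.Name()) // http://127.0.0.1:9090: the wrong limiter key
	}
	for _, c := range root.directChildren() {
		fmt.Println(c.Name()) // ch: the key the limiter was built with
	}
}
```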
Confirmed: everything works now. Thanks!
As I already told @Civil in some other thread somewhere, I wasn't able to make carbonapi work with a graphite-clickhouse backend some months ago.
I tried again today, with the same result. I used the carbonapi.example.clickhouse.yaml config as a base and can't get any output.
On a request from Grafana it just hangs for 50s and then returns an empty set:
On the graphite-clickhouse side I just get one query:
What am I doing wrong? The bookingcom fork works without any issues.
Thanks in advance.