Closed LordJeffrey closed 5 years ago
Other errors involving "replications": ERRO[0098] Get http://:8091/pools/default/buckets/presence/stats/replications%2Fa67eb4ce35e01b1573cc1e2261b1d2f2%2Fpresence%2Fpresence%2Fdocs_opt_repd: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Hi @LordJeffrey ,
I'll treat this issue ASAP.
In the meantime, I have a question: when you say that you don't get any stats, do you mean no XDCR stats, or no metrics at all?
I had time to actually test the exporter with Couchbase Community 3.0.1 using Vagrant, and unfortunately I could not reproduce your errors.
Here's what I did:
I installed Couchbase 3.0.1 on a Vagrant ubuntu/trusty64 VM with Docker:
sudo docker run -d --name cb -p 8091-8094:8091-8094 -p 11210:11210 couchbase:community-3.0.1
I connected to the Couchbase node in the browser and used the default configuration. I then created a remote cluster and started a replication between two of the example buckets (beer-sample and default) to test XDCR metrics collection.
I downloaded version 0.5.2 of the exporter and started it on my machine with the following configuration file:
web:
  listenAddress: :9191
  telemetryPath: /metrics
db:
  user: admin
  password: mypassword
  uri: http://192.168.10.10:8091
log:
  level: debug
  format: text
scrape:
  cluster: true
  node: true
  bucket: true
  xdcr: true
The logs I get when requesting the exporter are as follows:
> ./couchbase_exporter
DEBU[0000] Get http://192.168.10.10:8091/pools (6.018162ms)
INFO[0000] Couchbase version: 3.0.1-1444-rel-community
INFO[0000] Community version: true
WARN[0000] Version 3.0.1-1444-rel-community may not be supported by this exporter
DEBU[0000] /Users/aabdelhak/Projets/go/src/github.com/blakelead/couchbase_exporter/metrics/cluster-default.json loaded
DEBU[0000] Cluster exporter registered
DEBU[0000] /Users/aabdelhak/Projets/go/src/github.com/blakelead/couchbase_exporter/metrics/node-default.json loaded
DEBU[0000] Node exporter registered
DEBU[0000] /Users/aabdelhak/Projets/go/src/github.com/blakelead/couchbase_exporter/metrics/bucket-default.json loaded
DEBU[0000] Bucket exporter registered
DEBU[0000] /Users/aabdelhak/Projets/go/src/github.com/blakelead/couchbase_exporter/metrics/bucketstats-default.json loaded
DEBU[0000] Bucketstats exporter registered
DEBU[0000] /Users/aabdelhak/Projets/go/src/github.com/blakelead/couchbase_exporter/metrics/xdcr-default.json loaded
DEBU[0000] XDCR exporter registered
INFO[0000] Listening at :9191
DEBU[0004] Get http://192.168.10.10:8091/pools/default/tasks (4.702954ms)
DEBU[0004] Get http://192.168.10.10:8091/nodes/self (15.828967ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default (21.458141ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets (51.181884ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets (54.048682ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_written (59.044683ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fchanges_left (61.957526ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Frate_received_from_dcp (65.587845ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_filtered (69.969096ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_failed_cr_source (73.142982ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fbandwidth_usage (76.05548ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_rep_queue (80.47983ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fwtavg_meta_latency (82.655471ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fnum_checkpoints (85.58867ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdata_replicated (91.066479ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_checked (100.958467ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fwtavg_docs_latency (103.820349ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Ftime_committing (106.791178ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_received_from_dcp (111.049308ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fdocs_opt_repd (113.51766ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Frate_replicated (115.636506ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fsize_rep_queue (118.916487ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats/replications%2F6735d4da4ef0f0758e89ea83f322f3a5%2Fbeer-sample%2Fdefault%2Fnum_failedckpts (123.003632ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/beer-sample/stats (114.714329ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/default/stats (119.81162ms)
DEBU[0004] Get http://192.168.10.10:8091/pools/default/buckets/gamesim-sample/stats (124.265292ms)
There is something that you could check: the exporter has a 10-second HTTP timeout. Is there any lag in communication between the exporter and the Couchbase cluster?
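One way to check for that lag is to time a single request from the machine that runs the exporter. The following is only a rough sketch, not part of the exporter: the `probe` function and the `CB_USER` / `CB_PASSWORD` environment variables are names made up for this example, and the 10-second value simply mirrors the timeout mentioned above.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// probe (hypothetical helper) issues one GET and reports how long the
// server took to return response headers, or the error if it never did.
func probe(url, user, password string, timeout time.Duration) (time.Duration, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if user != "" {
		req.SetBasicAuth(user, password)
	}
	client := &http.Client{Timeout: timeout}

	start := time.Now()
	resp, err := client.Do(req) // returns once the headers have arrived
	elapsed := time.Since(start)
	if err != nil {
		return elapsed, err
	}
	defer resp.Body.Close()
	return elapsed, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <url>")
		return
	}
	// CB_USER and CB_PASSWORD are assumed names, chosen for this sketch only.
	elapsed, err := probe(os.Args[1], os.Getenv("CB_USER"), os.Getenv("CB_PASSWORD"), 10*time.Second)
	if err != nil {
		fmt.Printf("failed after %v: %v\n", elapsed, err)
		return
	}
	fmt.Printf("headers received after %v\n", elapsed)
}
```

Running it a few times against one of the stats URLs from the log should show whether response times come anywhere near 10 seconds.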
For me, I wasn't getting any stats. This is very good that you tested this and got stats -- hope I didn't waste your time. I'm going to try to redo all my steps to make sure I have everything right and try again. Thanks so much!
Also, thanks for deleting my comments :)
I'm thankful that you are using my exporter so don't worry, you're not wasting my time :)
Don't hesitate if you have more info about your issue.
Doing some testing today. It seems it IS reporting the new stats, as well as other stats, but it looks like I'm simply timing out. I get this occasionally, at random:
(hostname)8091/pools/default/buckets/presence/stats: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
ERRO[0336] Could not unmarshal bucketstats data for bucket mwi
I get this error for various different stats at random. The most common type involves "replications":
/pools/default/buckets/presence/stats/replications%2Fa67eb4ce35e01b1573cc1e2261b1d2f2%2Fpresence%2Fpresence%2Fdocs_filtered: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
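For reference, the "replications" paths in these errors and in the debug log all share one shape: a stat name of the form replications/<id>/<source bucket>/<target bucket>/<stat>, escaped so that each "/" becomes %2F, appended to the source bucket's stats path. The sketch below rebuilds one of the logged URLs on that assumption. The layout is inferred only from the log lines in this thread, and `replicationStatURL` is a name invented for illustration, not a function from the exporter.

```go
package main

import (
	"fmt"
	"net/url"
)

// replicationStatURL (hypothetical helper) builds a per-stat XDCR URL in the
// shape seen in the logs. The whole stat name is escaped as a single path
// segment, so every "/" inside it is encoded as %2F.
func replicationStatURL(base, id, source, target, stat string) string {
	name := "replications/" + id + "/" + source + "/" + target + "/" + stat
	return base + "/pools/default/buckets/" + url.PathEscape(source) +
		"/stats/" + url.PathEscape(name)
}

func main() {
	// Values copied from the debug log earlier in the thread.
	fmt.Println(replicationStatURL(
		"http://192.168.10.10:8091",
		"6735d4da4ef0f0758e89ea83f322f3a5",
		"beer-sample", "default", "docs_written",
	))
}
```

Each stat of each replication is a separate request of this kind, which matches the eighteen consecutive "replications" requests in the debug log for a single replication.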
I'm going to look through the documentation to see if there is a way to increase the timeout. If there isn't, could you post a way? It could be that my system is loaded heavily enough that it needs more time to deliver the data.
One thing that seems to sort of work is setting this in "couchbase_exporter.go":
    // custom server used to set timeouts
    httpSrv := &http.Server{
        Addr:         listenAddr,
        ReadTimeout:  20 * time.Second,
        WriteTimeout: 30 * time.Second,
    }
I say it only "sort of" works because I still get the timeout errors (after 10 seconds), but the client request I do in my browser doesn't go on forever. With the default settings, when the timeouts occur, the client tries again and again, causing a loop of errors every 10 seconds. With these settings, it stops after 10 seconds and prints the stats it has, though I still get errors. Not sure how all this works, any insight appreciated.
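A note on why the server settings do not remove the errors: in Go's net/http, `ReadTimeout` and `WriteTimeout` on `http.Server` apply to incoming connections (here, the browser or Prometheus talking to the exporter), while the text "Client.Timeout exceeded while awaiting headers" is produced by the `Timeout` field of an outgoing `http.Client`. The two are independent. The sketch below shows only the client side, with a deliberately slow test server standing in for a busy Couchbase node. `fetch` and `slowUpstream` are names made up for this example, and the durations are arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch (hypothetical helper) performs one GET with the given client-side
// timeout and returns the error, if any.
func fetch(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// slowUpstream (hypothetical helper) starts a local test server that waits
// for delay before answering, imitating a node that is slow to respond.
func slowUpstream(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		fmt.Fprint(w, "{}")
	}))
}

func main() {
	upstream := slowUpstream(300 * time.Millisecond)
	defer upstream.Close()

	// Client timeout shorter than the upstream delay: the request fails.
	fmt.Println("100ms timeout:", fetch(upstream.URL, 100*time.Millisecond))
	// Client timeout longer than the upstream delay: the request succeeds.
	fmt.Println("2s timeout:", fetch(upstream.URL, 2*time.Second))
}
```

Under these assumptions, only raising the timeout on the client that queries Couchbase would make the errors go away; changing the exporter's own server timeouts affects how long the scrape request is allowed to run, not how long the exporter waits for Couchbase.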
I'll look into that, and you are right: I should parameterize the timeouts. I'll do that ASAP!
Sweet. I'm going to be out all next week, so no rush (if you were rushing, haha). Cheers.
Hi @LordJeffrey,
I can't reproduce the timeouts you have but I added 2 new parameters in version 0.6.0:
I hope this will solve your problem when you get back :)
Hello,
Thanks for the help earlier -- it can run now without crashing :) One issue though: yes, it runs, but it doesn't actually get any stats when it runs now. I get these errors every time it scrapes (below). When I go to http://:8091/pools/default/buckets/presence/stats/replications I do actually see JSON, so the endpoints seem to be there. Any ideas? Thanks!