Your carbonzipper is receiving (at least) 2 series for big.long.metric.name in the from=...&until=... window.
When A has a missing value, carbonzipper attempts to fill it in from B. Since the two series have different sizes, it can't trivially find the corresponding point in B, so it warns in the log and performs no merging; you get just A.
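For concreteness, here is a minimal sketch of that precondition (illustrative only, not zipper's actual code; the function name and shapes are made up): index-for-index merging only makes sense when both slices describe the window at the same step.

```go
package main

import "fmt"

// mergeByIndex fills gaps in a from b, but only when the two slices are
// point-for-point comparable. This mirrors the precondition behind the
// "unable to merge ovalues" warning; it is a sketch, not zipper's code.
func mergeByIndex(a, b []float64, absent []bool) ([]float64, error) {
	if len(a) != len(b) {
		return nil, fmt.Errorf("unable to merge: len(values)=%d but len(ovalues)=%d", len(a), len(b))
	}
	out := append([]float64(nil), a...)
	for i := range out {
		if absent[i] {
			out[i] = b[i]
		}
	}
	return out, nil
}

func main() {
	// A 24h window at 10s/point vs 60s/point: 8640 vs 1440 points.
	a := make([]float64, 8640)
	b := make([]float64, 1440)
	if _, err := mergeByIndex(a, b, make([]bool, len(a))); err != nil {
		fmt.Println(err) // unable to merge: len(values)=8640 but len(ovalues)=1440
	}
}
```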
If you actually have 2 copies of big.long.metric.name that are at different retentions, it would be best to choose one and make them the same.
But it seems likely that your requests are right on a rollup boundary and you are seeing a race where one store is returning the high-resolution (10s:1d) version and another the low-resolution (60s:7d perhaps). Extending your query just beyond the boundary (from=-1d1s&until=now) should force the stores to choose a consistent retention band. If that helps then you can work to reduce the race by improving write throughput/latency and time sync.
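To see why a window of exactly -1d sits on the knife edge, here is a simplified sketch of whisper-style archive selection (an approximation with hypothetical types, not the real implementation): the fetch picks the highest-resolution archive that still reaches back to from, so one extra second of window flips the choice to the next band.

```go
package main

import "fmt"

// archive mirrors one whisper retention band: step seconds per point,
// and how many points the band holds.
type archive struct{ step, points int }

// pick returns the highest-resolution archive whose retention still
// covers `from` - a simplified version of whisper's fetch selection.
func pick(archives []archive, now, from int) archive {
	for _, a := range archives {
		if now-from <= a.step*a.points { // band reaches back far enough
			return a
		}
	}
	return archives[len(archives)-1]
}

func main() {
	// 10s:1d and 60s:14d, the schema reported later in this thread.
	archives := []archive{{10, 8640}, {60, 20160}}
	now := 1488299977
	fmt.Println(pick(archives, now, now-86400)) // exactly -1d -> {10 8640}
	fmt.Println(pick(archives, now, now-86401)) // -1d1s      -> {60 20160}
}
```

With clock skew between stores, a request of exactly -1d can land on either side of that comparison, which is the race described above.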
Thanks for your response @nnuss , I really appreciate it. :)
This is actually sumSeries(big.long.*.wildcard.series.name), and as we're using Grafana for visualization, the timespans of the queries often do fall on the aggregation boundaries. Does that change things at all?
Also, if I'm not able to find any discernible difference in the storage-schema retentions, what should I look for?
Another interesting piece of information: if I point carbonapi directly at carbonserver, I only get data older than 24 hours.
It's worth noting that none of this happens when I source data via Graphite-Web.
sumSeries(some.*.glob.{or,brace}) doesn't change this situation.
If the storage-schemas.conf files are consistent, you can check that carbonzipper.example.com/info/?target=some.specific.series shows consistent definitions across the hosts' files. We would expect they do, but if one file happened to be written when a prior config was in place and was never cleaned up, they could still vary.
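A quick way to automate that check is sketched below; it assumes /info returns a JSON object keyed by backend URL with the shape shown later in this thread, and the hostname and target are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// retention matches one band in the /info response.
type retention struct {
	SecondsPerPoint int `json:"secondsPerPoint"`
	NumberOfPoints  int `json:"numberOfPoints"`
}

type info struct {
	Retentions []retention `json:"retentions"`
}

func main() {
	resp, err := http.Get("http://carbonzipper.example.com/info/?target=some.specific.series")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The response is keyed by backend URL, one info block per backend.
	byBackend := map[string]info{}
	if err := json.NewDecoder(resp.Body).Decode(&byBackend); err != nil {
		panic(err)
	}

	// Use whichever backend comes first as the baseline; any disagreement
	// between backends will surface as at least one mismatch line.
	var want string
	for backend, i := range byBackend {
		got := fmt.Sprint(i.Retentions)
		if want == "" {
			want = got
		}
		if got != want {
			fmt.Println("retention mismatch on", backend, "->", got)
		}
	}
}
```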
When you request now-7d from carbonapi<->carbonserver the result does not have points 24 hours old or newer?
The data displayed in the /info path brought into question an architectural decision I made: I had multiple carbonserver back-ends configured in carbonzipper.json, thinking I was being super clever by running many processes in parallel. After reducing the back-ends to one, the errors about merging ovalues have disappeared from the carbonzipper logs.
It also looks like I made an observational error -- data served from the carbonserver/carbonzipper/carbonapi stack is truncated at a very specific date/time, yesterday at ~4:28pm local time, whereas Graphite-Web seems to disagree and render everything received since then.
Check this out: http://imgur.com/a/EnuJg
This appears to happen for any series that I query.
Can you post what /info says in this case?
In our setup we also have all carbonservers (actually it's go-carbon now) configured as backends for carbonzipper; that was actually the purpose of creating zipper.
Also, does graphite-web talk to carbonserver or to carbonzipper?
And one more thing: you have the files with the same retention on all the storages, right? Or do different backends have different retentions?
@Civil Thanks for your reply. :)
{"http://0.0.0.0:5004":{"name":"big.long.series.name","aggregationMethod":"Average","maxRetention":31536000,"xFilesFactor":0,"retentions":[{"secondsPerPoint":10,"numberOfPoints":8640},{"secondsPerPoint":60,"numberOfPoints":20160},{"secondsPerPoint":300,"numberOfPoints":8640},{"secondsPerPoint":1800,"numberOfPoints":8640},{"secondsPerPoint":3600,"numberOfPoints":8760}]}}
The same block was spammed in /info for each configured back-end when I had multiple carbonserver back-ends in play.
We're also using go-carbon (technically this is a POC of go-carbon vs carbon-cache.py). The Graphite-Web instance talks to go-carbon via the Carbonlink listeners at the moment.
We're only using one carbonserver back-end at this point. I have quintuple-confirmed in a couple different ways now (/info, whisper-info.py) that all the DBs I'm working with here have the same storage-schema retentions configured. :)
Welp, looks like I was actually querying graphite-web powered by carbon-cache.py on a different server in that Imgur link (same Whisper DBs are on both servers though, and I'm using carbon-c-relay to write data to both machines). When I query graphite-web powered by go-carbon it does the same thing with the truncation of the data at ~4:28pm on 2/27. Seems to isolate things to go-carbon or the carbon-c-relay ingress at this point. Closing this issue since it no longer seems to be related to carbonapi stack.
Thanks for everyone's input, the /info tip was great.
Seeing this message in carbonzipper logs:
```
request: /render/?format=protobuf&from=1488213577&target=big.long.metric.name&until=1488299977: unable to merge ovalues: len(values)=8640 but len(ovalues)=1440
```
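(Note the arithmetic in that line: until − from = 1488299977 − 1488213577 = 86400 s, a 24-hour window, which is 8640 points at 10 s/point but 1440 points at 60 s/point, i.e. two backends answering from different retention archives.)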
Not sure what to do here, or what info you need to diagnose. Any help would be appreciated.