go-graphite / carbonzipper

proxy to transparently merge graphite carbon backends

unable to merge ovalues #30

Closed: wstiern closed this issue 7 years ago

wstiern commented 7 years ago

Seeing this message in carbonzipper logs:

request: /render/?format=protobuf&from=1488213577&target=big.long.metric.name&until=1488299977: unable to merge ovalues: len(values)=8640 but len(ovalues)=1440

Not sure what to do here, or what info you need to diagnose. Any help would be appreciated.

nnuss commented 7 years ago

Your carbonzipper is receiving (at least) 2 series for big.long.metric.name in the from=...&until=... window.

When series A has a missing value, carbonzipper attempts to fill it in from series B. Since the two series have different lengths, carbonzipper can't trivially find the corresponding point in B, so it warns in the log and performs no merging; you get just A.
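
To illustrate the constraint, here is a minimal sketch of that index-by-index fill; it is not carbonzipper's actual code, and the function name and the absent-flag representation are assumptions:

package main

import "fmt"

// mergeSeries fills gaps in series a from series b. absentA/absentB mark
// missing points. The fill is index-by-index, which only makes sense when
// both slices have the same length, i.e. the same step over the same window.
func mergeSeries(a, b []float64, absentA, absentB []bool) error {
	if len(a) != len(b) {
		// Different lengths mean different resolutions: index i no longer
		// refers to the same timestamp in both series, so skip the merge
		// and keep only series a.
		return fmt.Errorf("unable to merge: len(a)=%d, len(b)=%d", len(a), len(b))
	}
	for i := range a {
		if absentA[i] && !absentB[i] {
			a[i], absentA[i] = b[i], false
		}
	}
	return nil
}

func main() {
	a := []float64{1, 0, 3}
	b := []float64{1, 2, 3}
	absentA := []bool{false, true, false}
	absentB := []bool{false, false, false}
	if err := mergeSeries(a, b, absentA, absentB); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(a) // [1 2 3]
}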

If you actually have two copies of big.long.metric.name stored at different retentions, it would be best to choose one retention and make them the same.

But it seems likely that your requests land right on a rollup boundary and you are seeing a race where one store returns the high-resolution version (10s:1d) and another the low-resolution version (60s:7d, perhaps). Extending your query just beyond the boundary (from=-1d1s&until=now) should force the stores to choose a consistent retention band. If that helps, you can then work to reduce the race by improving write throughput/latency and time synchronization.
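
For reference, the numbers in the log line fit this picture: the request window is 1488299977 - 1488213577 = 86400 s, exactly one day, and 86400 / 10 = 8640 while 86400 / 60 = 1440, i.e. one store answered from a 10-second archive and another from a 60-second archive. A hypothetical storage-schemas.conf entry with a rollup boundary at one day would look like this (pattern and retentions are illustrative only, not your actual config):

[big_long_metrics]
pattern = ^big\.long\.
retentions = 10s:1d,60s:7d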

wstiern commented 7 years ago

Thanks for your response @nnuss , I really appreciate it. :)

This is actually sumSeries(big.long.*.wildcard.series.name), and since we're using Grafana for visualization, the query timespans often do fall on the aggregation boundaries. Does that change things at all?

wstiern commented 7 years ago

Also, if I'm not able to find any discernible difference in the storage schema retentions, what should I look for?

wstiern commented 7 years ago

Another interesting piece of information: if I point carbonapi directly at carbonserver, I only get data older than the last 24 hours.

wstiern commented 7 years ago

It's worth noting that none of this happens when I source data via Graphite-Web.

nnuss commented 7 years ago

sumSeries(some.*.glob.{or,brace}) doesn't change this situation.

If the storage-schemas.conf files are consistent, you can check that carbonzipper.example.com/info/?target=some.specific.series shows consistent definitions for that series' whisper files across hosts. We would expect they do, but if a file happened to be written while a prior config was in place and was never cleaned up, the definitions could still vary.
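
For example, something along these lines (hostnames and ports are placeholders; adjust to your deployment):

# Ask the zipper, which reports the info from every configured backend:
curl -s 'http://carbonzipper.example.com:8080/info/?target=some.specific.series'
# Or ask each carbonserver/go-carbon store directly and diff the results:
curl -s 'http://store-01.example.com:8080/info/?target=some.specific.series'
curl -s 'http://store-02.example.com:8080/info/?target=some.specific.series'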

When you request now-7d from carbonapi<->carbonserver, does the result lack points that are 24 hours old or newer?

wstiern commented 7 years ago

The data displayed in the /info path called into question an architectural decision I made: I had multiple carbonserver back-ends configured in carbonzipper.json, thinking I was being super clever by running many processes in parallel. After reducing the back-ends to one, the errors about merging ovalues disappeared from the carbonzipper logs.
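
For anyone hitting the same thing, a backends list like the sketch below (key names depend on your carbonzipper version, and the addresses are placeholders) makes the zipper treat each entry as an independent store; if the entries all serve the same whisper files, every query returns duplicate series that then have to be merged:

{
  "backends": [
    "http://127.0.0.1:5004",
    "http://127.0.0.1:5005"
  ]
}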

It also looks like I made an observational error -- data served from the carbonserver/carbonzipper/carbonapi stack is truncated at a very specific date/time, yesterday at ~4:28pm local time, whereas Graphite-Web disagrees and renders everything received since then.

Check this out: http://imgur.com/a/EnuJg

This appears to happen for any series that I query.

Civil commented 7 years ago

Can you post what /info says in this case?

In our setup we also have all carbonservers (actually go-carbon now) configured as backends for carbonzipper; that was the original purpose of creating zipper.

Also, does graphite-web talk to carbonserver or to carbonzipper?

And one more thing: do you have files with the same retentions on all the storage nodes, or do different backends have different retentions?

wstiern commented 7 years ago

@Civil

Thanks for your reply. :)

{"http://0.0.0.0:5004":{"name":"big.long.series.name","aggregationMethod":"Average","maxRetention":31536000,"xFilesFactor":0,"retentions":[{"secondsPerPoint":10,"numberOfPoints":8640},{"secondsPerPoint":60,"numberOfPoints":20160},{"secondsPerPoint":300,"numberOfPoints":8640},{"secondsPerPoint":1800,"numberOfPoints":8640},{"secondsPerPoint":3600,"numberOfPoints":8760}]}}

The same block was spammed in /info for each configured back-end when I had multiple carbonserver back-ends in play.

We're also using go-carbon (technically this is a POC of go-carbon vs carbon-cache.py). The Graphite-Web instance talks to go-carbon via the Carbonlink listeners at the moment.
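For context, that wiring is the usual Graphite-Web carbonlink setting in local_settings.py; the host/port/instance values below are placeholders, not our actual config:

# local_settings.py (illustrative values)
CARBONLINK_HOSTS = ["127.0.0.1:7002:a"]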

We're only using one carbonserver back-end at this point. I have quintuple-confirmed in a couple different ways now (/info, whisper-info.py) that all the DBs I'm working with here have the same storage schema retentions configured. :)
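
For example (the whisper path is a placeholder):

# Prints maxRetention, xFilesFactor, aggregationMethod, and each archive's
# secondsPerPoint/points; compare the output across hosts for the same file.
whisper-info.py /var/lib/carbon/whisper/big/long/series/name.wsp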

wstiern commented 7 years ago

Welp, looks like I was actually querying graphite-web powered by carbon-cache.py on a different server in that Imgur link (the same Whisper DBs are on both servers, though, and I'm using carbon-c-relay to write data to both machines). When I query graphite-web powered by go-carbon, it shows the same truncation of the data at ~4:28pm on 2/27. That seems to isolate the problem to go-carbon or the carbon-c-relay ingress. Closing this issue since it no longer seems to be related to the carbonapi stack.

Thanks for everyone's input; the /info tip was great.