Open mzealey opened 4 years ago
I need to reproduce that on my side to understand what's wrong. It would be helpful if you can provide a bit more information about metrics:
If I replace node 2's * with a single point there is no difference.
There are about 10 properties under the final *. If I set the final * to waiting then it correctly returns 100 for each metric. If I change the final * to waiting and one other metric it returns crazy values.
So basically we can reduce this case to:
asPercent(a.b.c.apache.main.apache_scoreboard.waiting, groupByNodes(a.b.c.apache.main.apache_scoreboard.{waiting,open}, "sum", 2))
where open is 0-5 and waiting is 100-150
This applies over all time ranges (30min - 7 days)
retention for these files 10s:6h,5m:30d,30m:1y,60m:3y
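For readers following along, here is a minimal sketch of what the inner groupByNodes(..., "sum", 2) call is expected to produce for the reduced query. It is a simplified model written for illustration, not carbonapi's or graphite-web's code; the function name group_by_nodes_sum and the sample values are assumptions (the values are only chosen to fall in the ranges given above).

```python
from collections import defaultdict


def group_by_nodes_sum(series, *nodes):
    """Sum series whose names share the same values at the given node indexes.

    `series` maps a dotted metric name to a list of values (None = no data).
    Simplified, hypothetical model of groupByNodes(seriesList, "sum", *nodes).
    """
    groups = defaultdict(list)
    for name, values in series.items():
        parts = name.split(".")
        key = ".".join(parts[n] for n in nodes)
        groups[key].append(values)

    result = {}
    for key, members in groups.items():
        summed = []
        for column in zip(*members):
            present = [v for v in column if v is not None]
            summed.append(sum(present) if present else None)
        result[key] = summed
    return result


# Made-up values in the reported ranges: open is 0-5, waiting is 100-150.
series = {
    "a.b.c.apache.main.apache_scoreboard.waiting": [100, 120, 150],
    "a.b.c.apache.main.apache_scoreboard.open": [0, 3, 5],
}
totals = group_by_nodes_sum(series, 2)
print(totals)  # {'c': [100, 123, 155]}
```

Since waiting is one of the summed series, each total is at least as large as waiting at the same timestamp, so asPercent(waiting, total) should never exceed 100.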
Hey, any idea why this is happening? It's really annoying to have to revert to graphite for some queries even though I can use CarbonAPI for 95% of them...
@mzealey: btw, it's a bit buried in the documentation, but carbonapi can forward function calls to a real graphite-web, see https://github.com/go-graphite/carbonapi/blob/main/cmd/carbonapi/graphiteWeb.example.yaml It's still not really convenient, but it may help with your migration.
@mzealey the problem is that I can't reproduce this behavior at all. For me graphite-web and carbonapi return the same results.
graphite-web:
$ wget -q -O- 'http://localhost:8080/render/?target=asPercent(a.waiting, groupByNodes(a.{open,waiting}, "sum", 0))&format=json'; echo
[{"target": "asPercent(a.waiting,a)", "tags": {"name": "asPercent(a.waiting,a)"}, "datapoints": [[100.0, 1], [99.09909909909909, 2], [98.21428571428571, 3], [98.21428571428571, 4], [98.0392156862745, 5]]}]
carbonapi:
$ wget -q -O- 'http://localhost:8081/render/?target=asPercent(a.waiting, groupByNodes(a.{open,waiting}, "sum", 0))&format=json'; echo
[{"target":"asPercent(a.waiting,groupByNodes(a.{open,waiting}, \"sum\", 0))","datapoints":[[100,1],[99.09909909909909,2],[98.21428571428571,3],[98.21428571428571,4],[98.0392156862745,5]],"tags":{"name":"a.waiting"}}]
Test data I'm using:
$ wget -q -O- 'http://localhost:8080/render/?target=a.{open,waiting}&format=json'; echo
[{"target": "a.open", "tags": {"name": "a.open"}, "datapoints": [[0.0, 1], [1.0, 2], [2.0, 3], [2.0, 4], [3.0, 5]]}, {"target": "a.waiting", "tags": {"name": "a.waiting"}, "datapoints": [[100.0, 1], [110.0, 2], [110.0, 3], [110.0, 4], [150.0, 5]]}]
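As a sanity check, the expected result can be worked out by hand from this test data. The sketch below is plain arithmetic, not code from either project, and the helper name as_percent is made up for illustration.

```python
def as_percent(series, total):
    """Return series[i] / total[i] * 100, or None where that is undefined."""
    out = []
    for value, whole in zip(series, total):
        if value is None or whole is None or whole == 0:
            out.append(None)
        else:
            out.append(value / whole * 100)
    return out


# Values copied from the a.open and a.waiting test data above.
open_values = [0, 1, 2, 2, 3]
waiting_values = [100, 110, 110, 110, 150]

# groupByNodes(a.{open,waiting}, "sum", 0) collapses both series into one sum.
total = [o + w for o, w in zip(open_values, waiting_values)]
percent = as_percent(waiting_values, total)
print(percent)
```

The values come out at roughly 100, 99.1, 98.2, 98.2 and 98.0, which agrees with both responses above.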
If that would be helpful, I'm trying to run a fake backend (see cmd/mockbackend) with the following config:
$ cat asPercent.yaml
listeners:
    - address: ":9070"
      expressions:
          "a.open":
              pathExpression: "a.open"
              data:
                  - metricName: "a.open"
                    values: [0,1,2,2,3]
          "a.waiting":
              pathExpression: "a.waiting"
              data:
                  - metricName: "a.waiting"
                    values: [100,110,110,110,150]
          "a.*":
              pathExpression: "a.*"
              data:
                  - metricName: "a.waiting"
                    values: [100,110,110,110,150]
                  - metricName: "a.open"
                    values: [0,1,2,2,3]
          "a.{open,waiting}":
              pathExpression: "a.{open,waiting}"
              data:
                  - metricName: "a.waiting"
                    values: [100,110,110,110,150]
                  - metricName: "a.open"
                    values: [0,1,2,2,3]
It can answer both carbonapi_v2_pb and pickle (for graphite-web), so I'm pointing both of them to the same data source and making the following request: /render/?target=a.{open,waiting}&format=json
Could you please verify if groupByNodes actually returns same results for your query?
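One way to do that comparison without eyeballing graphs is to diff the format=json output of the two servers. The sketch below is only an illustration: the function name datapoints_match is made up, it compares series by position (so both responses must return them in the same order), and the two sample strings are the responses quoted earlier in this thread, trimmed to the first two datapoints.

```python
import json
import math


def datapoints_match(left_json, right_json, rel_tol=1e-9):
    """Compare two /render?format=json responses point by point."""
    left = json.loads(left_json)
    right = json.loads(right_json)
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        points_a, points_b = a["datapoints"], b["datapoints"]
        if len(points_a) != len(points_b):
            return False
        for (va, ta), (vb, tb) in zip(points_a, points_b):
            if ta != tb:
                return False
            if va is None or vb is None:
                # null only matches null
                if va is not vb:
                    return False
            elif not math.isclose(va, vb, rel_tol=rel_tol):
                return False
    return True


graphite_web = '[{"target": "asPercent(a.waiting,a)", "datapoints": [[100.0, 1], [99.09909909909909, 2]]}]'
carbonapi = r'[{"target":"asPercent(a.waiting,groupByNodes(a.{open,waiting}, \"sum\", 0))","datapoints":[[100,1],[99.09909909909909,2]]}]'

print(datapoints_match(graphite_web, carbonapi))  # True
```

Target names are deliberately ignored here, because the two servers label the resulting series differently even when the data is identical.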
OK standard graphite backend (from the docker image, using go-carbon to pull data in but that shouldn't affect anything). Graphite's graph (last 7 days):
CarbonAPI's graph:
Query for both (although everything to the first | should be enough to show the differences):
asPercent(a.b.xxx.apache.main.apache_scoreboard.waiting, groupByNodes(a.b.xxx.apache.main.apache_scoreboard.*, "sum", 2)) | scale(-1) | offset(100) | aliasByNode(2) | highestAverage(20)
Interestingly, I am switching the backend to clickhouse-graphite and the carbonapi integration there is returning the correct graphs. Switching graphite -> carbon to pb2 doesn't change anything though.
Here is tgz with the wsp data sources used for this graph test.tar.gz
Also, not sure if it is the same or a different issue but in another graph when using divideSeriesLists the graphs look significantly different between graphite & carbonapi, however when I select just a handful of metrics they look correct. Happy to raise a different ticket for that but it's a bit trickier to reproduce
So to clarify: you are currently using go-carbon as a backend and there you get wrong data, but once you switch to carbon-clickhouse it is correct?
In that case, could you please answer following questions:
I'm using docker image graphiteapp/graphite-statsd:master with GOCARBON=1. I didn't change any graphite-web config so I believe it is just accessing the whisper files directly for querying.
Wrt (3) even if I only do for last 3 hours (per schema config first retention period change is at 6h) it still shows wrong data.
Go-carbon config:
[whisper]
data-dir = "/opt/graphite/storage/whisper"
schemas-file = "/opt/graphite/conf/storage-schemas.conf"
aggregation-file = "/opt/graphite/conf/storage-aggregation.conf"
workers = 12
max-updates-per-second = 0
max-creates-per-second = 0
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true
compressed = false
remove-empty-file = true
[cache]
max-size = 100000000
write-strategy = "noop"
[udp]
listen = ":2003"
enabled = true
buffer-size = 0
[tcp]
listen = ":2003"
enabled = true
buffer-size = 0
[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = true
buffer-size = 0
[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"
Default agg:
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average
...
[carbonserver]
listen = "0.0.0.0:8000"
enabled = true
buckets = 10
metrics-as-counters = true
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = false
query-cache-size-mb = 0
find-cache-enabled = true
trigram-index = true
scan-frequency = "5m0s"
trie-index = false
max-globs = 100
fail-on-max-globs = false
max-metrics-globbed = 30000
max-metrics-rendered = 1000
graphite-web-10-strict-mode = true
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]
The matching storage schema which those whisper files should have been created with is:
[default]
pattern = .*
retentions = 10s:6h,5m:30d,30m:1y,60m:3y
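To make the retention discussion concrete, here is a rough sketch that parses this retention string and shows which archive would be expected to serve a given lookback window. It is an illustration rather than whisper's actual code; the helper names are made up, and it assumes whisper's usual unit conventions (notably 1y = 365 days) and that the finest archive covering the whole window is the one that gets read.

```python
# Seconds per unit; "y" is taken as 365 days (assumed whisper convention).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Turn a string such as '10s' or '6h' into seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(spec):
    """Return a list of (step_seconds, span_seconds, points) per archive."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = parse_duration(precision)
        span = parse_duration(duration)
        archives.append((step, span, span // step))
    return archives


def archive_step_for(archives, lookback_seconds):
    """Step of the first (finest) archive whose span covers the lookback."""
    for step, span, _points in archives:
        if span >= lookback_seconds:
            return step
    return archives[-1][0]


archives = parse_retentions("10s:6h,5m:30d,30m:1y,60m:3y")
for step, span, points in archives:
    print(step, span, points)

print(archive_step_for(archives, 3 * 3600))   # 10  (last 3 hours)
print(archive_step_for(archives, 7 * 86400))  # 300 (last 7 days)
```

Under these assumptions a 3 hour query stays entirely within the 10s archive, which is consistent with the remark above that the first retention change is only at 6h.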
Would graphite-web's behavior change if you point to carbonserver, instead of reading wsp files directly?
According to the docs for the image, you can do that by setting GRAPHITE_CLUSTER_SERVERS="127.0.0.1:8000" (that's mentioned in https://hub.docker.com/r/graphiteapp/graphite-statsd/ in "Experimental Features").
And also could you please share carbonapi's config as well?
Also, carbonlink should be disabled in such test with GRAPHITE_CARBONLINK_HOSTS=""
And by any chance, is this a single docker container, or do you have multiple go-carbon docker containers in carbonapi?
I've tried to reproduce your issue with files you've provided, but also no luck:
I've used go-carbon from the current master.
So if I can't reproduce the issue I can't fix it. But overall it makes me think that it could be related either to the docker image or to the software versions those docker images are using, and not to carbonapi.
However I found a small issue with how auto worked, and in some cases it could've caused carbonapi to fail to start.
I have also noticed some weird behaviour of asPercent().
In my case I have 2 metrics:
* data.*.valid
* data.*.invalid
I would like to find the % of the valid values from the total. So what I am doing is first sumSeries() then asPercent()
I use grafana, so we have 3 series
* #A: `sumSeries(data.*.valid)`
* #B: `sumSeries(data.*.*valid)`
* #C: `asPercent(#A, #B)`
I hide #A and #B and I get value of 400-500%, which surely is impossible. If I unhide one of the main series (#A or #B) I see the correct value.
I would like to try to reproduce this with mock data. @Civil can you please give a guide on how to use the fake backend? Then I will try to provide you with an example of the case I have.
Example for the test: https://github.com/go-graphite/carbonapi/blob/main/cmd/mockbackend/testcases/i484/i484.yaml
Structure:
version: "v1" - config version, in case I'll want to change something in future.
Query part of the test
test - main section that describes how to perform the test
test:
    apps:
        - name: "carbonapi"
          binary: "./carbonapi"
          args:
              - "-config"
              - "./cmd/mockbackend/testcases/i484/carbonapi.yaml"
What to run before starting the test. This example uses its own test config, however most of the tests I hope to have should use "./cmd/mockbackend/carbonapi_singlebackend.yaml" as a config, if that's possible.
queries:
    - endpoint: "http://127.0.0.1:8081"
      delay: 1
      type: "GET"
      URL: "/render/?target=a.open&format=json"
queries define what will be sent to carbonapi (endpoint says where to look for it). URL is just a text field that contains the url-decoded version of the URL (it'll be encoded anyway).
delay is a delay in seconds after the previous query finished (or since the beginning of the test, in case it's the first query).
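To illustrate what "url-decoded" means for the URL field, here is a small sketch of how a decoded URL ends up looking once its query string is encoded. This only shows standard percent-encoding with Python's urllib; how mockbackend itself performs the encoding is not shown in this thread, and the helper name encode_url is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def encode_url(decoded_url):
    """Percent-encode the query string of a human-readable URL."""
    parts = urlsplit(decoded_url)
    query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
    return f"{parts.path}?{query}" if query else parts.path


print(encode_url("/render/?target=a.open&format=json"))
# /render/?target=a.open&format=json
print(encode_url("/render/?target=a.{open,waiting}&format=json"))
# /render/?target=a.%7Bopen%2Cwaiting%7D&format=json
```

So in the test file you can write targets such as a.{open,waiting} in their readable form.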
For JSON
Theoretically it could have more than 1 query and to different endpoint, but I haven't tested that yet.
expectedResponse:
    httpCode: 200
    contentType: "application/json"
    expectedResults:
        - metrics:
            - target: "a.open"
              datapoints: [[0,1],[1,2],[2,3],[2,4],[3,5]]
This has all the characteristics of the response: content type, HTTP code and the metric itself.
target is how the metric will be named
datapoints - actual data that will be returned. Format is value, timestamp.
Currently I have no support for checking tag values.
For graphs
If your case is related to png/svg rendering, the only way to verify the result I came up with is to check the sha256 checksum:
expectedResponse:
    httpCode: 200
    contentType: "image/svg+xml"
    expectedResults:
        - sha256:
            - "6d9b18d1fe7264cc0ceb1aa319bf735d346f264bae058e0918d1e41437834aa7" # sha256(nodata svg) on Gentoo stable
            - "33d0b579778e2e0bfdb7cf85cbddafe08f5f97b720e1e717d046262ded23cdf2" # sha256(nodata svg) on Ubuntu Xenial (travis-ci)
Unfortunately it heavily depends on fontconfig, and the sha256 might be different on different machines. So for a PR that contains a test, I would ask you to provide an example png of the expected result with a short description of what is currently wrong, as those results will likely be different on my test system.
Example of test that checks svg image: https://github.com/go-graphite/carbonapi/blob/main/cmd/mockbackend/testcases/i503/i503.yaml
I have some plans to ignore some fields inside SVG, but I haven't implemented that yet.
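The checksum comparison described above can be sketched as follows. This is an illustration of the idea, not mockbackend's code: the function name body_matches is made up, and the sample body is a placeholder rather than a real carbonapi rendering.

```python
import hashlib


def body_matches(body, accepted_hashes):
    """True if the sha256 of the response body is one of the accepted values."""
    digest = hashlib.sha256(body).hexdigest()
    return digest in accepted_hashes


# Placeholder body; a real test would use the bytes returned by /render.
sample_body = b"<svg></svg>"
accepted = {hashlib.sha256(sample_body).hexdigest()}

print(body_matches(sample_body, accepted))       # True
print(body_matches(b"<svg>other</svg>", accepted))  # False
```

Accepting a list of hashes, as the config above does, is what lets the same test pass on machines whose font setup produces slightly different output.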
Data
listeners:
    - address: ":9070"
      expressions:
          "a.open":
              pathExpression: "a.open"
              data:
                  - metricName: "a.open"
                    values: [0,1,2,2,3]
This defines what mockbackend will be able to return.
What is important here:
In expressions you need to list all possible queries that will be made towards the backend. "a.open" in this case is what will be specified in target, pathExpression: "a.open" is the pathExpression field in the response (most of the time it should match what was passed in target, so I will likely remove that field in future), and data is the actual list of metrics that will be returned.
For the metrics format:
metricName - name of the metric in reply
values - values. Timestamps will be automatically calculated and by default will start from 1. For NaN you should use yaml's way to specify it, which is .NaN.
startTime - override timestamp of first value.
step - override step (otherwise will be 1).
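The timestamp rules above can be modelled in a few lines. This is a sketch of the described behaviour, not mockbackend's implementation: the function name build_datapoints is made up, and mapping NaN to None (JSON null) is an assumption about how missing values would appear in a JSON response.

```python
import math


def build_datapoints(values, start_time=1, step=1):
    """Pair each value with its timestamp: start_time, start_time + step, ..."""
    points = []
    for index, value in enumerate(values):
        if isinstance(value, float) and math.isnan(value):
            value = None  # assumed JSON representation of NaN
        points.append([value, start_time + index * step])
    return points


print(build_datapoints([0, 1, 2, 2, 3]))
# [[0, 1], [1, 2], [2, 3], [2, 4], [3, 5]]
print(build_datapoints([0, float("nan")], start_time=100, step=10))
# [[0, 100], [None, 110]]
```

The first call reproduces the datapoints expected for a.open in the JSON test example above.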
Another example: https://github.com/go-graphite/carbonapi/blob/main/cmd/mockbackend/testcases/pr500/pr500.yaml
How to run tests
There are several ways to do that:
e2e_test.sh - just will run all of them
make mockbackend will compile mockbackend. You can run it with mockbackend -test -config ./cmd/mockbackend/testcases/i487/i487.yaml. If you want to get logs from carbonapi, you can start it manually and run mockbackend -test -noapp -config ... . If you omit the -test flag, mockbackend will only reply to requests.
Current limitations
carbonapi_v2_pb and pickle (you can run graphite-web against it). carbonapi_v3_pb
is implemented but I haven't tried a lot of queries.
I mentioned in #526 I am seeing the same thing. Carbonapi is reporting 3-4 times the value that graphite-web produces for the same queries.
This query works fine in graphite-web with go-carbon backend but not in carbonapi (0.13 from rpm):
It's producing many values > 100 even though waiting is all 0-150 range and the result of groupByNodes plotted by itself has all points at 150.
I first thought it was to do with ordering not being preserved but it doesn't appear to be that. Any ideas?