Closed iankosen closed 1 year ago
The only reason for this I see it's the logical error with the storage retention. Data point precision must be 1day or less
precision – How precisely to define the age of the data in seconds. Should be a divisor for 86400 (seconds in a day).
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/graphitemergetree#patterns
The retention policy I am using is:
<clickhouse>
<pattern>
<regexp>regex_exp</regexp>
<function>sum</function>
<retention>
<age>0</age>
<precision>15</precision>
</retention>
<retention>
<age>108000</age>
<precision>30</precision>
</retention>
<retention>
<age>3456000</age>
<precision>300</precision>
</retention>
<retention>
<age>8640000</age>
<precision>1800</precision>
</retention>
<retention>
<age>34128000</age>
<precision>3600</precision>
</retention>
</pattern>
</clickhouse>
All the precision values are less than a day and are a divisor for 86400.
Then we need to investigate this place https://github.com/go-graphite/graphite-clickhouse/blob/1db4b3d740c0f3e72ed8a3449603f4a016b5dda8/render/data/query.go#L147C9-L147C9
It calculates the common step before the from
/until
is set, so let's add some trace logging there.
Also, what does the query in CH return?
SELECT * FROM system.graphite_retentions
Do you use auto
for rollup-config?
I've totally missed the initial query, sorry.
http://localhost:8081/render?format=json&target=metric_name&from=-30d&maxDataPoints=1
You get what you requested. Using maxDataPoints=100
will change it to another step.
Then we need to investigate this place https://github.com/go-graphite/graphite-clickhouse/blob/1db4b3d740c0f3e72ed8a3449603f4a016b5dda8/render/data/query.go#L147C9-L147C9
It calculates the common step before the
from
/until
is set, so let's add some trace logging there.Also, what does the query in CH return?
SELECT * FROM system.graphite_retentions
Do you use
auto
for rollup-config?
I use rollup-config = auto. I dont think the rollup config has an issue. The rStep it gets here is always correct.
I've totally missed the initial query, sorry.
http://localhost:8081/render?format=json&target=metric_name&from=-30d&maxDataPoints=1
You get what you requested. Using
maxDataPoints=100
will change it to another step.
That is correct. Calculating the step works as expected. The only issue I have is the update it does to the from and until dates when maxDataPoints is used. You can see from the first message the from and until values for the query are 4 days ahead after setFromUntil is executed. The issue is more obvious when Until - From is large (recommend at least 30days) and maxDataPoints is small e.g 1
c.From = 1694837820
c.Until = 1697429820
c.step = 2592000
# computed from and until in setFromUntil function will be:
c.from = 1695168000 # (four days after original c.From)
c.until = 1697759999 # (four days after original c.Until)
This causes the clickhouse to do an aggregation over the wrong time range.
It's aligned as expected. Totally correct result.
In [4]: 1694837820 // 2592000
Out[4]: 653 # the timestamp 1692576000 is out of the range originalFrom:originalUntil
In [5]: 2592000 * 654 # to get the ANY point after the original From
Out[5]: 1695168000
In [6]: 2592000 * 655
Out[6]: 1697760000 # that's Until+1, to not get the second point
The purpose of the maxDataPoints is to limit the points. So the step, from, and until are calculated accordingly.
If you feel it's broken, it's not.
I can continue to describe how it works and why, but the behavior itself is correct and aligned between graphite-clickhouse and carbonapi
The calculation of step/from/until
worked like this for 3 years and was reviewed by other maintainers. And it's tested by many people and companies. Changing it will break the world.
I am noticing a potential problem with the values that are used for from and until when maxdatapoints is provided. Below is the query I am running and my analysis of what is happening.
I am sending the query from carbon-api. The query is:
http://localhost:8081/render?format=json&target=metric_name&from=-30d&maxDataPoints=1
The From and Until values resolve to1694837820(Sat Sep 16 2023 04:17:00 GMT+0000)
and1697429820(Mon Oct 16 2023 04:17:00 GMT+0000)
respectively.In this section we recompute from and until so that it's divisible by step https://github.com/go-graphite/graphite-clickhouse/blob/1db4b3d740c0f3e72ed8a3449603f4a016b5dda8/render/data/query.go#L349.
After the calculation is done for the graphite query above the from and until values are updated to
1695168000(Wed Sep 20 2023 00:00:00 GMT+0000)
and1697759999(Thu Oct 19 2023 23:59:59 GMT+0000)
respectively. The value of step is2592000(one month)
.From the above we can see that the from and until values are off by about 4days. This leads to incorrect values being aggregated. The problem gets worse as the range between from and until increases.