Ability to set non-static fill value, based on existing data in series for aggregates

dbalagansky commented 2 years ago

Which version of Gnocchi are you using

$ gnocchi --version
gnocchi 7.0.7
$ gnocchi server version
+---------+-------+
| Field   | Value |
+---------+-------+
| version | 4.4.2 |
+---------+-------+

on OpenStack Xena release.

How to reproduce your problem

I'm trying to get the percentage of CPU and memory utilization on a scale from 0 to 100 (for CPU, that means the utilization percentage doesn't depend on the number of vCPUs in the VM).

For cpu I have this data in series:

$ gnocchi aggregates --resource-type instance --start 2022-07-27T12:00:00 '(/ (metric cpu rate:mean) 60000000000)' id=c1081247-90cb-471b-b1e3-3a927de2e042
+----------------------------------------------------+---------------------------+-------------+----------------------+
| name                                               | timestamp                 | granularity |                value |
+----------------------------------------------------+---------------------------+-------------+----------------------+
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:00:00+00:00 |       300.0 |               0.0696 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:05:00+00:00 |       300.0 |  0.06923333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:10:00+00:00 |       300.0 |  0.06856666666666666 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:15:00+00:00 |       300.0 |  0.06993333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:20:00+00:00 |       300.0 |               0.0723 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:25:00+00:00 |       300.0 |               0.0712 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:30:00+00:00 |       300.0 |  0.06833333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:35:00+00:00 |       300.0 |  0.06629166666666667 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:40:00+00:00 |       300.0 |  0.06883333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:45:00+00:00 |       300.0 |  0.07066666666666667 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:50:00+00:00 |       300.0 |  0.07073333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T12:55:00+00:00 |       300.0 |  0.07306666666666667 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:00:00+00:00 |       300.0 |               0.0712 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:05:00+00:00 |       300.0 |  0.07343333333333334 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:10:00+00:00 |       300.0 |               0.0745 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:15:00+00:00 |       300.0 |  0.07703333333333333 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:20:00+00:00 |       300.0 |  0.07103333333333334 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:25:00+00:00 |       300.0 |               0.0712 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:30:00+00:00 |       300.0 |  0.07246666666666667 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:35:00+00:00 |       300.0 |               0.0376 |
| c1081247-90cb-471b-b1e3-3a927de2e042/cpu/rate:mean | 2022-07-27T13:40:00+00:00 |       300.0 | 0.037333333333333336 |
+----------------------------------------------------+---------------------------+-------------+----------------------+

For vcpus I have this:

$ gnocchi aggregates --resource-type instance --start 2022-07-27T12:00:00 '(metric vcpus mean)' id=c1081247-90cb-471b-b1e3-3a927de2e042
+-------------------------------------------------+---------------------------+-------------+-------+
| name                                            | timestamp                 | granularity | value |
+-------------------------------------------------+---------------------------+-------------+-------+
| c1081247-90cb-471b-b1e3-3a927de2e042/vcpus/mean | 2022-07-27T12:00:00+00:00 |       300.0 |   4.0 |
| c1081247-90cb-471b-b1e3-3a927de2e042/vcpus/mean | 2022-07-27T13:00:00+00:00 |       300.0 |   4.0 |
+-------------------------------------------------+---------------------------+-------------+-------+

What is the result that you get

Now, when I try to aggregate those two series to get CPU utilization on a scale from 0 to 100, the resulting set only contains data points where both series have a point with the same timestamp:

$ gnocchi aggregates --resource-type instance --start 2022-07-27T12:00:00 '(/ (aggregate mean (/ (metric cpu rate:mean) (metric vcpus mean))) 60000000000)' id=c1081247-90cb-471b-b1e3-3a927de2e042
+------------+---------------------------+-------------+--------+
| name       | timestamp                 | granularity |  value |
+------------+---------------------------+-------------+--------+
| aggregated | 2022-07-27T12:00:00+00:00 |       300.0 | 0.0174 |
| aggregated | 2022-07-27T13:00:00+00:00 |       300.0 | 0.0178 |
+------------+---------------------------+-------------+--------+

This works as expected, and it should be fixable with the fill parameter, but fill only accepts a static value. For this kind of aggregate to work properly, I need missing values in the vcpus series to be filled with the previous non-null (not None/NaN) value.
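To make the requested behaviour concrete, here is a minimal client-side sketch in Python. It is not Gnocchi code: `forward_fill` is a hypothetical helper, and the data is a subset of the points from the tables above (the cpu values are already divided by 60000000000). It only illustrates how carrying the last known vcpus value forward would keep every cpu data point.

```python
from bisect import bisect_right


def forward_fill(sparse, timestamps):
    """Hypothetical helper: for each timestamp, take the most recent
    value in `sparse` at or before it (previous-value fill)."""
    known = sorted(sparse)
    filled = {}
    for ts in timestamps:
        i = bisect_right(known, ts)
        if i:  # there is at least one known point at or before ts
            filled[ts] = sparse[known[i - 1]]
    return filled


# Subset of the series shown above. ISO 8601 timestamps in the same
# format and timezone sort chronologically as plain strings.
cpu = {
    "2022-07-27T12:00:00+00:00": 0.0696,
    "2022-07-27T12:05:00+00:00": 0.06923333333333333,
    "2022-07-27T12:10:00+00:00": 0.06856666666666666,
    "2022-07-27T13:00:00+00:00": 0.0712,
    "2022-07-27T13:05:00+00:00": 0.07343333333333334,
}
vcpus = {
    "2022-07-27T12:00:00+00:00": 4.0,
    "2022-07-27T13:00:00+00:00": 4.0,
}

# Matching on identical timestamps only keeps the shared points.
shared = sorted(set(cpu) & set(vcpus))

# Previous-value fill keeps every cpu point that has an earlier vcpus value.
vcpus_filled = forward_fill(vcpus, cpu)
cpu_util = {ts: cpu[ts] / vcpus_filled[ts] for ts in cpu if ts in vcpus_filled}

print(len(shared), "points with exact timestamp matching")
print(len(cpu_util), "points with previous-value fill")
for ts, value in cpu_util.items():
    print(ts, value)
```

With exact matching only the 12:00 and 13:00 points survive, while the previous-value fill yields a result for each cpu timestamp in the subset.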

Previously, this could be achieved by creating the same percentage metrics with a transform in the sink, which was deprecated and subsequently removed.

What is result that you expected

I expect a way to get CPU and memory utilization in percent, on a scale from 0 to 100, for all available data points in the cpu and memory.usage series, by using a fill value calculated from the existing data in the series.

Some time ago there were attempts to fix this missing bit within ceilometer:

https://review.opendev.org/c/openstack/ceilometer/+/799963 for cpu_util
https://review.opendev.org/c/openstack/ceilometer/+/597054 for memory_util

The first of these raised a valid question about why a percentage would be stored as a cumulative value, so Gnocchi looks like the proper place to add this functionality.

tobias-urdin commented 2 years ago

I assume fill=dropna could be used, but you are explicitly interested in the previous value as well?

dbalagansky commented 2 years ago

I assume fill=dropna could be used, but you are explicitly interested in the previous value as well?

Yes. :)

As far as I understand, using fill=dropna would result in the same behaviour as above: I would only get data points in the resulting set where both series have values at roughly the same timestamp, as in the last command output I've shown (the default fill behaviour), which effectively loses some of the data.
