ccin2p3 / samplerr

Round robin timeseries middleware based on riemann and elasticsearch
Eclipse Public License 1.0

Add filter-based rotation of aliases #22

Open smortex opened 2 years ago

smortex commented 2 years ago

Instead of maintaining aliases for every day that point to coarser data in monthly/yearly indices, map aliases to indices with filters so that queries select only the most fine-grained data.

This allows graphing samplerr-* and always getting only the most accurate data, without the coarse data.
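As a rough illustration of the idea (a sketch, not the code in this PR), the alias actions could be built as plain data before being sent to the Elasticsearch `_aliases` endpoint. The index names, the `.samplerr` / `samplerr` prefixes, the `:from` key, the `@timestamp` field and the `alias-actions` function are all assumptions made for the example:

```clojure
(require '[clojure.string :as str])

;; Hypothetical index descriptions, ordered from finest to coarsest.
;; :from is the first day each index covers (ISO dates, an assumption).
(def indices
  [{:index ".samplerr-2022.07.11" :from "2022-07-11"}   ; daily
   {:index ".samplerr-2022.07"    :from "2022-07-01"}   ; monthly
   {:index ".samplerr-2022"       :from "2022-01-01"}]) ; yearly

(defn alias-actions
  "Builds one alias `add` action per index. Every index except the
  finest one gets a range filter that hides the documents already
  covered by the next finer index."
  [{:keys [index-prefix alias-prefix]} indices]
  (map (fn [{:keys [index]} finer]
         {:add (cond-> {:index index
                        :alias (str/replace-first index index-prefix alias-prefix)}
                 ;; Only coarser indices need a filter; `finer` is nil
                 ;; for the first (finest) index.
                 finer (assoc :filter
                              {:range {"@timestamp" {:lt (:from finer)}}}))})
       indices
       ;; Pair each index with the one just before it in the list.
       (cons nil indices)))

(alias-actions {:index-prefix ".samplerr" :alias-prefix "samplerr"} indices)
```

With this input, the daily index is aliased without a filter, the monthly one only exposes documents older than 2022-07-11, and the yearly one only those older than 2022-07-01, so a query on samplerr-* never mixes coarse and fine points for the same period.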

While here, add some CI tests for this code and integrate it with GitHub Actions.

Todo

smortex commented 2 years ago

Yes, we switched to this code to manage aliases.

Below is a Grafana snapshot showing the riemann streams rate: the average metric for service:"riemann streams rate" AND cfunc:avg, grouped by service: Grafana graph

The leftmost part, in yellow, is produced by hourly data (step=3600); the green part comes from data with step=600, and the blue part from the finest-grained data (step=20). These correspond to the steps in the README example.

Yesterday, while deploying new monitoring tools, we included this commit to strip the /<cfunc>/<step> part of the service name, hence the orange color at the end of the graph, which corresponds to a growth in the values.

If I zoom in on this area of the graph, we reach a point where the step between the green data points is too wide and Grafana considers that it cannot join the points anymore, which is consistent with our expectations. We also see a higher spike from when we restarted the service after updating the plugin, because we compute the avg value over a "thinner" duration, which is also expected (beware, some service colors changed):

image

Bonus point with this change: the gap between the end of one service and the beginning of another when we "group by service" will disappear (visible on the first graph).

faxm0dem commented 2 years ago

Do you think it would make sense to keep the old behaviour as an option? Or to use the new one (and bump the major version)?

smortex commented 2 years ago

> Do you think it would make sense to keep the old behaviour as an option ? Or to use the new one (and bump major version) ?

I don't know, I guess it's up to the maintainers to decide :trollface:

I am proposing this change because it helps me better manage my data. If you also benefit from it and think it should be the new standard, I have nothing against that. If the current code is fine and you want people to be able to keep using it, we can ship both. If you think that could be confusing for end users, I can also just put this code in my local config.

For now, I went with the second scenario, and the PR just adds functions. This can easily be changed on demand (it is way easier for me to remove code than to write Clojure ATM :smile:).

I just added a commit to rework the alias update (the first item on my todo list in the original message) :wink:

smortex commented 2 years ago

So far so good (TZ is UTC+2 here):

romain@eddy ~ % ls -rt /var/log/riemann/* | xargs egrep 'Scheduling|samplerr alias'
/var/log/riemann/riemann.log.2022-07-09:INFO [2022-07-09 07:01:41,852] main - riemann.plugin.samplerr - Scheduling bootime alias management in 60 seconds
/var/log/riemann/riemann.log.2022-07-09:INFO [2022-07-09 07:01:41,855] main - riemann.plugin.samplerr - Scheduling daily alias management at 2022-07-10T02:01:00+02:00
/var/log/riemann/riemann.log.2022-07-09:INFO [2022-07-09 07:02:41,869] riemann task 3 - riemann.plugin.samplerr - Removing samplerr aliases
/var/log/riemann/riemann.log.2022-07-09:INFO [2022-07-09 07:02:42,151] riemann task 3 - riemann.plugin.samplerr - Creating samplerr aliases
/var/log/riemann/riemann.log.2022-07-10:INFO [2022-07-10 02:01:00,008] riemann task 1 - riemann.plugin.samplerr - Removing samplerr aliases
/var/log/riemann/riemann.log.2022-07-10:INFO [2022-07-10 02:01:00,377] riemann task 1 - riemann.plugin.samplerr - Creating samplerr aliases
/var/log/riemann/riemann.log:INFO [2022-07-11 02:01:00,004] riemann task 1 - riemann.plugin.samplerr - Removing samplerr aliases
/var/log/riemann/riemann.log:INFO [2022-07-11 02:01:00,297] riemann task 1 - riemann.plugin.samplerr - Creating samplerr aliases
romain@eddy ~ %

smortex commented 2 years ago

Only drawback: purging old data happens at a different moment than the alias update, and between the two, some data cannot be fetched. Maybe we can rework the code to trigger a single event which purges and then updates the aliases, something in this spirit:

(defn maintain
  "Runs the purge and the alias update back to back in a single task."
  [{:keys [conn alias-prefix index-prefix archives purge? update-aliases?]
    :or {purge? false
         update-aliases? true}}]
  ;; Purge first so the alias update only sees the surviving indices.
  (when purge?
    (purge {:conn conn :index-prefix index-prefix :archives archives}))
  (when update-aliases?
    (rotate-ng {:conn conn :index-prefix index-prefix :alias-prefix alias-prefix})))

What do you think?