OpenSLO / slogen

tool to create and manage content for reliability tracking from logs/event data.
Apache License 2.0

Changing SLOs causing issues with scheduled views #42

Closed by lswith 2 years ago

lswith commented 2 years ago

I've noticed that when I update an SLO, often the dashboards and graphs do not update with the new data for that SLO.

As an example, I'm currently writing latency SLOs as follows:

apiVersion: openslo/v1alpha
kind: SLO
metadata:
  displayName: api Latency
  name: api-ltc
spec:
  service: account-opening
  description: The amount of POST requests to /api that are faster than 2s
  budgetingMethod: Occurrences
  objectives:
    - ratioMetrics:
        total:
          source: sumologic
          queryType: Logs
          query: |
            _sourceCategory=/xxx
            | kv "path", "elapsed_time"
            | where path matches /api/
        good: 
          source: sumologic
          queryType: Logs
          query: 'elapsed_time < 2000'
        incremental: true
      displayName: api calls that are faster than 2s
      target: 0.90

Now, if I update the SLO threshold to 1s, the graphs and data that populate the dashboards are still the data produced by the original elapsed_time < 2000 query.

I have a feeling this is because the data kept in a scheduled view is not deleted, but just disabled. https://help.sumologic.com/Manage/Scheduled-Views/Pause-or-Disable-Scheduled-Views#disable-a-scheduled-view

"Once disabled, no additional data can be indexed in a scheduled view. A disabled scheduled view is not technically deleted, but it can't be re-enabled. If you disable a view and later create a new view with the same name, you won't see duplicate results; instead all the data from both scheduled views are treated as one."

If this is true, I wonder if it's worth hashing the query and appending it to the scheduled view name?
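To make the idea concrete, here is a minimal sketch of deriving a view name from the SLO name plus a short hash of the query. It is an illustration only, not slogen's actual implementation: the function name, the choice of SHA-256, and the 8-character hash length are all assumptions, and the "tf_slogen_view_" prefix is taken from the view name quoted later in this thread.

```python
import hashlib

# Prefix as it appears in the view name quoted later in this thread.
VIEW_PREFIX = "tf_slogen_view_"


def view_name_with_hash(slo_name: str, query: str, hash_len: int = 8) -> str:
    """Hypothetical helper: append a short hash of the query to the view name.

    Changing the query changes the hash, so the updated SLO gets a new
    scheduled view instead of reusing the data from the old one.
    """
    # SHA-256 and the 8-character length are assumptions for this sketch.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:hash_len]
    base = slo_name.replace("-", "_")
    return f"{VIEW_PREFIX}{base}_{digest}"


old_view = view_name_with_hash("api-ltc", "elapsed_time < 2000")
new_view = view_name_with_hash("api-ltc", "elapsed_time < 1000")
print(old_view)
print(new_view)
```

With this scheme the two queries map to different view names, so the 1s SLO would no longer be merged with the data already indexed for the 2s one.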

Often when I switch an SLO, I expect it to stop firing and re-evaluate the data.

lswith commented 2 years ago

A workaround is to rename the SLO, e.g. to "xxx-v2", which forces the data to be repopulated.

agaurav commented 2 years ago

Yeah, this has been a known limitation of views. I like the query hash idea. It will cause a one-time recreation of existing scheduled views when released, but I think it's well worth it. Let me discuss this with the others and get back to you.

lswith commented 2 years ago

Awesome! One more thing to add to this: the name of the SLO, e.g. "api-ltc", will cause issues if it's too long, because it's used directly to name the scheduled view. When the view name, e.g. "tf_slogen_view_api_ltc", becomes longer than 50 characters, it errors out.

I was wondering if when we add the "hash" to the view name, that we also limit the name to no more than 50 characters, regardless of what the SLO name is. This would mean that I could have something like "api-with-very-long-name-slo-ltc" and it would still work.
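A rough sketch of what that could look like: truncate the SLO-derived part so that prefix, name, and hash suffix together never exceed 50 characters. As before, this is only an illustration under stated assumptions. The function name, SHA-256, and the 8-character hash are hypothetical; the 50-character limit and the "tf_slogen_view_" prefix come from this thread.

```python
import hashlib

MAX_VIEW_NAME_LEN = 50           # limit mentioned in this thread
VIEW_PREFIX = "tf_slogen_view_"  # prefix from the view name quoted above


def bounded_view_name(slo_name: str, query: str, hash_len: int = 8,
                      max_len: int = MAX_VIEW_NAME_LEN) -> str:
    """Hypothetical helper: prefix + truncated SLO name + query hash, capped at max_len."""
    # SHA-256 and the 8-character length are assumptions for this sketch.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:hash_len]
    suffix = "_" + digest
    # Characters left for the SLO name after reserving prefix and hash suffix.
    room = max_len - len(VIEW_PREFIX) - len(suffix)
    if room < 1:
        raise ValueError("max_len is too small for the prefix and hash suffix")
    # Truncate the name, then drop any trailing underscore left by the cut.
    base = slo_name.replace("-", "_")[:room].rstrip("_")
    return VIEW_PREFIX + base + suffix


name = bounded_view_name("api-with-very-long-name-slo-ltc", "elapsed_time < 1000")
print(name, len(name))
```

The hash suffix is kept whole and only the human-readable part is shortened. Two long SLO names that share a prefix still get distinct view names as long as their queries differ, but two such SLOs with identical queries would collide.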

agaurav commented 2 years ago

Yeah, the 50 char limit has been frustrating. The search team has agreed to increase it to 256, but currently it is still 50.

"I was wondering if when we add the "hash" to the view name, that we also limit the name to no more than 50 characters"

Yes, this is the ideal approach for now. Once the limit has been increased to 256, we can make the naming more human-readable while keeping the hash suffix.

agaurav commented 2 years ago

Added a --useViewHash flag that appends a hash of the query to the view name as a suffix. The flag defaults to false, so the old behaviour is preserved unless the flag is set.

Released in https://github.com/SumoLogic-Labs/slogen/releases/tag/v0.7.11

The view name char limit has also been increased to 255.