argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
14.82k stars 3.17k forks source link

filter for archive ttl (by name or SQL) #13336

Open tooptoop4 opened 1 month ago

tooptoop4 commented 1 month ago

currently archive ttl docs show this, would be great if u could filter wfs by some name to not be deleted (or alternatively delete only those name prefixes specified).

Archive TTL

You can configure the time period to keep archived workflows before they will be deleted by the archived workflow garbage collection function. The default is forever.

Example:

   persistence: 
     archiveTTL: 10d
qingfeng777 commented 1 month ago

How about this? Under the persistence configuration, add a field named keepPrefixes. All matched workflows will not be deleted.

    persistence: 
      archiveTTL: 10d
      keepPrefixes:
        - xxxx
        - xxxx
Joibel commented 1 month ago

How about this? Under the persistence configuration, add a field named keepPrefixes. All matched workflows will not be deleted.

    persistence: 
      archiveTTL: 10d
      keepPrefixes:
        - xxxx
        - xxxx

I'm not sure I like this design.

Prefixes implies that you're naming your workflows in a common, but not otherwise required manner. Changing the list of preserved archived workflows is done centrally, and for big organizations would require coordination between the teams with the workflows and those with control over the controller.

If we are to stick to something like this I'd probably ask for something more complex such as using label matching for grouping workflows, along with namespace selection. Harder to implement as it needs more columns in the database

persistence: 
  archiveTTL: 10d
  ttlOverride: # I dislike this name, naming things is hard
    - labels:
        labelname: value1
        otherlabel: value2
      namespaces:
        - production
      ttl: 20d

I was wondering about a new field on the workflow itself, which could specify the archiveTTL for this workflow. This would override the central archiveTTL. It'd still want a new column, but is much simpler in other respects. I prefer this.

qingfeng777 commented 1 month ago

I have to admit that your design is more flexible and accurate. However, based on my experience, there are also many users who rely on naming to differentiate between different workflows (just like the author of this issue). This number of users is almost as large as those who rely on labels or namespaces to distinguish workflows. In such scenarios, the prefix matching design is very useful.

Additionally, the implementation of prefix matching does not conflict with your design. We can first implement the prefix matching design to solve part of the problem. Afterward, I am also willing to implement the solutions for label matching and namespace selection.

agilgur5 commented 1 month ago

Yea I was gonna suggest labels as well, especially as that matches k8s conventions and existing fields; most relevant being the existing archiveLabelSelector

Harder to implement as it needs more columns in the database

There is a label table already that may be able to do much of this. It does require a JOIN though and may not be able to implement all types of selectors.

Additionally, the implementation of prefix matching does not conflict with your design. We can first implement the prefix matching design to solve part of the problem.

For consideration, just because something can be done, does not mean that it should be done. Every feature has a maintenance burden and adding too many features can actually be poor UX, on top of being unsustainable for maintainers. Unconventional features can also be poor UX.

qingfeng777 commented 1 month ago

Agree, but there is still one more question. If we only implement the label filter, does the author of the issue need to change their usage to adopt our label filter?

agilgur5 commented 1 month ago

They might have to, unless they use an existing label they already have as the selector/filter. But if a user doesn't use prefixes (not just the issue author, aka "OP"), they'd have to adapt that as well.

We usually want to make sure a feature is usable by the broadest audience in the most conventional way.

If they want to workaround this feature, they could add the labels to their archived tables or could have a CronWorkflow to selectively delete rows instead of using archiveTTL. archiveTTL is basically a convenience to not need a CronWorkflow, as using one would also require you to have access to the DB and know the schema (which breaks encapsulation a bit).


Also @qingfeng777, if you're responding to the direch preceding comment and in full, you don't need to quote them, and quoting actually makes it harder to read (and if everyone did that we'd have those long quotes of quotes of quotes). Only quote when you're referencing a specific piece of someone's comment or a different comment than the prior one. You might've noticed that I edited some of your comments in order to do so.

qingfeng777 commented 1 month ago

OK, thank you for telling me that. I will try to implement the label matching later.

Joibel commented 1 month ago

We don't seem to have considered my preferred design which was to add a field to the workflow itself to control the TTL for it.

qingfeng777 commented 1 month ago

I prefer the design that adds a field to the workflow itself to control the TTL as well. I have a basic implementation design, and any advice is welcome.

The idea is to add a TTL field to the workflowSpec. If a workflow does not set its TTL, the original archiveTTL configuration of the controller will be used. Otherwise, the workflow's own TTL will take precedence. A background goroutine will be run to delete all workflows where the current time minus the finish time exceeds the TTL.

agilgur5 commented 1 month ago

Mechanically speaking, that could be done with a label as well, especially to avoid an extra column. It really should be an annotation if used like that though (i.e. "directives" or "non-standard behavior"), but we don't have a table of annotations already.

If it's part of the spec, we'd probably want a generated column that pulls it out of the workflow JSON.

The idea is to add a TTL field to the workflowSpec

archiveTTL to be exact. We could potentially make it a subfield of ttlStrategy but I'm not sure about that.

A background goroutine will be run to delete all workflows where the current time minus the finish time exceeds the TTL.

The existing archiveTTL goroutine should probably be re-used.

qingfeng777 commented 1 month ago

If we use a label for this, it might significantly increase the database load when matching archived workflows in the large JSON column. So, as you suggested, how about adding an archivedTTL to the ttlStrategy?