Open tooptoop4 opened 3 months ago
How about this? Under the persistence
configuration, add a field named keepPrefixes
. All matched workflows will not be deleted.
persistence:
archiveTTL: 10d
keepPrefixes:
- xxxx
- xxxx
How about this? Under the
persistence
configuration, add a field namedkeepPrefixes
. All matched workflows will not be deleted.persistence: archiveTTL: 10d keepPrefixes: - xxxx - xxxx
I'm not sure I like this design.
Prefixes implies that you're naming your workflows in a common, but not otherwise required manner. Changing the list of preserved archived workflows is done centrally, and for big organizations would require coordination between the teams with the workflows and those with control over the controller.
If we are to stick to something like this I'd probably ask for something more complex such as using label matching for grouping workflows, along with namespace selection. Harder to implement as it needs more columns in the database
persistence:
archiveTTL: 10d
ttlOverride: # I dislike this name, naming things is hard
- labels:
labelname: value1
otherlabel: value2
namespaces:
- production
ttl: 20d
I was wondering about a new field on the workflow itself, which could specify the archiveTTL for this workflow. This would override the central archiveTTL. It'd still want a new column, but is much simpler in other respects. I prefer this.
I have to admit that your design is more flexible and accurate. However, based on my experience, there are also many users who rely on naming to differentiate between different workflows (just like the author of this issue). This number of users is almost as large as those who rely on labels or namespaces to distinguish workflows. In such scenarios, the prefix matching design is very useful.
Additionally, the implementation of prefix matching does not conflict with your design. We can first implement the prefix matching design to solve part of the problem. Afterward, I am also willing to implement the solutions for label matching and namespace selection.
Yea I was gonna suggest labels as well, especially as that matches k8s conventions and existing fields; most relevant being the existing archiveLabelSelector
Harder to implement as it needs more columns in the database
There is a label table already that may be able to do much of this. It does require a JOIN though and may not be able to implement all types of selectors.
Additionally, the implementation of prefix matching does not conflict with your design. We can first implement the prefix matching design to solve part of the problem.
For consideration, just because something can be done, does not mean that it should be done. Every feature has a maintenance burden and adding too many features can actually be poor UX, on top of being unsustainable for maintainers. Unconventional features can also be poor UX.
Agree, but there is still one more question. If we only implement the label filter, does the author of the issue need to change their usage to adopt our label filter?
They might have to, unless they use an existing label they already have as the selector/filter. But if a user doesn't use prefixes (not just the issue author, aka "OP"), they'd have to adapt that as well.
We usually want to make sure a feature is usable by the broadest audience in the most conventional way.
If they want to workaround this feature, they could add the labels to their archived tables or could have a CronWorkflow to selectively delete rows instead of using archiveTTL
.
archiveTTL
is basically a convenience to not need a CronWorkflow, as using one would also require you to have access to the DB and know the schema (which breaks encapsulation a bit).
Also @qingfeng777, if you're responding to the direch preceding comment and in full, you don't need to quote them, and quoting actually makes it harder to read (and if everyone did that we'd have those long quotes of quotes of quotes). Only quote when you're referencing a specific piece of someone's comment or a different comment than the prior one. You might've noticed that I edited some of your comments in order to do so.
OK, thank you for telling me that. I will try to implement the label matching later.
We don't seem to have considered my preferred design which was to add a field to the workflow itself to control the TTL for it.
I prefer the design that adds a field to the workflow itself to control the TTL as well. I have a basic implementation design, and any advice is welcome.
The idea is to add a TTL
field to the workflowSpec. If a workflow does not set its TTL
, the original archiveTTL
configuration of the controller will be used. Otherwise, the workflow's own TTL
will take precedence. A background goroutine will be run to delete all workflows where the current time minus the finish time exceeds the TTL.
Mechanically speaking, that could be done with a label as well, especially to avoid an extra column. It really should be an annotation if used like that though (i.e. "directives" or "non-standard behavior"), but we don't have a table of annotations already.
If it's part of the spec, we'd probably want a generated column that pulls it out of the workflow JSON.
The idea is to add a
TTL
field to the workflowSpec
archiveTTL
to be exact. We could potentially make it a subfield of ttlStrategy
but I'm not sure about that.
A background goroutine will be run to delete all workflows where the current time minus the finish time exceeds the TTL.
The existing archiveTTL
goroutine should probably be re-used.
If we use a label for this, it might significantly increase the database load when matching archived workflows in the large JSON column. So, as you suggested, how about adding an archivedTTL to the ttlStrategy?
currently archive ttl docs show this, would be great if u could filter wfs by some name to not be deleted (or alternatively delete only those name prefixes specified).