Open crenshaw-dev opened 3 years ago
No error seen in Argo Web UI either, and it's only in the workflow controller logs when first created, not when you expect it to be scheduled.
Would anyone want to submit a PR to improve this experience?
Sure, but as a non-golang, non-JS dev it might be a little clunky. Guessing I'd need to add a generic message
field to the status like Workflow has, or maybe better a NameInvalid
condition given you already seem to show that in the Web UI at least.
Know any existing code that might do this I can look at for inspiration?
Related: #10264 for the UI.
It would be nice if the Controller could statically reject a CronWorkflow
if it can tell that its generated name would be too long. That would prevent issues like this from happening in the first place.
My team just spent far too long diagnosing why a cron workflow was not running. Everything in the UI looked good, including saying that the workflow would run in X minutes and then it just wouldn't.
ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.
I believe this is actually a pretty high priority issue because it (almost) silently prevents a CronWorkflow from running, while simultaneously indicating that all is okay and that it will run. I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.
@alexec @terrytangyuan could you take another look at this issue and adjust labels if you see fit? I'm happy to provide more details if it helps.
could you take another look at this issue and adjust labels if you see fit?
the labels are correct. we do not currently have a priority label for features, but can look at upvotes to determine popularity (this issue has 4 upvotes at the time of writing).
ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.
Did you use Server-Side Apply for this? Because there is also https://github.com/argoproj/argo-cd/issues/16817 when using client-side apply
I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.
It seems that Workflows does not seem to have a validating admission controller, which is something we could use to reject invalid resources. I mentioned this in https://github.com/argoproj/argo-workflows/issues/12693#issuecomment-1980000601 as well, and the caveat still applies -- if a validating webhook is not installed or bypassed, the case of invalid resources still has to be handled in some way, and so this issue would still be relevant for that case.
I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.
My team deployed a CronWorkflow and expected it to run and it didn't - plus because there was no Failure message from Argo, it didn't trigger an alert.
We do have the validating webhook and Server-Side apply.
Did the metric argo_workflows_error_count
with the cause
CronWorkflowSubmissionError
go up?
I'm not saying this is sufficient indication, but it is the indication I would expect.
I am unaware of a validating admission controller option on argo workflows. I don't think this is currently possible, I'm intrigued by what you have enabled there.
I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.
The Controller does log that it is invalid, per the issue description. And that is the only thing it can do at this time, as it does not handle admission.
To have other tooling like the CLI -- what this issue requests -- show that it is invalid would be a feature. They are not capable of that right now and so that would be adding new behavior. A bug would be existing, incorrect behavior, e.g. showing it is invalid when it is actually valid. Right now it has no indication whatsoever, the CLI does not validate while listing.
Possible improvements:
i know this won't help creation via kubectl but do u think a cli flag that runs argo lint before accepting the submission could prevent this being created via argo cli?
Summary
The
argo cron
command faithfully lists CronWorkflows, even ones that are rendered invalid due to the name exceeding the 52-character limit.The workflow controller does log the issue. But
argo cron
gives the impression that nothing is wrong.Use Cases
My team would benefit from this when a dev (who is unfamiliar with the workflow controller and its logs) deploys a CronWorkflow and then uses
argo cron
to maintain that CronWorkflow.Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.