argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.05k stars 3.2k forks source link

`argo cron` should report when CronWorkflow names exceed the length limit #6781

Open crenshaw-dev opened 3 years ago

crenshaw-dev commented 3 years ago

Summary

The argo cron command faithfully lists CronWorkflows, even ones that are rendered invalid due to the name exceeding the 52-character limit.

The workflow controller does log the issue. But argo cron gives the impression that nothing is wrong.

Use Cases

My team would benefit from this when a dev (who is unfamiliar with the workflow controller and its logs) deploys a CronWorkflow and then uses argo cron to maintain that CronWorkflow.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

tyrken commented 2 years ago

No error seen in Argo Web UI either, and it's only in the workflow controller logs when first created, not when you expect it to be scheduled.

terrytangyuan commented 2 years ago

Would anyone want to submit a PR to improve this experience?

tyrken commented 2 years ago

Sure, but as a non-golang, non-JS dev it might be a little clunky. Guessing I'd need to add a generic message field to the status like Workflow has, or maybe better a NameInvalid condition given you already seem to show that in the Web UI at least.

Know any existing code that might do this I can look at for inspiration?

agilgur5 commented 1 year ago

Related: #10264 for the UI.

It would be nice if the Controller could statically reject a CronWorkflow if it can tell that its generated name would be too long. That would prevent issues like this from happening in the first place.

bkanuka commented 8 months ago

My team just spent far too long diagnosing why a cron workflow was not running. Everything in the UI looked good, including saying that the workflow would run in X minutes and then it just wouldn't.

ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.

I believe this is actually a pretty high priority issue because it (almost) silently prevents a CronWorkflow from running, while simultaneously indicating that all is okay and that it will run. I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.

@alexec @terrytangyuan could you take another look at this issue and adjust labels if you see fit? I'm happy to provide more details if it helps.

agilgur5 commented 8 months ago

could you take another look at this issue and adjust labels if you see fit?

the labels are correct. we do not currently have a priority label for features, but can look at upvotes to determine popularity (this issue has 4 upvotes at the time of writing).

ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.

Did you use Server-Side Apply for this? Because there is also https://github.com/argoproj/argo-cd/issues/16817 when using client-side apply

I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.

It seems that Workflows does not seem to have a validating admission controller, which is something we could use to reject invalid resources. I mentioned this in https://github.com/argoproj/argo-workflows/issues/12693#issuecomment-1980000601 as well, and the caveat still applies -- if a validating webhook is not installed or bypassed, the case of invalid resources still has to be handled in some way, and so this issue would still be relevant for that case.

bkanuka commented 8 months ago

I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.

My team deployed a CronWorkflow and expected it to run and it didn't - plus because there was no Failure message from Argo, it didn't trigger an alert.

We do have the validating webhook and Server-Side apply.

Joibel commented 8 months ago

Did the metric argo_workflows_error_count with the cause CronWorkflowSubmissionError go up?

I'm not saying this is sufficient indication, but it is the indication I would expect.

Joibel commented 8 months ago

I am unaware of a validating admission controller option on argo workflows. I don't think this is currently possible, I'm intrigued by what you have enabled there.

agilgur5 commented 8 months ago

I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.

The Controller does log that it is invalid, per the issue description. And that is the only thing it can do at this time, as it does not handle admission.

To have other tooling like the CLI -- what this issue requests -- show that it is invalid would be a feature. They are not capable of that right now and so that would be adding new behavior. A bug would be existing, incorrect behavior, e.g. showing it is invalid when it is actually valid. Right now it has no indication whatsoever, the CLI does not validate while listing.

jessesuen commented 7 months ago

Possible improvements:

  1. display message in status that the cronworkflow is invalid due to its length
  2. CLI validation upon creation attempt and rejecting it if length is to great
tooptoop4 commented 2 weeks ago

i know this won't help creation via kubectl but do u think a cli flag that runs argo lint before accepting the submission could prevent this being created via argo cli?