Closed aachenmax closed 2 years ago
Could you instead install the script as a file using artifacts and run that?
Hi Alex, I guess you mean like in this example here: https://github.com/argoproj/argo-workflows/blob/master/examples/input-artifact-s3.yaml ? That would work for sure. The other alternative (to keep it more k8s native) I thought of is to offload the script into a ConfigMap which can then be mounted as a volume and run from there. Both solutions however hide the actual source script from the workflow spec.
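For reference, the ConfigMap alternative could look roughly like this (a hedged sketch only — the ConfigMap name, key, and mount path are made up for illustration; the workflow side is shown as comments since exact placement depends on the template):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: long-script        # hypothetical name
data:
  run.py: |
    print("hello from a very long script")
---
# In the workflow, use a container template instead of a script template,
# mount the ConfigMap as a volume, and run the file from there:
#   container:
#     image: python:3-alpine
#     command: [python, /scripts/run.py]
#     volumeMounts:
#       - name: script
#         mountPath: /scripts
#   volumes:
#     - name: script
#       configMap:
#         name: long-script
```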
Is this a duplicate of #7527?
@alexec Can we add something to the upgrade guide for 3.2 if we plan to change the limit to 128kb instead of 256kb? We should also consider changing this: https://github.com/argoproj/argo-workflows/blob/4db1c4c8495d0b8e13c718207175273fe98555a2/workflow/executor/executor.go#L761-L766
@chazapis Any thoughts about https://github.com/argoproj/argo-workflows/commit/cecc379ce23e708479e4253bbbf14f7907272c9c causing a backwards compatibility change?
@blkperl, yes, I think this is a duplicate of #7527, and it seems to be caused by the switch from Debian images to Alpine.
This issue was raised as a “bug” rather than a “regression”, but it looks like a regression from v3.1 to v3.2. There is a separate issue type for regressions.
This regression may be caused by the switch from Debian to Alpine, which reduces ARG_MAX. It does not appear possible to change this with simple reconfiguration (you must recompile your kernel):
https://unix.stackexchange.com/questions/336934/raise-128kib-limit-on-environment-variables-in-linux
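For context, the relevant limits can be inspected from Python (a small sketch; MAX_ARG_STRLEN is not exposed via sysconf, but the kernel defines it as 32 pages):

```python
import os

# ARG_MAX: total byte budget for argv + environ passed to execve()
arg_max = os.sysconf("SC_ARG_MAX")

# MAX_ARG_STRLEN: per-string cap, defined in the kernel as 32 * PAGE_SIZE,
# which works out to 131072 bytes on a typical 4 KiB-page kernel
max_arg_strlen = 32 * os.sysconf("SC_PAGE_SIZE")

print(arg_max, max_arg_strlen)
```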
However, there is an ugly but workable solution. Because of the limit on the size of an annotation, we only need 256 KiB in total, so we can split ARGO_TEMPLATE across two env vars: the first 128 KiB in ARGO_TEMPLATE and any extra (up to 128 KiB) in ARGO_TEMPLATE_1.
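A rough illustration of that split (a hypothetical Python sketch, not the actual Go executor code; note that a later comment in this thread points out ARG_MAX is a *total* limit, so splitting alone may not be enough):

```python
CHUNK = 128 * 1024  # 128 KiB per env var, safely under MAX_ARG_STRLEN

def split_template(serialized: str) -> dict:
    """Split the serialized template across ARGO_TEMPLATE and ARGO_TEMPLATE_1."""
    env = {"ARGO_TEMPLATE": serialized[:CHUNK]}
    if len(serialized) > CHUNK:
        env["ARGO_TEMPLATE_1"] = serialized[CHUNK:]
    return env

def join_template(env: dict) -> str:
    """Reassemble the template on the executor side."""
    return env.get("ARGO_TEMPLATE", "") + env.get("ARGO_TEMPLATE_1", "")
```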
@chazapis - I know you did this a long time ago - would you be interested in submitting a PR to fix it?
If @chazapis is not available, would anyone else like to submit a PR? I think this is a good first issue.
@alexec, unfortunately I cannot do this right now.
refactor: Remove the need for pod annotations to be mounted as a volume (#6022)
This issue has two underlying causes which cannot be reverted:
@alexec the previous Debian container was pinned to a specific version, debian:10.7-slim, rather than the debian:10-slim tag that gets updated every 30 days on their typical release cycle. We could also add the needed apt-get update calls to the Docker build so each release pulls the latest security patches (Alpine has the same problem; the only difference is that we used the major container tag there, rather than a pinned minor version as was done for Debian).
I have a revert patch that I am happy to submit if the community is on board with Debian as the exec base, for the larger ARG_MAX limit this use case needs.
We don’t want to go back to Debian. Instead, we can fix this in code.
@alexec I think the suggested patch will not work, because ARG_MAX is the total size of all environment variables and arguments; we originally thought it was the maximum length of a single variable.
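That total-budget behaviour is easy to reproduce (a sketch for Linux; every variable below is far under the per-string MAX_ARG_STRLEN cap, yet exec still fails once the combined size passes ARG_MAX):

```python
import subprocess

# 100 vars of ~120 KB each: each string is under the 128 KiB per-string cap,
# but the combined size (~12 MB) exceeds the total ARG_MAX budget.
env = {f"VAR_{i}": "x" * 120_000 for i in range(100)}

try:
    subprocess.run(["/bin/true"], env=env, check=True)
    print("exec succeeded")
except OSError as err:
    print("exec failed:", err)  # "Argument list too long" (E2BIG) on Linux
```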
I think the only option is to finish the move to distroless, which puts us back on Debian.
Oh. I feel sad about that. Do you have a link?
@alexec do you mean docs on ARG_MAX? I think this is the best option https://www.in-ulm.de/~mascheck/various/argmax/
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.
We're facing this issue when using exit handlers and passing workflow.failures as a command-line argument to a script. For a workflow with multiple failures, this variable becomes large and we get the error.
Facing this issue with v3.3.8 as well, nothing out of the ordinary other than a particularly long input argument.
Could we possibly reopen this? This is still an issue. We're trying to pass an (admittedly large) JSON containing initialization values for a map-reduce workflow into a script template, and it is failing because of this.
This was fixed in #8169 but then reverted/removed in #8796. This comment on the revert from the original author mentions that this may now be fine after the distroless change in #8806 (which seems to be Debian-based?).
If you're still having this issue on a current version of Argo Workflows, please open a new issue.
Hey @agilgur5, we're experiencing this problem with argoexec 3.5.1; do you suggest opening a new issue?
That is indeed a current version, so yes, please open a new issue. Please include a reproduction and the error message you're getting.
Follow-up issue: #12190
Summary
What happened/what you expected to happen?
When executing a workflow which contains a step configured as a script (e.g. Python, bash) with source code exceeding about 100,000 characters, the pod cannot be started, and the following error message is shown for the Argo init container:
standard_init_linux.go:228: exec user process caused: argument list too long
This is because the ARG_MAX / MAX_ARG_STRLEN limit for a single command, hardcoded at 131072 bytes in most kernels, is being exceeded. From my understanding/analysis, the script size itself is not the problem (until you approach the 1 MB etcd limit, I guess), since Argo mounts the script into the container. The problem is an environment variable named ARGO_TEMPLATE, containing the entire pod spec, which is set for the init, wait, and actual workload containers. I believe in Argo versions < 3.2 this was handled differently, using pod annotations and volumes, and therefore was not a problem; it changed with this commit: https://github.com/argoproj/argo-workflows/commit/cecc379ce23e708479e4253bbbf14f7907272c9c
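The failure mode is easy to reproduce outside Argo (a sketch for Linux; the ARGO_TEMPLATE name just mirrors the variable described above, and the content is a stand-in, not a real pod spec):

```python
import subprocess

# A single env var just over MAX_ARG_STRLEN (32 * 4096 = 131072 bytes on
# most kernels) makes execve() fail with E2BIG ("Argument list too long").
fake_template = "x" * 140_000  # stand-in for a large serialized pod spec

try:
    subprocess.run(["/bin/true"], env={"ARGO_TEMPLATE": fake_template}, check=True)
    print("started")
except OSError as err:
    print("failed:", err)
```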
Of course one could ask: "Why do you need such a long script?" and argue that it should be split up and run in consecutive or parallel containers. However, this might not be easily possible for certain applications, and the limit puts an absolute cap on the spec/configuration size of a single pod that Argo can launch.
What version of Argo Workflows are you running? This affects versions >= 3.2; tested on Argo v3.2.3. Not an issue in v3.1, for example.
Diagnostics
Example workflow YAML is attached as a .txt file: large_script.txt
What Kubernetes provider are you using?
Tested it on
What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary
Tested it with executors
Logs from the workflow controller:
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.