argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.11k stars 3.21k forks

reduce pod definition size #13089

Open tooptoop4 opened 6 months ago

tooptoop4 commented 6 months ago

Summary

The pod spec for steps run by Argo is quite large and can fill etcd. To trim it down, I imagine there will be 2 parts:

  1. code changes
  2. docs for configuration of init vs main vs wait containers

thoughts/Qs:

  1. [x] is the ARGO_TEMPLATE env variable needed on all 3 containers?
  2. [x] are volumeMounts needed on both main and wait containers?
  3. [x] do all 3 containers need the environment variables for communicating with S3, i.e. via artifactRepository? (I'm guessing the main container doesn't)
    • seems this is already handled
  4. [ ] ARGO_TEMPLATE is huge; does the ARGO_TEMPLATE env variable contain things it doesn't need? Perhaps some containers need parts of it that other containers don't?
    • more analysis still todo
  5. [x] are ARGO_PROGRESS_PATCH_TICK_DURATION/ARGO_PROGRESS_FILE_TICK_DURATION/ARGO_INCLUDE_SCRIPT_OUTPUT/ARGO_PROGRESS_FILE env variables needed?
  6. [x] do the commands on all the containers need --loglevel/--log-format?
  7. [x] I think ARGO_TEMPLATE does not need the volumeMounts for ConfigMaps that hold user code, such as a some.py ConfigMap
  8. [x] I think ARGO_TEMPLATE does not need some of the user-provided env variables
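
To put numbers on questions 1 and 4, a minimal sketch like the following could total the bytes each env variable contributes across the init, main, and wait containers of a pod. It assumes the standard Kubernetes Pod JSON layout (`spec.initContainers`, `spec.containers`, each with an `env` list); the sample pod and the function name `env_var_bytes` are hypothetical, not part of Argo.

```python
import json
from collections import defaultdict


def env_var_bytes(pod: dict) -> dict:
    """Sum the UTF-8 size of each env var value over all containers in a pod."""
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    totals = defaultdict(int)
    for container in containers:
        for env in container.get("env", []):
            # Entries that use valueFrom have no inline value and count as 0 bytes.
            value = env.get("value") or ""
            totals[env["name"]] += len(value.encode("utf-8"))
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical pod: the same serialized template repeated on all 3 containers.
    template = json.dumps({"name": "step", "container": {"image": "alpine"}})
    env = [{"name": "ARGO_TEMPLATE", "value": template}]
    pod = {
        "spec": {
            "initContainers": [{"name": "init", "env": env}],
            "containers": [
                {"name": "wait", "env": env},
                {"name": "main", "env": env},
            ],
        }
    }
    # Print the largest contributors first.
    for name, size in sorted(env_var_bytes(pod).items(), key=lambda kv: -kv[1]):
        print(name, size)
```

For a real pod, the input could come from `kubectl get pod <name> -o json` loaded with `json.load`.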

Use Cases

ensure etcd does not fill up

jswxstw commented 6 months ago
  • is ARGO_TEMPLATE env variable needed on all 3 containers?

ARGO_TEMPLATE may be huge. #12325 provides an optimization that offloads EnvVarTemplate; perhaps it could be made the default logic. However, its lifecycle is aligned with the workflow, not the pod. Should we consider deleting it on pod GC?

  • are ARGO_PROGRESS_PATCH_TICK_DURATION/ARGO_PROGRESS_FILE_TICK_DURATION/ARGO_INCLUDE_SCRIPT_OUTPUT/ARGO_PROGRESS_FILE env variables needed?

ARGO_PROGRESS_PATCH_TICK_DURATION/ARGO_PROGRESS_FILE_TICK_DURATION/ARGO_PROGRESS_FILE are used to implement self-reported progress. ARGO_INCLUDE_SCRIPT_OUTPUT is used to determine whether stdout needs to be saved.
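
For context, here is a rough sketch of how user code in the main container might use ARGO_PROGRESS_FILE for self-reported progress. It assumes the `N/M` file format described in the Argo Workflows progress documentation; the helper name `report_progress` is hypothetical.

```python
import os


def report_progress(done: int, total: int) -> bool:
    """Write "done/total" to the file named by ARGO_PROGRESS_FILE, if it is set.

    Returns True when progress was written, False when the variable is absent.
    """
    path = os.environ.get("ARGO_PROGRESS_FILE")
    if not path:
        # Not running with progress reporting available; do nothing.
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{done}/{total}\n")
    return True
```

A step processing 100 items could call `report_progress(i, 100)` after each item.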

I think optimizing the reuse of ARGO_TEMPLATE would be sufficient. The other aspects have minimal impact, so there's no need to be overly demanding.

tooptoop4 commented 5 months ago

@jswxstw do you know what ARGO_TEMPLATE is for? And do all 3 containers need it?

jswxstw commented 5 months ago

All 3 containers need ARGO_TEMPLATE to prepare inputs or save outputs, and in some types of templates, like Script/Resource/ContainerSet, it serves other purposes as well.
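
As a rough illustration of the "prepare inputs or save outputs" part, the sketch below parses an ARGO_TEMPLATE value and lists the artifact names a container would have to load or save. It assumes the variable holds a JSON-serialized template with `inputs.artifacts` and `outputs.artifacts` fields; the sample template and the function name `artifact_names` are hypothetical, and this is not the executor's actual code.

```python
import json


def artifact_names(template_json: str) -> dict:
    """Return the input and output artifact names declared in a template."""
    tmpl = json.loads(template_json)
    return {
        side: [a["name"] for a in (tmpl.get(side) or {}).get("artifacts") or []]
        for side in ("inputs", "outputs")
    }


if __name__ == "__main__":
    # Hypothetical template; in a pod this string would be read from
    # os.environ["ARGO_TEMPLATE"].
    sample = json.dumps({
        "name": "step",
        "inputs": {"artifacts": [{"name": "data", "path": "/tmp/in"}]},
        "outputs": {"artifacts": [{"name": "result", "path": "/tmp/out"}]},
    })
    print(artifact_names(sample))
```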