argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Archive Logs for init and wait container #12640

Open tczhao opened 9 months ago

tczhao commented 9 months ago

Summary

Currently, Argo's ArchiveLogs only supports logs from the main container. From time to time, users ask whether it also supports init and wait container logs.

I'm aware the archive logs docs recommend using a proper logging facility. Maybe this is still something we could consider if there are enough 👍.

Use Cases

When would you use this?

To archive init and wait container logs for debugging purposes.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritize the proposals with the most 👍.

ljyanesm commented 9 months ago

I think it would be enough if the docs (https://argo-workflows.readthedocs.io/en/latest/configure-archive-logs/), on top of stating:

⚠️ We do not recommend you rely on Argo Workflows to archive logs. Instead, use a conventional Kubernetes logging facility.

also linked to the relevant section of workflow-controller-configmap.yaml to suggest a better alternative.

Joibel commented 9 months ago

@ljyanesm there is already a PR in progress (#12597) for this documentation. Please feel free to review it and add your perspective.

panicboat commented 8 months ago

Hello. Are there any barriers to implementing this feature? If there are no barriers to implementation, I would be happy to take on the challenge.

I believe that we can simply modify this section slightly to make it feasible. Wouldn't this be a good first issue? https://github.com/argoproj/argo-workflows/blob/d5a4f7ef52a3022f9b16fb8093705ced0dd897d8/workflow/executor/executor.go#L597-L622

tczhao commented 8 months ago

"I believe that we can simply modify this section slightly to make it feasible."

Yes,

But you would also need to consider

panicboat commented 8 months ago

Thanks for confirming.

"if we should modify workflow-controller configmap so that the user can choose what to log"

I'm thinking these logs could be archived automatically, without any extra setting, whenever archiveLogs: true is set. If you don't have strong feelings about this, could you please assign the issue to me so I can proceed? Having it assigned helps me remember to work on it.

panicboat commented 6 months ago

📝 Simply adding the container name doesn't seem to work. https://github.com/argoproj/argo-workflows/blob/5cd84157078a1f3f013038f62f6243319b160035/workflow/executor/executor.go#L607

Changes

containerNames := append(we.Template.GetMainContainerNames(), common.InitContainerName, common.WaitContainerName)

Logs

vscode ➜ ~/go/src/github.com/argoproj/argo-workflows (main) $ kubectl logs arguments-parameters-from-configmap-14 -c wait
time="2024-05-22T06:48:56.008Z" level=info msg="Starting Workflow Executor" version=latest+d4b9327.dirty
time="2024-05-22T06:48:56.108Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-05-22T06:48:56.110Z" level=info msg="Executor initialized" deadline="2024-05-22 06:53:49 +0000 UTC" includeScriptOutput=false namespace=argo podName=arguments-parameters-from-configmap-14 templateName=whalesay version="&Version{Version:latest+d4b9327.dirty,BuildDate:2024-05-22T06:46:22Z,GitCommit:d4b9327b93511164d4f4401df6e867cd08faa2f7,GitTag:untagged,GitTreeState:dirty,GoVersion:go1.21.10,Compiler:gc,Platform:linux/amd64,}"
time="2024-05-22T06:48:56.211Z" level=debug msg="Create workflowtaskresults 403"
time="2024-05-22T06:48:56.274Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-05-22T06:48:56.282Z" level=debug msg="Patch pods 200"
time="2024-05-22T06:48:56.298Z" level=info msg="+++++++++++++++++Starting deadline monitor+++++++++++++++++"
time="2024-05-22T06:48:57.303Z" level=info msg="Main container completed" error="<nil>"
time="2024-05-22T06:48:57.305Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-05-22T06:48:57.305Z" level=info msg="No output parameters"
time="2024-05-22T06:48:57.306Z" level=info msg="No output artifacts"
time="2024-05-22T06:48:57.316Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: arguments-parameters-from-configmap-14/arguments-parameters-from-configmap-14/main.log"
time="2024-05-22T06:48:57.496Z" level=info msg="Creating minio client using static credentials" endpoint=s3.amazonaws.com
time="2024-05-22T06:48:57.497Z" level=info msg="Saving file to s3" bucket=panicboat-sandbox-723535945756 endpoint=s3.amazonaws.com key=arguments-parameters-from-configmap-14/arguments-parameters-from-configmap-14/main.log path=/tmp/argo/outputs/logs/main.log
time="2024-05-22T06:48:57.973Z" level=info msg="Save artifact" artifactName=main-logs duration=656.608042ms error="<nil>" key=arguments-parameters-from-configmap-14/arguments-parameters-from-configmap-14/main.log
time="2024-05-22T06:48:57.973Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2024-05-22T06:48:57.973Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2024-05-22T06:48:57.974Z" level=error msg="executor error: open /var/run/argo/ctr/init/combined: no such file or directory"
time="2024-05-22T06:48:57.974Z" level=error msg="executor error: open /var/run/argo/ctr/wait/combined: no such file or directory"
time="2024-05-22T06:48:57.978Z" level=debug msg="Create workflowtaskresults 403"
time="2024-05-22T06:48:57.979Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-05-22T06:48:57.991Z" level=debug msg="Patch pods 200"
time="2024-05-22T06:48:57.999Z" level=info msg="Alloc=8996 TotalAlloc=16888 Sys=24677 NumGC=6 Goroutines=10"
time="2024-05-22T06:48:58.001Z" level=debug msg="Patch workflowtaskresults 403"
time="2024-05-22T06:48:58.001Z" level=warning msg="failed to patch task result, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/latest/workflow-rbac/" error="workflowtaskresults.argoproj.io \"arguments-parameters-from-configmap-14\" is forbidden: User \"system:serviceaccount:argo:argo\" cannot patch resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2024-05-22T06:48:58.007Z" level=debug msg="Patch pods 200"
time="2024-05-22T06:48:58.011Z" level=fatal msg="open /var/run/argo/ctr/init/combined: no such file or directory"

Joibel commented 6 months ago

"I'm thinking this could be recorded without awareness if archiveLogs: true is set."

I'd prefer it if this was separately controllable. For many scenarios this will just be archiving things people don't want, which will cost them money.
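
One way to picture "separately controllable" is an opt-in flag per extra container. The sketch below is purely illustrative: `LogArchiveOptions`, `IncludeInit`, `IncludeWait`, and `containersToArchive` are hypothetical names and not part of Argo Workflows; the container names `init` and `wait` are taken from the log paths earlier in this thread.

```go
package main

import "fmt"

// LogArchiveOptions is a hypothetical setting, not an existing Argo type.
type LogArchiveOptions struct {
	ArchiveLogs bool // corresponds to the existing archiveLogs switch
	IncludeInit bool // assumed opt-in for the init container
	IncludeWait bool // assumed opt-in for the wait container
}

// containersToArchive returns the container names whose logs should be
// archived under the given options. (Hypothetical helper.)
func containersToArchive(opts LogArchiveOptions, mainNames []string) []string {
	if !opts.ArchiveLogs {
		return nil
	}
	// Copy first so appending never writes into the caller's slice.
	names := append([]string(nil), mainNames...)
	if opts.IncludeInit {
		names = append(names, "init")
	}
	if opts.IncludeWait {
		names = append(names, "wait")
	}
	return names
}

func main() {
	defaults := LogArchiveOptions{ArchiveLogs: true}
	withWait := LogArchiveOptions{ArchiveLogs: true, IncludeWait: true}

	fmt.Println(containersToArchive(defaults, []string{"main"}))
	fmt.Println(containersToArchive(withWait, []string{"main"}))
}
```

With the extra flags left at their zero values, behavior matches what archiveLogs does today, so nobody pays for storing logs they did not ask for.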

panicboat commented 5 months ago

I would like to work on this issue, although the modifications are going to be more extensive than I originally thought. However, I have little knowledge of Golang or the implementation of Argo Workflows, so I was wondering if anyone would be willing to work with me on the task. I know this is a big ask, but please consider it.

I believe this issue requires capturing the logs of the init/wait container as well as the main container, but I don't understand how to implement this.

tooptoop4 commented 1 month ago

dupe of https://github.com/argoproj/argo-workflows/issues/8902

agilgur5 commented 1 month ago

dupe of #8902

Marked #8902 as superseded, since this issue has more implementation details and more upvotes.