Closed chriswilley closed 2 years ago
How slow is “slow”?
I just timed pulling a 2KB log file and it took 2m 2s to appear.
Have you tested in v3.2?
I've got the same issue and it's only in v3.3.0 and v3.3.1. v3.2.* had no issue.
Let me get this right?
yes, that's right.
Candidates for blame:
Does it always take 2m2s? Are you using a browser?
I'd have to do more timings, but anecdotally they all take about the same time.
And yes to browser. (Google Chrome)
Looks like it. These are the logs from the wait container, where the artifacts (container logs in this case) are uploaded:
time="2022-04-04T15:58:19.454Z" level=info msg="Starting deadline monitor"
time="2022-04-04T16:01:03.480Z" level=info msg="Main container completed"
time="2022-04-04T16:01:03.480Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-04-04T16:01:03.480Z" level=info msg="Saving logs"
time="2022-04-04T16:01:03.481Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: <redacted>/main.log"
time="2022-04-04T16:01:03.481Z" level=info msg="Creating minio client using IAM role"
time="2022-04-04T16:01:03.481Z" level=info msg="Saving file to s3" bucket=<redacted> endpoint=s3.amazonaws.com key=<redacted>/main.log path=/tmp/argo/outputs/logs/main.log
time="2022-04-04T16:03:07.717Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-04-04T16:03:07.717Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-04-04T16:03:07.717Z" level=info msg="No output parameters"
time="2022-04-04T16:03:07.717Z" level=info msg="No output artifacts"
time="2022-04-04T16:03:07.736Z" level=info msg="Create workflowtaskresults 201"
time="2022-04-04T16:03:07.737Z" level=info msg="Killing sidecars []"
time="2022-04-04T16:03:07.737Z" level=info msg="Alloc=5688 TotalAlloc=14812 Sys=23506 NumGC=7 Goroutines=10"
The main.log file is about 300KB here.
If they all take the exact same time, that hints at a problem in one specific place in Argo Server.
Can you attach the argo-server logs?
During the pod run
time="2022-04-04T15:58:18.145Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T16:00:27.204Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T16:00:27.212Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2022-04-04T16:00:27Z" grpc.time_ms=10.506 span.kind=server system=grpc
time="2022-04-04T16:00:27.547Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T16:00:39.579Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WorkflowLogs grpc.service=workflow.WorkflowService grpc.start_time="2022-04-04T16:00:27Z" grpc.time_ms=12034.497 span.kind=server system=grpc
time="2022-04-04T16:02:15.428Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2022-04-04T15:58:18Z" grpc.time_ms=237286.6 span.kind=server system=grpc
time="2022-04-04T16:03:18.092Z" level=info msg="Alloc=12073 TotalAlloc=5976661 Sys=29265 NumGC=7960 Goroutines=93"
time="2022-04-04T16:08:18.092Z" level=info msg="Alloc=11273 TotalAlloc=5978609 Sys=29265 NumGC=7963 Goroutines=93"
time="2022-04-04T16:11:22.522Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2022-04-04T15:57:36Z" grpc.time_ms=825619.3 span.kind=server system=grpc
When getting logs from the UI
time="2022-04-04T16:59:10.573Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T16:59:10.582Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=ListArchivedWorkflowLabelKeys grpc.service=workflowarchive.ArchivedWorkflowService grpc.start_time="2022-04-04T16:59:10Z" grpc.time_ms=14.643 span.kind=server system=grpc
time="2022-04-04T16:59:10.589Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=ListArchivedWorkflows grpc.service=workflowarchive.ArchivedWorkflowService grpc.start_time="2022-04-04T16:59:10Z" grpc.time_ms=18.937 span.kind=server system=grpc
time="2022-04-04T16:59:12.847Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T16:59:12.854Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetArchivedWorkflow grpc.service=workflowarchive.ArchivedWorkflowService grpc.start_time="2022-04-04T16:59:12Z" grpc.time_ms=9.938 span.kind=server system=grpc
time="2022-04-04T17:00:07.981Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T17:00:07.986Z" level=info msg="Download artifact" artifactName=main-logs isInput=false nodeId=<redacted> uid=d8fea3cc-b687-4373-a20a-0a9259157e31
time="2022-04-04T17:00:07.986Z" level=info msg="S3 Load path: /tmp/artifact3631983470, key: <redacted>/main.log"
time="2022-04-04T17:00:07.986Z" level=info msg="Creating minio client using IAM role"
time="2022-04-04T17:00:07.987Z" level=info msg="Getting file from s3" bucket=<redacted> endpoint=s3.amazonaws.com key=<redacted>/main.log path=/tmp/artifact3631983470
time="2022-04-04T17:01:07.986Z" level=info msg="using the default service account for user" subject=CgQzMDYzEgllZy1naXRodWI
time="2022-04-04T17:01:07.990Z" level=info msg="Download artifact" artifactName=main-logs isInput=false nodeId=<redacted> uid=d8fea3cc-b687-4373-a20a-0a9259157e31
time="2022-04-04T17:01:07.990Z" level=info msg="S3 Load path: /tmp/artifact2233602876, key: <redacted>/main.log"
time="2022-04-04T17:01:07.990Z" level=info msg="Creating minio client using IAM role"
time="2022-04-04T17:01:07.990Z" level=info msg="Getting file from s3" bucket=<redacted> endpoint=s3.amazonaws.com key=<redacted>/main.log path=/tmp/artifact2233602876
2022/04/04 17:02:11 http: superfluous response.WriteHeader call from github.com/argoproj/argo-workflows/v3/server/artifacts.(*ArtifactServer).returnArtifact (artifact_server.go:225)
2022/04/04 17:03:10 http: superfluous response.WriteHeader call from github.com/argoproj/argo-workflows/v3/server/artifacts.(*ArtifactServer).returnArtifact (artifact_server.go:225)
Maybe because of the timeouts set on my ingress, I never see the logs.
Do you have the same problem when connecting directly, rather than via ingress or LB?
All my testing has been done connecting through kubectl proxy, direct to the cluster.
I think we need to do live debugging.
Could you please set up a 30-minute call?
Zoom is not accepting my password
I believe I'm in the waiting room now
Hmm, I don't see your booking, maybe you got booked with someone else. Could you do 10:30am PT tomorrow?
Yes
Cool. Are you able to book that slot tomorrow?
It says there are no slots available
11am or 11:30am?
do you see a booking for me at 1pm EST Wednesday?
Are you still available now?
git log --oneline v3.2.9..v3.3.0 -- go.mod
f0fb0d56d chore(deps): bump github.com/minio/minio-go/v7 from 7.0.15 to 7.0.23 (#8003)
aea6c3912 chore(deps): bump github.com/gorilla/websocket from 1.4.2 to 1.5.0 (#7991)
89d7cc39d chore(deps): bump github.com/tidwall/gjson from 1.13.0 to 1.14.0 (#7992)
6076524cd chore: Upgrade to Kubernetes v0.23. Fixes #6750 (#7832)
980f2feb7 chore(deps): bump github.com/gavv/httpexpect/v2 from 2.2.0 to 2.3.1 (#7979)
5e45cd95a chore(deps): bump github.com/antonmedv/expr from 1.8.9 to 1.9.0 (#7967)
79fc4a9be (tag: v3.3.0-rc6, tag: v3.3.0-rc5) chore(deps): bump upper.io/db.v3 (#7939)
848effce0 chore(deps): bump github.com/aliyun/aliyun-oss-go-sdk (#7919)
044389b55 chore(deps): bump github.com/aliyun/aliyun-oss-go-sdk (#7901)
dc82f3f42 chore(deps): bump github.com/prometheus/client_golang (#7880)
69c5bc79f fix: Fix go-jose dep. Fixes #7814 (#7874)
1d7a17714 chore(deps): bump github.com/soheilhy/cmux from 0.1.4 to 0.1.5 (#7848)
27977070c chore(deps): bump github.com/go-openapi/spec from 0.20.2 to 0.20.4 (#7817)
74d1bbef7 chore(deps): bump cloud.google.com/go/storage from 1.19.0 to 1.20.0 (#7747)
1277f0579 chore(deps): bump gopkg.in/square/go-jose.v2 from 2.5.1 to 2.6.0 (#7740)
7e6f2c0d7 chore(deps): bump github.com/valyala/fasttemplate from 1.1.0 to 1.2.1 (#7727)
877d65697 chore(deps): bump cloud.google.com/go/storage from 1.10.0 to 1.19.0 (#7714)
0f4c48473 chore(deps): bump gopkg.in/go-playground/webhooks.v5 (#7704)
4664aeac4 chore(deps): bump google.golang.org/grpc from v1.38.0 to v1.41.1 (#7658)
ecf2ceced chore(deps): bump github.com/grpc-ecosystem/go-grpc-middleware (#7679)
559153417 chore(deps): bump github.com/prometheus/common from 0.26.0 to 0.32.1 (#7660)
d7bc12343 chore: Upgrade tidwall/gjson to improve security (#7237)
a6ce659f8 feat!: Remove the hidden flag verify from argo submit (#7158)
f9e554d26 fix: Relative submodules in git artifacts. Fixes #7141 (#7162)
8d7ca73b0 feat: Upgrade to Golang 1.17 (#7029)
d815031dc chore: Upgrade argo event 1.15 (#6932)
606011b1c build: simplify using Golang 1.16 (#6893)
c962bb189 fix: upgrade sprig to v3.2.2 (#6876)
@chriswilley I've created an engineering build that adds additional informational logging, which should help diagnose where the problem is. It will take about 1h for the image to be published; the tag will be :dev-log. Can you also test with :latest, to make sure that this issue has not been accidentally fixed? You only need to change the argo-server image tag; leave the other configuration unchanged.
Will do. Thanks.
I think I found the cause: minio-go 7.0.23 introduced a regression that is fixed in 7.0.24: https://github.com/minio/minio-go/pull/1626
This will be fixed in v3.3.2.
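Until v3.3.2 ships, anyone building from source could pin the fixed client in go.mod (a sketch; the module path and version come from the linked PR):

```
require github.com/minio/minio-go/v7 v7.0.24
```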
Summary
What happened / what you expected to happen?
After a pod completes, clicking any input or output artifact (such as the main container log) in the Web UI takes about a full minute to pull from S3. Previously, on v3.1.12, it was very quick. Nothing about the network or EKS configuration has changed.
What version are you running? v3.3.0
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.