P.S. I observed several other data races when I enabled the race detector; I strongly suggest using race-detector-enabled binaries in CI. There may be several more latent issues hiding.
👀 I didn't know you could build -race, thought it was only test -race
👀 I didn't know you could build -race, thought it was only test -race
Yep, build -cover too, so you can get coverage data from e2e as well as unit tests.
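For example (a sketch; the output paths are illustrative, but the flags and the GOCOVERDIR variable are standard Go tooling):

```sh
# Build the controller binary with the race detector compiled in:
go build -race -o dist/workflow-controller ./cmd/workflow-controller

# Build with coverage instrumentation instead (Go 1.20+); the binary
# writes coverage data to GOCOVERDIR while it runs:
go build -cover -o dist/workflow-controller ./cmd/workflow-controller
GOCOVERDIR=/tmp/e2e-cov ./dist/workflow-controller

# Unit tests can enable the detector directly:
go test -race ./...
```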
Upon further testing, it is specifically the metrics RoundTripper that causes the panic, but the logs one alone produces these other data races:
==================
WARNING: DATA RACE
Read at 0x00c0006a8ee8 by goroutine 177:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewUnfinishedWorkSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:268 +0x193
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:240 +0x19d
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/cron.NewCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/cron/controller.go:84 +0x4bb
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:270 +0x3dd
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap16()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x5d
Previous write at 0x00c0006a8ee8 by goroutine 176:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewUnfinishedWorkSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:268 +0x271
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:240 +0x19d
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/gccontroller.NewController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/gccontroller/gc_controller.go:52 +0x67a
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runGCcontroller()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:260 +0x1f4
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap15()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x5d
Goroutine 177 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x281d
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
Goroutine 176 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x272e
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
==================
==================
WARNING: DATA RACE
Read at 0x00c000418510 by goroutine 177:
runtime.growslice()
/usr/local/go/src/runtime/slice.go:177 +0x0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewUnfinishedWorkSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:268 +0x1c9
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:240 +0x19d
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/cron.NewCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/cron/controller.go:84 +0x4bb
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:270 +0x3dd
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap16()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x5d
Previous write at 0x00c000418510 by goroutine 176:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewUnfinishedWorkSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:268 +0x20c
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:240 +0x19d
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/gccontroller.NewController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/gccontroller/gc_controller.go:52 +0x67a
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runGCcontroller()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:260 +0x1f4
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap15()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x5d
Goroutine 177 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x281d
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
Goroutine 176 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x272e
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
==================
==================
WARNING: DATA RACE
Read at 0x00c0006a8f68 by goroutine 177:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewLongestRunningProcessorSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:280 +0x193
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:241 +0x1c7
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/cron.NewCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/cron/controller.go:84 +0x4bb
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:270 +0x3dd
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap16()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x5d
Previous write at 0x00c0006a8f68 by goroutine 176:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewLongestRunningProcessorSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:280 +0x271
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:241 +0x1c7
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/gccontroller.NewController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/gccontroller/gc_controller.go:52 +0x67a
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runGCcontroller()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:260 +0x1f4
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap15()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x5d
Goroutine 177 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x281d
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
Goroutine 176 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x272e
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
==================
==================
WARNING: DATA RACE
Read at 0x00c0004185d0 by goroutine 177:
runtime.growslice()
/usr/local/go/src/runtime/slice.go:177 +0x0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewLongestRunningProcessorSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:280 +0x1c9
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:241 +0x1c7
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/cron.NewCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/cron/controller.go:84 +0x4bb
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runCronController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:270 +0x3dd
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap16()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x5d
Previous write at 0x00c0004185d0 by goroutine 176:
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).NewLongestRunningProcessorSecondsMetric()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:280 +0x20c
k8s.io/client-go/util/workqueue.(*queueMetricsFactory).newQueueMetrics()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/metrics.go:241 +0x1c7
k8s.io/client-go/util/workqueue.newQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:88 +0x153
k8s.io/client-go/util/workqueue.NewWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/queue.go:59 +0x157
k8s.io/client-go/util/workqueue.NewDelayingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/delaying_queue.go:67 +0x9f
k8s.io/client-go/util/workqueue.NewRateLimitingQueueWithConfig()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:71 +0x15c
k8s.io/client-go/util/workqueue.NewNamedRateLimitingQueue()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/util/workqueue/rate_limiting_queue.go:87 +0xf0
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*Metrics).RateLimiterWithBusyWorkers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/work_queue.go:133 +0x16a
github.com/argoproj/argo-workflows/v3/workflow/gccontroller.NewController()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/gccontroller/gc_controller.go:52 +0x67a
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runGCcontroller()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:260 +0x1f4
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap15()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x5d
Goroutine 177 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:365 +0x281d
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
Goroutine 176 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:364 +0x272e
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
==================
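Reading those traces, the shape of the problem is clear: the GC and cron controllers are constructed on separate goroutines at startup, and both call back into the shared Metrics object, which appends to a slice with no lock (hence the growslice frames). Below is a minimal, hypothetical reconstruction of the pattern and the obvious fix; the real types live in workflow/metrics and the names here are simplified:

```go
package main

import "sync"

// Simplified stand-in for the shared Metrics object; the real struct
// in workflow/metrics holds far more state.
type metrics struct {
	mu     sync.Mutex // the fix: guard the slice
	gauges []func() float64
}

// Called from workqueue constructors on several controller goroutines;
// without mu, this append is the read/growslice-vs-write conflict the
// detector flagged above.
func (m *metrics) newGauge(f func() float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.gauges = append(m.gauges, f)
}

func main() {
	m := &metrics{}
	var wg sync.WaitGroup
	// Mimics runGCcontroller and runCronController racing to build
	// their queues at startup.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.newGauge(func() float64 { return 0 })
		}()
	}
	wg.Wait()
}
```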
Similarly, in addition to the data race in the original post, running with just the metrics RoundTripper produces additional data races:
==================
WARNING: DATA RACE
Read at 0x00c000f1e600 by goroutine 197:
crypto/tls.(*Config).Clone()
/usr/local/go/src/crypto/tls/common.go:897 +0x95c
crypto/tls.dial()
/usr/local/go/src/crypto/tls/tls.go:153 +0x35b
crypto/tls.(*Dialer).DialContext()
/usr/local/go/src/crypto/tls/tls.go:219 +0x10a
k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).dialWithoutProxy()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go:321 +0x304
k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).dial()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go:177 +0x2b4
k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).Dial()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go:154 +0x133
k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).RoundTrip()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/httpstream/spdy/roundtripper.go:343 +0x5b2
github.com/argoproj/argo-workflows/v3/workflow/metrics.metricsRoundTripper.RoundTrip()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/metrics/metrics_k8s_request.go:53 +0x91
github.com/argoproj/argo-workflows/v3/workflow/metrics.(*metricsRoundTripper).RoundTrip()
<autogenerated>:1 +0x97
k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/transport/round_trippers.go:168 +0x4f2
net/http.send()
/usr/local/go/src/net/http/client.go:259 +0x9a6
net/http.(*Client).send()
/usr/local/go/src/net/http/client.go:180 +0x14e
net/http.(*Client).do()
/usr/local/go/src/net/http/client.go:725 +0x11b5
net/http.(*Client).Do()
/usr/local/go/src/net/http/client.go:590 +0x6a6
k8s.io/client-go/rest.(*Request).request()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/rest/request.go:1023 +0x3ac
k8s.io/client-go/rest.(*Request).Do()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/rest/request.go:1063 +0x119
github.com/argoproj/argo-workflows/v3/pkg/client/clientset/versioned/typed/workflow/v1alpha1.(*workflows).Update()
/home/andrew/git/src/github.com/meln5674/argo-workflows/pkg/client/clientset/versioned/typed/workflow/v1alpha1/workflow.go:117 +0x2a7
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).persistUpdates()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/operator.go:761 +0xaaf
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate.func1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/operator.go:191 +0x46
runtime.deferreturn()
/usr/local/go/src/runtime/panic.go:605 +0x5d
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:891 +0xefe
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:806 +0xec
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker-fm()
<autogenerated>:1 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x41
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xc4
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x10a
k8s.io/apimachinery/pkg/util/wait.Until()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x4e
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap18()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:370 +0x17
Previous write at 0x00c000f1e600 by goroutine 172:
k8s.io/client-go/transport.TLSConfigFor()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/transport/transport.go:88 +0x24b
k8s.io/client-go/rest.TLSConfigFor()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/rest/transport.go:57 +0x51
k8s.io/client-go/transport/spdy.RoundTripperFor()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/transport/spdy/spdy.go:38 +0x3b
k8s.io/client-go/tools/remotecommand.NewSPDYExecutor()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/client-go/tools/remotecommand/spdy.go:46 +0x39
github.com/argoproj/argo-workflows/v3/workflow/common.ExecPodContainer()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/common/util.go:93 +0xf84
github.com/argoproj/argo-workflows/v3/workflow/signal.ExecPodContainerAndGetOutput()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/signal/signal.go:49 +0xe6
github.com/argoproj/argo-workflows/v3/workflow/signal.SignalContainer()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/signal/signal.go:45 +0x764
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).signalContainers()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:675 +0x1d5
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextPodCleanupItem.func2()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:564 +0x6dd
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextPodCleanupItem()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:593 +0x3cf
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runPodCleanup()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:539 +0x47
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runPodCleanup-fm()
<autogenerated>:1 +0x21
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0x46
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x41
k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xc4
k8s.io/apimachinery/pkg/util/wait.JitterUntil()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x10a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 +0xd2
k8s.io/apimachinery/pkg/util/wait.UntilWithContext()
/home/andrew/git/src/github.com/meln5674/argo-workflows/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:170 +0x5c
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run.gowrap12()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:359 +0x17
Goroutine 197 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:370 +0x29d1
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
Goroutine 172 (running) created at:
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run()
/home/andrew/git/src/github.com/meln5674/argo-workflows/workflow/controller/controller.go:359 +0x2316
main.NewRootCommand.func1.1.gowrap1()
/home/andrew/git/src/github.com/meln5674/argo-workflows/cmd/workflow-controller/main.go:151 +0xaa
==================
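This last trace has a different shape: the pod-cleanup goroutine builds an SPDY executor, which derives and mutates TLS settings from the shared client config, while a worker goroutine's in-flight request is cloning the same tls.Config. Here is a self-contained sketch of that shape, assuming nothing about the k8s internals; the usual cure is to hand each consumer its own copy (client-go's rest.CopyConfig exists for exactly this) instead of mutating shared state:

```go
package main

import (
	"crypto/tls"
	"sync"
)

func main() {
	// Analogous to the TLS state hanging off the shared rest.Config.
	shared := &tls.Config{}

	var wg sync.WaitGroup
	wg.Add(2)

	// Goroutine A: an in-flight dial clones the config, reading all
	// of its fields (crypto/tls does this internally on every dial).
	go func() {
		defer wg.Done()
		_ = shared.Clone()
	}()

	// Goroutine B: building a new transport mutates the same config
	// in place -- the racy write in the trace above.
	go func() {
		defer wg.Done()
		shared.NextProtos = []string{"http/1.1"}
	}()

	wg.Wait()

	// The fix: snapshot once, then give each consumer its own copy.
	mine := shared.Clone()
	mine.NextProtos = []string{"h2"}
	_ = mine
}
```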
Smoking gun has been identified; will make a PR as soon as tests finish running.
cc @isubasinghe as we've talked about races many times before (e.g. https://github.com/argoproj/argo-workflows/issues/10807#issuecomment-2090959022) and @Joibel regarding metrics
Yeah, I am not really surprised by the data races; Go doesn't give you enough to catch these at compile time.
We should enable -race in CI though; I did actually assume this was already done.
I did actually assume this was already done.
I think we have little to no pre-existing Go SMEs, with a lot of us learning as we go. I know you and Alan have been learning some of the tooling and idioms and might have the most Go knowledge these days
The race detector just gives warnings though, right? Or can we make it error out in CI? Because otherwise we'll still miss the warnings in the logs unless someone parses through them regularly
@agilgur5 it does exit with a non-zero exit code and could be detected in CI, but I presume it's not in CI because of how long it takes? It's basically a fuzzer, not a static analysis tool, and needs to brute-force the search space effectively.
I would personally still have it in CI maybe running on the releases only or on a weekly cadence on main at least.
One thing I noticed in running the e2e tests locally is that, by default, the "kit" setup doesn't show logs for the operator, nor does it fail if either the server or controller crash, both of which seem counter to the point of an e2e test. There really ought to be some part of the CI setup that dumps the logs to stdout/err at the end, disables restarts, and causes the test to fail on a crash, because otherwise there's nothing to actually act on those failures, unless there's a test that specifically interrogates behavior after a restart.
I think we have little to no pre-existing Go SMEs, with a lot of us learning as we go. I know you and Alan have been learning some of the tooling and idioms and might have the most Go knowledge these days
While it may be a little pompous to describe myself as a Go SME, I have been writing in it professionally for about 6 years at this point, and we leverage Argo in quite a few places at my current employer, so if that experience is of use to you, it would be to my direct benefit to offer it beyond just this one bug, especially considering this particular issue ended up blocking my other PR whose functionality we are eagerly awaiting.
but I presume it's not in CI because of how long it takes? It's basically a fuzzer, not a static analysis tool, and needs to brute-force the search space effectively.
The performance hit to the app itself is relatively small on the scale of an automated test; the main issue is that it substantially increases compile times and also requires a completely separate compile step (including recompiling all dependencies with the detector) compared to the "release" binaries. But I agree that a scheduled fuzz run, plus a gate before any numbered release, is the best approach for the immediate future.
but I presume it's not in CI because of how long it takes?
That was one reason why I never added it myself, though the original reason is probably that folks didn't know about it. In https://github.com/argoproj/argo-workflows/issues/10807#issuecomment-2090959022, I mentioned running it nightly or weekly. The race detector docs say "memory usage may increase by 5-10x and execution time by 2-20x". The lower end of that would be feasible per PR.
I would personally still have it in CI maybe running on the releases only or on a weekly cadence on main at least.
I'd suggest nightly to make it easier to narrow down regressions. Not that we'd necessarily check it nightly, but we could go back through each nightly run's history at least. I'll write up an issue for this.
It's basically a fuzzer, not a static analysis tool, and needs to brute-force the search space effectively.
That's what I thought before too, but apparently it's not, per the docs (I can't seem to find the docs I read earlier that made me think that). It's actually an instrumentation tool, similar to code coverage: it instruments memory accesses and reports conflicting ones. As such, though, it only detects races that actually happen at run-time, not all theoretical races, so only a subset of the conditions we would be testing for, and we definitely miss some races in tests.
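To make the run-time-only caveat concrete, here's a tiny hypothetical program: go run -race reports the race only when the guarded branch actually executes (e.g. with RACY=1 set), and stays silent otherwise even though the race is latent in the code:

```go
package main

import (
	"os"
	"sync"
)

func main() {
	n := 0
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// The conflicting write sits behind a branch; if RACY is
		// unset this path never runs, so -race observes nothing
		// and the latent race goes unreported.
		if os.Getenv("RACY") != "" {
			n = 1
		}
	}()
	n = 2 // unsynchronized write from the main goroutine
	wg.Wait()
	_ = n
}
```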
For reference for other folks in this thread, Isitha and I have previously discussed rewriting the core of Argo in a language with a more intelligent compiler, such as Rust or Pony, which would detect at compile time a lot of the issues we've had. "The core" is basically the DAG processing / state machine, i.e. #13502. Notably, that actually wouldn't cover the case of this issue, as this is in the networking / metrics stack, which makes heavy use of k8s libraries that have first-class Go support.
There really ought to be some part of the CI setup that dumps the logs to stdout/err at the end
Dumping the logs does actually already happen on CI
nor does it fail if either the server or controller crash
A test case itself should fail if there's a crash, so this should still result in a test failure.
While it may be a little pompous to describe myself as a Go SME [...] so if that experience is of use to you, it would be to my direct benefit to offer it beyond just this one bug
You and Mason, another recent contributor, both seem to have substantially more Go experience than me at least (Argo is effectively the largest Go project I've ever worked on, and probably the one I've written the most Go code for as well, since I tend to be architect and/or reviewer and/or debugger far more often than writer in my day job these days), so I think that's a fair statement to make to me at least, as it is self-evident 😅 And yeah, all contributions are welcome and we are definitely in need of more contributors too! See also the Sustainability Effort. We'd be happy for more help, especially from someone who knows the Go ecosystem well! 🙂
the main issue is that it substantially increases compile times and also requires a completely separate compile step (including recompiling all dependencies with the detector)
Now that I've re-read the race detector docs a few times over, it sounds like go test -race would be enough for 99% of our tests, as nearly all of them run via go test. So that could simplify the CI part a good bit if we don't need a separate build step.
But if we're writing a scheduled variant instead of a per-PR job, it would require a whole new GHA workflow anyway, so a different compile step wouldn't actually be that much more effort either.
I'll write up an issue for this.
Wrote up https://github.com/argoproj/argo-workflows/issues/13661
Pre-requisites
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
These two roundtripper wrappers, instantiated here, introduce a data race (a sketch of the wrapper pattern follows below).
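For readers unfamiliar with the pattern, a roundtripper wrapper decorates the HTTP transport that every Kubernetes API request flows through, e.g. to time requests for metrics. This is a simplified, hypothetical sketch of the shape; the real metricsRoundTripper lives in workflow/metrics/metrics_k8s_request.go and records a metric rather than printing:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// metricsRoundTripper decorates another RoundTripper so that every
// Kubernetes API request passing through it can be timed. The wrapper
// itself is innocuous; the race arises from what the wrapped
// transports share.
type metricsRoundTripper struct {
	next http.RoundTripper
}

func (m metricsRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := m.next.RoundTrip(req)
	fmt.Printf("%s %s took %s\n", req.Method, req.URL.Path, time.Since(start))
	return resp, err
}

func main() {
	// Wrap the default transport the same way the controller wraps
	// its Kubernetes client's transport.
	client := &http.Client{Transport: metricsRoundTripper{next: http.DefaultTransport}}
	_, _ = client.Get("https://example.com")
}
```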
To demonstrate the race, apply this modification to the Makefile.
Then execute make test-functional E2E_ENV_FACTOR=2 (make sure to show all logs from make start). The stack trace of the data race follows below.
This PR adds E2E tests that currently fail, but pass with those roundtripper wrappers commented out. When the race detector is not enabled, the workflow controller intermittently panics as follows during those tests when cleaning up multiple daemon pods in rapid succession:
Both the data race and panic still occur when reverting the changes in that PR other than the additional tests.
P.S. I observed several other data races when I enabled the race detector; I strongly suggest using race-detector-enabled binaries in CI. There may be several more latent issues hiding.
Version(s)
ce7f9bfb9b45f009b3e85fabe5e6410de23c7c5f
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container