go-vela / community

Community Information for Vela (Target's official Pipeline Automation Framework)
https://go-vela.github.io/docs/
Apache License 2.0
worker: unable to accept workloads due to log streaming timeout #890

Open wass3rw3rk opened 11 months ago

wass3rw3rk commented 11 months ago

Description

the worker has a log streaming timeout setting (defaults to 5 minutes): https://github.com/go-vela/worker/blob/b45d0ce710ef208ca1330fc6904c15a38e6d08c7/executor/flags.go#L33-L38. this setting is intended to give containers some padded time after the build has finished to wrap up streaming logs.

the intention is for this timeout to apply only while there is actual log activity, i.e. log streaming should exit as soon as there isn't any. however, in the current implementation, the worker cannot pick up new workloads until the full log streaming timeout has elapsed, even when all containers are done producing logs.

this issue is limited to pipelines utilizing services. for a reproducible example, you can use https://go-vela.github.io/docs/usage/examples/postgres/ . the pipeline will execute the 15s sleep, perform an action, and be done. the build will be marked completed appropriately, but the service will continue running until the log streaming timeout expires, and only then is the worker able to accept a new workload.

this can result in unnecessary queue buildup (or unnecessary worker pool scaling), since workers would be able to alleviate the pressure if they exited log streaming as soon as it was finished.

Workaround

lowering the timeout

Value

more efficient workers

Useful Information

  1. What is the output of vela --version?

0.22.0

  2. What operating system is being used?
  3. Any other important details?

example pipeline to test: https://go-vela.github.io/docs/usage/examples/postgres/

wass3rw3rk commented 11 months ago

possibly related: https://github.com/go-vela/community/issues/744