Open lambinoo opened 3 weeks ago
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: lambinoo Once this PR has been reviewed and has the lgtm label, please assign mheon for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
@mheon ptal
I don't actually know why we're adding jobs to the channel in a goroutine, but it does make things convenient for this case. I imagine it ought to be all right - we're effectively queuing jobs for completion once the parallel executor comes up.
Though perhaps we should just make the parallel executor come up sooner in runtime init? I don't think it has any dependencies on things like c/storage - is there a reason not to initialize it as the very first thing a runtime does?
> I don't actually know why we're adding jobs to the channel in a goroutine, but it does make things convenient for this case. I imagine it ought to be all right - we're effectively queuing jobs for completion once the parallel executor comes up.
(quoting the comment above)
I think this was caused by us adding removal code to the refresh logic: https://github.com/containers/podman/commit/9983e87440760883b2a8ecd83d93f1b0dd78d44a
My best guess is that we remove some container in a pod that then causes the container/pod stop/removal logic to add jobs to the queue.
> Though perhaps we should just make the parallel executor come up sooner in runtime init? I don't think it has any dependencies on things like c/storage - is there a reason not to initialize it as the very first thing a runtime does?
Well, it depends on what jobs are submitted. I think it makes sense to wait at least for the `runtime.valid = true` step. But one could also say nobody should submit jobs that depend on the runtime unless they already have a valid runtime...
In general it is really not well defined how the queue is supposed to work.
> I don't actually know why we're adding jobs to the channel in a goroutine, but it does make things convenient for this case. I imagine it ought to be all right - we're effectively queuing jobs for completion once the parallel executor comes up.

> Though perhaps we should just make the parallel executor come up sooner in runtime init? I don't think it has any dependencies on things like c/storage - is there a reason not to initialize it as the very first thing a runtime does?
At first, I did just move the setup worker function up, but podman crashed in the jobs, which is why I moved the queue setup into a separate function.
This is an attempt at fixing #22984.
It seems that if some background tasks are queued in libpod's `Runtime` before the worker's channel is set up (e.g. in the refresh phase), they are not executed later on, but the workerGroup's counter is still ticked up. This leads podman to hang when the imageEngine is shut down, since it waits for the workerGroup to be done.

I'm not really sure why the function to queue a job works, as the queue object shouldn't exist at that point, but that might just be down to my very sparse Go knowledge.
I would probably need some guidance to write a test for this one, as my test workflow has been:

```
reboot -f
journalctl list-jobs
```

to monitor the boot process. And I'm honestly a bit confused about how to do the reboot part in the tests.