manno closed this 3 years ago
Hi @manno!
I see you have merged this PR; are you going to make a quarks release with this change, or are you waiting for it to be tested from a dev build before you commit to a new release?
I'm hoping to see some confirmation from @univ0298 that it works as expected.
I was waiting to see a release, but if that's not coming, please let me know, @manno. Thanks!
Either way is fine for me :) I'll create a release then.
Ok, interesting. Seems like the helm chart artifact from CI (from https://github.com/cloudfoundry-incubator/quarks-operator/actions/runs/732318599) is not public.
The only things visible are the Docker images: https://github.com/users/cfcontainerizationbot/packages/container/package/quarks-operator-dev
The release might take until tomorrow, so I'll attach the dev helm chart here: helm chart.zip
@manno I don't think this is working. What I'm seeing is that if I set the terminationGracePeriod high enough to allow the drains to complete, the drains do complete but the pod keeps running, and it is only terminated once the grace period is exhausted. So it seems we have ended up waiting forever (well, until the grace period limit) instead of detecting that all drains are complete. Is there any way I can debug this?
Here is what it looks like in the pod after all the drains have ended:
/:/var/vcap/jobs/garden# ls -latR /mnt/
/mnt/:
total 8
drwxr-xr-x 1 root root 4096 Apr 19 19:29 .
drwxr-xr-x 1 root root 4096 Apr 19 19:29 ..
drwxrwsrwt 2 root adm 40 Apr 19 19:25 drain-stamps
/mnt/drain-stamps:
total 4
drwxr-xr-x 1 root root 4096 Apr 19 19:29 ..
drwxrwsrwt 2 root adm 40 Apr 19 19:25 .
Discussed this with @manno yesterday. The most obvious issue is a mix-up in the current code: it writes to /mnt/drain-done, but the mounted directory is /tmp/drain-stamps.
However, there are other issues as well; I'm working through them with @manno.
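For anyone who wants to check the path mix-up from inside a container, a small helper along these lines might help. It is only a sketch: `report_dir` is a made-up name, not something from the operator, and it simply reports whether a directory exists and how many entries it holds.

```shell
#!/bin/sh
# Hypothetical debugging helper (not part of quarks-operator).
# Prints whether a candidate stamp directory exists and how many
# entries it contains, excluding "." and "..".
report_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    echo "$dir: exists, entries=$count"
  else
    echo "$dir: missing"
  fi
}
```

Calling it for each path mentioned in this thread (`report_dir /mnt/drain-done`, `report_dir /mnt/drain-stamps`, `report_dir /tmp/drain-stamps`) after the drains have ended should show where the stamps are actually being written, if anywhere.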
Motivation and Context
This adds a loop that waits for all other bpm containers after the drain script has finished.
#177254980
This draft adds the shared empty dir also to the init containers, even though they don't need it.
Fixes https://github.com/cloudfoundry-incubator/quarks-operator/issues/1297
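To illustrate the idea of the wait loop, here is a minimal sketch, not the actual code from this PR. It assumes each container drops one stamp file into a shared directory once its drain scripts finish, and then waits until the expected number of stamps is present. `STAMP_DIR`, `mark_drained`, `wait_for_drains`, and the timeout argument are all hypothetical names chosen for the example.

```shell
#!/bin/sh
# Sketch only: the names below are hypothetical, not the operator's own.
# The stamp directory must be the same shared mount in every container.
STAMP_DIR="${STAMP_DIR:-/mnt/drain-stamps}"

# Record that the named container's drain scripts have finished.
mark_drained() {
  name="$1"
  touch "$STAMP_DIR/$name"
}

# Wait until at least $1 stamps exist in STAMP_DIR.
# Gives up after $2 seconds (default 600).
# Returns 0 once all stamps are present, 1 on timeout.
wait_for_drains() {
  expected="$1"
  timeout="${2:-600}"
  waited=0
  while :; do
    count=$(ls -A "$STAMP_DIR" 2>/dev/null | wc -l | tr -d ' ')
    [ "$count" -ge "$expected" ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```

In a preStop hook, a container would run its drain scripts, call `mark_drained` with its own name, and then call `wait_for_drains` with the number of bpm containers in the pod. If the writer and the waiter disagree on the directory, the count never reaches the expected value and the pod only stops when the grace period runs out, which matches the behaviour reported above.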