bake STOPSIGNAL into Dockerfile

Lewiscowles1986 commented 5 years ago

Dockerfile syntax now supports a STOPSIGNAL directive which AFAIK should make the README.md comment on SIGUSR2 redundant https://docs.docker.com/engine/reference/builder/#stopsignal

vito commented 5 years ago

Interesting, had no idea that existed. Looks like it's been there since 2015: https://github.com/moby/moby/blob/4b18f8c4bf63ef27045b79fb110f4b84c1dc1819/CHANGELOG.md#190-2015-11-03

@cirocosta Any thoughts on this? Would it make sense for k8s too?

edit: also would you mind doing the DCO process? It's pretty lightweight: https://github.com/concourse/concourse-docker/pull/44/checks?check_run_id=119750704

Lewiscowles1986 commented 5 years ago

Thank you for :eyes: and feedback.

DCO Done.

@cirocosta Any thoughts on this? Would it make sense for k8s too?

FWIW I've never seen any other application pass this signal via Dockerfile. What it ensures is that docker stop and docker-compose down should behave.

cirocosta commented 5 years ago

Hi @Lewiscowles1986 , thanks for the contribution!

Interesting, had no idea that existed. Looks like it's been there since 2015

Interestingly, it only got to the OCI spec two years later: https://github.com/opencontainers/image-spec/pull/544, in which case, cri-o followed upon quickly after.

@cirocosta Any thoughts on this? Would it make sense for k8s too?

hmmmmm, it makes sense for those expecting a scale-in / pod-(termination|eviction) to retire all the time, but I'd rather have that more explicitly set and controlled with a preStop hook or something (like we do in the Helm Chart).

    ## Signal to send to the worker container when shutting down.
    ## Possible values:
    ##
    ## - SIGUSR1: land the worker, and
    ## - SIGUSR2: retire the worker.
    ##
    ## Note.: using SIGUSR2 with persistence enabled implies the use of an
    ## initContainer that removes any data the existed previously under
    ## `concourse.worker.workDir` as the action of `retire`ing a worker implies
    ## that no state comes back with it when re-registering.
    ##
    ## Ref: https://concourse-ci.org/concourse-worker.html
    ## Ref: https://concourse-ci.org/worker-internals.html
    ##
    shutdownSignal: SIGUSR2

(see shutdownSignal in charts/stable/concourse)

At the execution of the preStop, we wait until the process goes away (or terminationGracePeriod kicks in sending a SIGKILL), thus, whatever default signal gets set here, it won't affect the chart, but not everyone uses the chart.

Personally, I prefer the explicitness of telling whatever runs my containers to have a specific behavior, rather than putting that behind the image definition.

tl;dr: the Chart would work just fine, but for those who consume the "raw image", it might affect them if they're expecting a SIGTERM - the default behavior.

cirocosta commented 5 years ago

I wanted to double check if those things I said are really true hahah, here's a table with the results:

https://github.com/cirocosta/k8s-sample-signal-pass/tree/17ac974405af494264ac9a9ab99c8bcdd3ed59a2#results

The lower right is what I mentioned 🤔 does that make sense?

Thanks!

vito commented 5 years ago

@cirocosta If the worker is shutting down all of its containers and volumes will go away with its container, so shouldn't that pretty much always be a retire? If the worker lands but its containers and volumes are gone when it comes back, the ATC will have invalid entries in its database. Those will eventually go away (thanks to missing_since), but I think it'd be clearer if the worker retired, cleared out the database, and then came back as a new worker.

Although, for this use case retiring isn't really aggressive enough. Retiring can take a while and exceed the stop timeout. I'm not sure what signal Docker sends after the timeout, but if it's SIGKILL the worker won't be able to intercept it and delete itself. In this scenario the worker could end up coming back under the same name before the ATC drains and removes it. So that could use some work. :thinking: Maybe we should just have ephemeral workers delete themselves on shutdown, and let people use the concourse retire-worker command themselves when they want to actually drain.

cirocosta commented 5 years ago

@cirocosta If the worker is shutting down all of its containers and volumes will go away with its container, so shouldn't that pretty much always be a retire?

Yeah, that's totally valid when having ephemeral containers like a Deployment without any persistent volumes would have.

For those using StatefulSets though, the containers came back with:

the same name, and
the same disk.

In that case, the "going away" is more like a landing as it comes back with the "same" state (just volumes, actually). In the case of the Chart, we allow that if the person wants, although the default is the retire + volume wiping on start (with https://github.com/concourse/concourse/issues/3600 we could perhaps go directly with "land" workflow for the StatefulSet case 👀 )

(yeah, I'd like a lot more to just go ephemeral in any case, but at the moment, it's way easier to use a StatefulSet as a way of getting separate disks attached to your pod and not worry about filling the machine's default disk or consuming all of its bandwidth).

I'm not sure what signal Docker sends after the timeout, but if it's SIGKILL the worker won't be able to intercept it and delete itself.

yeahh, that's the case in k8s as well - something like this:

kubectl delete pod
if preStop configured:
- runs preStop
if we're still within terminationGracePeriod
- sends default signal (either SIGTERM - by default - or image configured STOPSIGNAL)
anytime terminationGracePeriod is hit:
- send SIGKILL

In this scenario the worker could end up coming back under the same name before the ATC drains and removes it.

That's true! Do you think that's more reason for https://github.com/concourse/concourse/issues/3600 ?

Maybe we should just have ephemeral workers delete themselves on shutdown, and let people use the concourse retire-worker command themselves when they want to actually drain.

hmmmm that's interesting - maybe that's something that deserves its own issue :thinking:

Do those answers help? thanks!!

vito commented 5 years ago

For those using StatefulSets though, the containers came back with:

the same name, and

the same disk.

Didn't know they came back with the same disk. That seems kind of janky - does Garden just reap the container depot on start, then? I guess keeping the volumes around is nice, it's just weird that only half of the state remains.

That's true! Do you think that's more reason for concourse/concourse#3600 ?

Yep.

hmmmm that's interesting - maybe that's something that deserves its own issue

Yep. We do have this tracked as https://github.com/concourse/concourse/issues/2827 but we never proposed that as a solution. I'll suggest it there.

Lewiscowles1986 commented 5 years ago

I have no idea what everyone is talking about. Can I be tagged back in specifically if something for me is here. Thanks

vito commented 5 years ago

@Lewiscowles1986 LOL. Sorry. I think at this point the change itself is fine, we're just making sure we understand the impact it may have and where it fits on our roadmap.

@cirocosta I'm leaning towards merging this as it feels like a more sensible default for Docker, but it sounds like it'd have impact on those using StatefulSets. What would have to change in the chart for this? Is it just a documentation thing?

cirocosta commented 5 years ago

@vito sounds good! No changes actually 😁

For those consuming the Chart, everything is transparent - if they configure a specific signal (which we already default to SIGUSR2), we'll forward it and do the right thing.

thx!

vito commented 5 years ago

👍 merging - thanks @Lewiscowles1986!

Lewiscowles1986 commented 5 years ago

Should I PR the README.md to remove the note about SIGUSR2?

concourse / concourse-docker

bake STOPSIGNAL into Dockerfile #44