aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[Fargate] [bug/regression]: InitProcessEnabled not working anymore with 1.4.0? #906

Open tpunder opened 4 years ago

tpunder commented 4 years ago

Tell us about your request
When I try to enable the InitProcessEnabled LinuxParameter:

"LinuxParameters": {
  "Capabilities": {},
  "InitProcessEnabled": true
}

I get this error when trying to launch my task:

"reason": "CannotStartContainerError: ResourceInitializationError: failed to create new container runtime task: OCI runtime create failed: container_linux.go:349: starting container process caused \"process_linux.go:449: container init caused \"rootfs_linux.go:58:...",

Which service(s) is this request for?
Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I'm trying to use the docker run --init feature to reap zombie processes that my container might leave behind (e.g. Chrome processes used for creating PDFs).

Are you currently working around this issue?
I currently have InitProcessEnabled set to false. This is fine for testing but will not work for production. If needed I will look into directly using tini (https://github.com/krallin/tini) in my container images.
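
For context, the zombie reaping that --init (or tini) provides boils down to the init process calling wait on any child that has exited. A minimal sketch in Python, assuming a POSIX system; reap_zombies is a hypothetical helper written for illustration and is not taken from tini or Docker:

```python
import os


def reap_zombies():
    """Collect every child process that has already exited, without blocking.

    Returns the list of PIDs that were reaped. Hypothetical helper for
    illustration only; tini itself is written in C and does more than this.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately instead of blocking.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped
```

A real init would typically run a loop like this whenever it receives SIGCHLD, and would also forward signals to the main process. Those parts are omitted here.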

tpunder commented 4 years ago

I did a little bit of digging to see if this was something that I could fix myself. Here is what I found:

It looks like the --init flag of docker run gets translated by the Docker CLI into an Init field on the HostConfig:

"HostConfig": {
  ...other_host_config_fields...,
  "Init": true
}

After some back and forth between the Docker CLI and the Docker daemon, the "Init": true gets used here:

https://github.com/docker/docker-ce/blob/fdfb4bfa0dd1de81af6e0647cefdf70d49bcb331/components/engine/daemon/oci_linux.go#L742-L763

It looks like it modifies the container spec to add a bind mount for /usr/bin/docker-init (which is just tini) from the host system to /sbin/docker-init inside the container, and prepends /sbin/docker-init -- to the command that runs. This then gets passed on to containerd (and presumably runc).
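
The transformation described above can be sketched roughly as follows. This is a Python illustration, not the Docker source (which is Go): add_init_process is a hypothetical function name, the spec is a plain dict using the OCI runtime spec field names (process.args, mounts), and the host path is the one mentioned in this comment rather than something looked up at runtime.

```python
import copy

# Paths as described in the comment above; the host path is an assumption
# for this sketch (the daemon may locate the binary differently).
INIT_HOST_PATH = "/usr/bin/docker-init"
INIT_CONTAINER_PATH = "/sbin/docker-init"


def add_init_process(spec):
    """Return a copy of an OCI-style spec dict with an init wrapper added."""
    new_spec = copy.deepcopy(spec)
    process = new_spec.setdefault("process", {})
    original_args = list(process.get("args", []))

    # Prepend the init binary; "--" separates its options from the real command.
    process["args"] = [INIT_CONTAINER_PATH, "--"] + original_args

    # Bind-mount the host's init binary read-only into the container.
    new_spec.setdefault("mounts", []).append({
        "destination": INIT_CONTAINER_PATH,
        "type": "bind",
        "source": INIT_HOST_PATH,
        "options": ["bind", "ro"],
    })
    return new_spec


if __name__ == "__main__":
    example = {"process": {"args": ["chrome", "--headless"]}, "mounts": []}
    print(add_init_process(example)["process"]["args"])
```

If the init binary is missing on the host, the bind mount has nothing to point at, which is one way container creation could fail at the rootfs setup stage. Whether that is what happens on Fargate 1.4.0 is only a guess.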

In Fargate 1.3.0 I'm guessing that the "initProcessEnabled": true part of the task definition gets translated into a HostConfig with "Init": true, which is then passed on to the ECS Agent (https://github.com/aws/amazon-ecs-agent), which then just passes it through to Docker (where things work as expected).

With Fargate 1.4.0 using containerd directly (and firecracker-containerd?) via a new Fargate Agent, it looks like this Docker-specific logic is being bypassed and maybe isn't working right in the new Fargate Agent (maybe a missing tini binary or something?). But that is entirely a guess based on my limited knowledge. The fact that I'm getting an error message seems to indicate that it is trying to do something with the "InitProcessEnabled": true flag. The complete error message probably has some better clues in it.

But either way, I suspect the code that would have to be modified is not currently open source. Fortunately, the workaround of just adding tini directly to my images and no longer relying on the Docker --init flag is pretty easy to implement.

tdussmann commented 4 years ago

@tpunder Your bug description worried us, since we had just switched on initProcessEnabled in our task definitions, and we wondered whether it would prevent us from upgrading to 1.4.0 or whether it had been fixed in the meantime. So yesterday we cloned one of our services, and it started up with no problem on Fargate platform version 1.4.0. You might want to retest this and possibly close this issue if you can confirm that it works as expected.