Open fischaz opened 4 years ago
Hello again,
I'm not sure this is the right place to raise it, but as I dig deeper into this, I've found that AWS Batch is affected by a different (but similar) issue.
I've managed to configure a Batch AMI with the UserNS remap setting on the docker daemon and submitted a batch job.
The instance starts, but it fails to execute the Docker container, with the following error in the Docker daemon logs:
2020-02-27T04:11:49Z 030_docker time="2020-02-27T04:11:49.111185541Z" level=error msg="Handler for POST /v1.19/containers/create?name=ecs-BatchJobDefinition-ecs-batch-job-rdsrestorer-13-default-82d8aac7eddac2dd8401 returned error: cannot share the host's network namespace when user namespaces are enabled"
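For context, the remap setting mentioned above is normally the `userns-remap` key in the Docker daemon's `daemon.json`. A minimal Python sketch of adding it to an existing config, assuming the `default` mapping (which makes Docker create and use the `dockremap` user); the helper name is hypothetical and not part of any AWS or Docker tooling:

```python
import json


def enable_userns_remap(daemon_config: dict, mapping: str = "default") -> dict:
    """Return a copy of a daemon.json config with userns-remap enabled."""
    updated = dict(daemon_config)
    # "default" asks dockerd to create and use the dockremap user/group.
    updated["userns-remap"] = mapping
    return updated


if __name__ == "__main__":
    # Print the config that would be written to daemon.json on the AMI.
    print(json.dumps(enable_userns_remap({}), indent=2))
```

The daemon has to be restarted for the setting to take effect.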
I've looked at my BatchJob Definition:
{
  "jobDefinitionName": "BatchJobDefinition-ecs-batch-job-job1",
  "jobDefinitionArn": "arn:aws:batch:ap-southeast-2:123456789012:job-definition/BatchJobDefinition-ecs-batch-job-job1:70",
  "revision": 70,
  "status": "ACTIVE",
  "type": "container",
  "parameters": {},
  "retryStrategy": {
    "attempts": 1
  },
  "containerProperties": {
    "image": "123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/image:1.2.0",
    "vcpus": 1,
    "memory": 128,
    "command": [],
    "jobRoleArn": "arn:aws:iam::123456789012:role/ecs-batch-job-ContainerRole",
    "volumes": [],
    "environment": [
      {
        "name": "NO_PROXY",
        "value": "169.254.169.254,169.254.170.2"
      },
      {
        "name": "HTTPS_PROXY",
        "value": "webproxy:3128"
      },
      {
        "name": "HTTP_PROXY",
        "value": "webproxy:3128"
      }
    ],
    "mountPoints": [],
    "readonlyRootFilesystem": true,
    "privileged": false,
    "ulimits": [],
    "resourceRequirements": []
  }
}
I don't think I set any network settings here, so I assume that the AWS Batch service, when it creates the ECS TaskDefinition to run in the Batch-managed on-demand ECS cluster, is the one setting the network mode to host.
Overall, network mode host has the same issue as pid mode host: the Docker daemon will refuse to run the container unless --userns is set to host.
https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations
And in the case of Batch, host networking is more or less the only mode.
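The rule above can be sketched as a small check. This only illustrates the constraint as I understand it from the Docker documentation linked above; it is not code from the ECS agent or Batch, and the function name is hypothetical:

```python
def required_userns_mode(network_mode: str, pid_mode: str, remap_enabled: bool) -> str:
    """Return "host" if the container must opt out of remapping, else ""."""
    # With userns-remap enabled, Docker rejects containers that share the
    # host's network or PID namespace unless they also use --userns=host.
    shares_host_namespace = network_mode == "host" or pid_mode == "host"
    if remap_enabled and shares_host_namespace:
        return "host"
    return ""
```

For a Batch job with host networking on a remapped daemon, `required_userns_mode("host", "", True)` returns `"host"`, which is the setting that cannot currently be passed through.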
I guess if an AWS customer fully manages the ECS cluster (for Batch and non-Batch jobs) and configures the EC2 instances to use userns-remap, then it would make sense for Batch to be aware of that and pass the userns=host setting to the ECS cluster as well.
In my case though, I'll just never enable userns-remap on the Batch EC2 instances: there is no point if Batch always uses network=host and would therefore always need userns=host as well (why enable the remap if nothing is ever remapped?).
I thought I'd just share some feedback.
This would be a very useful feature for a number of applications that need access to root-owned host level files and UNIX sockets for monitoring and security.
@coultn, I'm hoping this issue will get some attention and make it onto the container-roadmap.
I'm also hitting this issue. There really needs to be an escape hatch for tasks run as specialized daemon sets! 🙏
Any update on the above?
Summary
We wish to enable UserNS remap support in our Docker setup on ECS for security. The Datadog agent container requires '--userns=host' when Docker runs in that mode, which is currently not supported by TaskDefinition.
Description
UserNS remap is documented in https://success.docker.com/article/introduction-to-user-namespaces-in-docker-engine
Activating the mode is easy and it generally works. But if an ECS service (like the Datadog agent) requires --pid=host (to monitor all processes on the EC2 instance) while userns-remap is enabled, the container must also run with --userns=host; otherwise, Docker will fail to start the container with the following error:
docker run supports the flag as per https://docs.docker.com/engine/reference/commandline/run/
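As an illustration of what the equivalent plain `docker run` invocation looks like, here is a minimal Python sketch that builds the argument list. The `--pid=host` and `--userns=host` flags are the real `docker run` flags; the helper name and the image name are placeholders, not the actual agent setup:

```python
import shlex


def docker_run_args(image: str, pid_host: bool = False, userns_host: bool = False) -> list:
    """Build a docker run argument list with optional host namespace flags."""
    args = ["docker", "run"]
    if pid_host:
        # Share the host PID namespace so the agent can see all processes.
        args.append("--pid=host")
    if userns_host:
        # Opt this container out of userns-remap; required alongside --pid=host.
        args.append("--userns=host")
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/agent:latest" is a placeholder image name.
    print(shlex.join(docker_run_args("example/agent:latest", pid_host=True, userns_host=True)))
```

The request here is for a TaskDefinition field that maps to the second flag, the same way pidMode maps to the first.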
It was mentioned in aws/amazon-ecs-agent#502 but never implemented (probably due to a lack of requests).
Environment Details
Supporting Log Snippets
Docker logs issues with datadog agent: