aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[Fargate] [request]: Please support tmpfs #736

Open ghost opened 4 years ago

ghost commented 4 years ago

Tell us about your request

To write temporary data to a file in ReadonlyRootFilesystem mode, please support tmpfs on Fargate.

Which service(s) is this request for?

Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

When Apache runs with its MPM module enabled, it writes the parent process ID (PID) to a pidfile. With a read-only root filesystem, Apache can neither create nor write to that file, so we need a writable tmpfs mount. Because Fargate does not support tmpfs, Apache fails to start with the following error:

(30)Read-only file system: AH00099: could not create /run/apache2/httpd.pid
AH00100: httpd: could not log pid to file /run/apache2/httpd.pid

For security reasons, I do want to run the container in ReadonlyRootFilesystem mode. So, please support tmpfs.
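
A minimal sketch of one way an entrypoint could work around this without tmpfs: pick the first writable directory from a list of candidates and put the pidfile there. It assumes at least one candidate is writable at runtime (for example /dev/shm, which a later comment in this thread reports as writable on Fargate, or a mounted volume). The function first_writable_dir is hypothetical, not part of Apache or Fargate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first argument that is an existing,
# writable directory; return 1 if none of the candidates qualifies.
first_writable_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Demonstration with throwaway paths instead of real mount points:
# the first candidate does not exist, the second is a writable directory.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/writable"

pid_dir="$(first_writable_dir "$demo_root/missing" "$demo_root/writable")"
echo "pidfile would go to: $pid_dir/httpd.pid"
```

For Apache, the chosen directory could then be passed on the command line, for example exec httpd -DFOREGROUND -c "PidFile $pid_dir/httpd.pid". httpd's -c option processes a directive after the configuration files are read, so it should override the PidFile set there.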

jackmokcx commented 4 years ago

+1

gsurfgaropaba commented 2 years ago

+1

pspot2 commented 2 years ago

ECS Fargate provides 3 times more memory (in the max configuration) than Lambda (30 GB vs 10 GB). This makes Fargate attractive for processing large datasets in-memory for performance and security reasons (especially if those datasets contain sensitive data).

tmpfs is a straightforward way to mount a memory-backed filesystem at the OS level and expose it to userspace applications, which can then use their standard filesystem calls on it.

Please enable it for Fargate.
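
For reference, a tmpfs mount is simply an entry of type tmpfs in the kernel's mount table, so an application or entrypoint can check which paths are memory-backed. A minimal sketch that lists such mounts from /proc/mounts; the function name list_tmpfs_mounts is hypothetical, and the sample mount lines are illustrative, modeled on the df output later in this thread rather than captured from Fargate.

```shell
#!/usr/bin/env bash
# List the mount points whose filesystem type is tmpfs.
# /proc/mounts fields are: source, mount point, fstype, options, ...
# A different file in the same format can be passed for testing.
list_tmpfs_mounts() {
    awk '$3 == "tmpfs" { print $2 }' "${1:-/proc/mounts}"
}

# Demonstration on an illustrative sample in /proc/mounts format:
# overlay root, tmpfs on /dev, and Docker-style "shm" on /dev/shm.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
overlay / overlay rw,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,size=65536k,mode=755 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime 0 0
EOF

list_tmpfs_mounts "$sample"
```

On a real container, calling list_tmpfs_mounts with no argument reads the live /proc/mounts.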

rickknowles-cognitant commented 2 years ago

Amazon, seriously? Including the option of read-only root filesystems and then not implementing tmpfs is like selling cars without seatbelts. What on earth were you thinking?

rickknowles-cognitant commented 2 years ago

This is maddening, because there's not even a workaround possible if you run any more than one instance of a container.

It's tempting to think an EFS share might be a suitable alternative mountable "temp" space, but the problem arises with the /tmp folder when you have a service with, say, 4 tasks registered with a load balancer. Mapping a shared EFS /tmp folder across the 4 instances is a madman's choice, given how often applications treat that space as their own private slop space.

Honestly, this is such a no-brainer requirement that "rootfs as read-only" without it is next to useless in all but trivial cases.

As @pspot2 said above, these containers aren't short on memory ... the smallest one you can get is 512MB. The overwhelming majority of containers we deploy use at most 128MB in practice. It's not like it should cost more to offer.

rickknowles-cognitant commented 2 years ago

Follow up - there's not a doubt in my mind that the following is a god-awful hack that AWS should be embarrassed about putting forward when people ask for tmpfs, but here it is ... a workaround in case anyone else needs it.

  1. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html#bind-mount-examples (see the section "To provide an empty data volume for one or more containers")
  2. https://github.com/aws/containers-roadmap/issues/938#issuecomment-868096035

These two basically have you bind-mount a host volume into the container at the path where you want the tmpfs volume to be, and then declare a second container, outside the scope of the main one, that runs busybox and changes the owner / perms on the volume before the main one launches so it's writable. The conditional dependsOn attribute is needed to make the containers launch in the right order so the perms are set on start.
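
The init container in that workaround only has to make the shared directory writable before the main container starts. A minimal sketch of what its command might do, assuming a world-writable directory with the sticky bit (mode 1777, as on /tmp) is acceptable instead of chowning it to one specific UID; the function name prepare_scratch_dir and the demo paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical body of the permission-fixing init container: create the
# shared scratch directory and make it writable by any UID, with the
# sticky bit set (mode 1777, like /tmp) so users cannot delete each
# other's files.
prepare_scratch_dir() {
    local dir="$1"
    mkdir -p "$dir" && chmod 1777 "$dir"
}

# Demonstration on a throwaway directory standing in for the bind mount.
scratch="$(mktemp -d)/scratch"
prepare_scratch_dir "$scratch"
ls -ld "$scratch"
```

In the task definition, the equivalent would be a non-essential busybox container that mounts the same volume and runs something like sh -c "chmod 1777 /scratch" (the /scratch path is hypothetical). The main container lists it under dependsOn with condition SUCCESS, so it starts only after the init container has exited with status 0. chown works as well if a specific owner is needed, provided the init container runs as root.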

I feel dirty just writing this out, but hopefully it will help someone else who is waiting for AWS to offer tmpfs on Fargate the way they certainly know they should.

rogeruiz commented 1 year ago

@rickknowles-cognitant and everyone else who stumbles onto this issue: there is a simpler solution! The permission-setting sidecar / extra container definition can be replaced by setting ownership in the Dockerfile itself, so no separate init image is needed.

By default, the volume permissions are set to 0755 and the owner as root. These permissions can be changed in the Dockerfile. In the following example, the owner of the /var/log/exported directory is set to node.

FROM public.ecr.aws/amazonlinux/amazonlinux:latest
RUN yum install -y shadow-utils && yum clean all
RUN useradd node
RUN mkdir -p /var/log/exported && chown node:node /var/log/exported
RUN touch /var/log/exported/examplefile
USER node
VOLUME ["/var/log/exported"]

Thanks to @jsclarridge, who found the snippet above in the AWS documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bind-mounts.html

ghost commented 1 year ago

We have a requirement to scan files for viruses with ClamAV, but the files may contain sensitive data and must not be written to disk. We would like to use Fargate, but since it still does not allow tmpfs, we currently see no easy solution. It would be great to have this in the near future.

kftsehk commented 1 year ago

Sharing an experimental workaround here (I am also consulting support on this). Our use case is a server-side render cache that we want to be fast and to end with the container's lifecycle.

TLDR

Summary:

Workaround

At build time (Dockerfile), create the symlinks but do not actually write any files. The /dev/shm seen during the build is not the runtime /dev/shm, and anything written to it during the build is lost because it is not saved in the image layers.

RUN ln -s /dev/shm/your/required/folder/structure <your-desired-tmp-dir>

At run time (docker-entrypoint.sh), create the target subdirectory under /dev/shm and copy any needed files from the read-only filesystem into it before starting the app:

mkdir -p /dev/shm/your/required/folder/structure
cp -r "$HOME/your/file" /dev/shm/your/required/folder/structure/

Beware: this is not persistent storage. All these temporary files are gone when the container stops, and no recovery is possible.
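
A self-contained sketch of the same two steps, using a temporary directory in place of both the image filesystem and /dev/shm so it can run anywhere. The paths dev-shm, app, seed and cache are hypothetical; on Fargate the ln -s would be a RUN line in the Dockerfile and the mkdir/cp would live in docker-entrypoint.sh.

```shell
#!/usr/bin/env bash
# Sketch of the build-time / run-time split, with a throwaway directory
# standing in for the image and for /dev/shm.
set -euo pipefail

demo="$(mktemp -d)"
shm_root="$demo/dev-shm"   # stands in for /dev/shm (hypothetical layout)
app_dir="$demo/app"        # stands in for the read-only image
mkdir -p "$shm_root" "$app_dir/seed"
echo "initial data" > "$app_dir/seed/config.txt"

# "Build time": create only the symlink. Its target does not exist yet,
# so the link dangles and nothing is written under shm_root.
ln -s "$shm_root/cache" "$app_dir/cache"
[ -e "$app_dir/cache" ] || echo "link is dangling at build time"

# "Run time" (entrypoint): create the target and copy the seed files in.
mkdir -p "$shm_root/cache"
cp -r "$app_dir/seed/." "$shm_root/cache/"

# The application keeps using the path baked into the image.
cat "$app_dir/cache/config.txt"
```

Only the symlink itself lives in the image, so the read-only root is never written to at run time; all writes land in the target directory.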

Details

Dockerfile

FROM public.ecr.aws/docker/library/node:16-alpine
# Can use 
# - /tmp: 30GB overlay
# - /dev/shm: all available memory to container
ENV TMPDIR=/tmp
WORKDIR /root

RUN ln -s ${TMPDIR} ./linked-tmp
RUN apk add --no-cache bash

COPY ./docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]

docker-entrypoint.sh

#!/bin/bash
set -euo pipefail

export FARGATE_TASK_ROOT=/root
export TMPDIR=/tmp
cd $FARGATE_TASK_ROOT || exit 1
pwd

df -h

ls -la ./
touch ./linked-tmp/asd
ls -l "./linked-tmp"
ls -l $TMPDIR

for i in {1..20000}; do
    dd if=/dev/urandom of=./linked-tmp/$i.dat bs=10M count=10
    df -h $TMPDIR
done

df -h shows two filesystems of interest. The size reported for /dev/shm can be larger than the task/container specification; however, the cgroup limit possibly blocks writes beyond what is allocated to the container.

# sample output on 4vcpu 8GB config
Filesystem                Size      Used Available Use% Mounted on
overlay                  29.4G      9.6G     18.3G  34% /
tmpfs                    64.0M         0     64.0M   0% /dev
shm                      14.9G         0     14.9G   0% /dev/shm
....
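
The 14.9G shown for shm is the size of the mount, not the memory the task may use. Pages written to tmpfs are charged to the memory cgroup of the writing process, which is consistent with dd being killed near 8GB in the test below. A sketch for reading that limit from inside a container, assuming the standard cgroup v2 (memory.max) or v1 (memory/memory.limit_in_bytes) file layout under /sys/fs/cgroup. Which layout Fargate exposes, and whether the limit is set per container or per task, is not established in this thread; the function name cgroup_memory_limit is hypothetical.

```shell
#!/usr/bin/env bash
# Print the memory limit (in bytes) from the cgroup filesystem.
# cgroup v2: memory.max ("max" means no limit).
# cgroup v1: memory/memory.limit_in_bytes.
# The root directory is a parameter so the function can be tested.
cgroup_memory_limit() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"
    else
        echo "unknown"
        return 1
    fi
}

# Demonstration on a fake cgroup v2 tree with an 8 GiB limit.
fake_root="$(mktemp -d)"
echo $((8 * 1024 * 1024 * 1024)) > "$fake_root/memory.max"
cgroup_memory_limit "$fake_root"
```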

Our goal was to use the writable overlay at /tmp and the memory-backed disk at /dev/shm, and both are indeed writable through the symlink created at build time.

# testing at /dev/shm: wrote 100MB files until 7.9GB and got killed. Expected, since I cannot write beyond what I paid for (8GB memory in both the task & service definition).

10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G    100.0M     14.8G   1% /dev/shm
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G    200.0M     14.7G   1% /dev/shm
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G    300.0M     14.6G   2% /dev/shm
...
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G      7.7G      7.2G  52% /dev/shm
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G      7.8G      7.1G  52% /dev/shm
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
shm                      14.9G      7.9G      7.0G  53% /dev/shm
/usr/local/bin/docker-entrypoint.sh: line 16:   173 Killed                  dd if=/dev/urandom of=./linked-tmp/$i.dat bs=10M count=10 
# testing at /tmp: wrote 100MB files all the way up to about 30GB, until it ran out of disk space

10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
overlay                  29.4G      9.6G     18.2G  34% /
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
overlay                  29.4G      9.7G     18.2G  35% /
...
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
overlay                  29.4G     29.2G         0 100% /
10+0 records in
10+0 records out
Filesystem                Size      Used Available Use% Mounted on
overlay                  29.4G     29.3G         0 100% /
dd: error writing './linked-tmp/204.dat': No space left on device

helloworld121 commented 1 year ago

Hi @rickknowles-cognitant, hi @kftsehk, I am also looking for tmpfs support. I found a section in the docs on supported ECS task definition parameters: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_linuxparameters Here tmpfs is listed as a supported parameter. Isn't this what you are looking for? Or am I misled? Looking forward to your response.

kftsehk commented 1 year ago

@helloworld121

Please see the bold note in the description of the tmpfs parameter:

If you're using tasks that use the Fargate launch type, the tmpfs parameter isn't supported.

It is not supported for the Fargate launch type, which is exactly what this issue is requesting.

Summary:

  • There is a writable overlay at /tmp and shared memory at /dev/shm.
  • We don't have a config option that mounts these at a desired directory.
  • The read-only filesystem also prevents us from running ln at runtime.
  • Workaround: we can create dangling symlinks with ln -s at build time inside the container.

Further analysis shows that what is unsupported is mounting a tmpfs device at a custom path from the task definition. As tested, a symlink hardcoded into the container can point to the default writable locations: /tmp (disk-backed overlay) or /dev/shm (in-memory filesystem).