Tune memory/cpu/disk allocated to docker

Simon-Harris-IBM commented 4 years ago

Need to tune the CPU, memory, disk, and GPUs allocated to run the submitted docker container.

Code in run_docker.py:

mummert-ibm commented 4 years ago

The current plan is to run only a single GPU-enabled container at a time on the backend nodes. We will allow it to use all the GPUs, and should limit its CPU and RAM so that the infrastructure components (orchestrator et al.), whose needs are relatively minimal, are not starved for resources.

Tentatively:

allowed RAM = total RAM - 8GB

total disk = if we unlimit this, it will be bound by the filesystem on which docker's /var/lib/docker is found. That may be fine, but we need to investigate how to clean up / garbage collect after an image is removed.
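To make the tentative RAM rule concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, the 8GB reserve is treated as 8 GiB (an assumption), and the host-memory lookup assumes a Linux node. The disk helper only reports free space on the filesystem holding a given path; it does not address the open garbage-collection question.

```python
import os
import shutil

GIB = 1024 ** 3


def allowed_ram_bytes(total_ram_bytes: int, reserve_gib: int = 8) -> int:
    """RAM the container may use: total RAM minus a fixed reserve.

    The reserve is what is left for the orchestrator and other
    infrastructure components (8 GiB is an assumed reading of "8GB").
    """
    reserve = reserve_gib * GIB
    if total_ram_bytes <= reserve:
        raise ValueError("host has no RAM left after the reserve")
    return total_ram_bytes - reserve


def host_total_ram_bytes() -> int:
    """Total physical RAM of the host (Linux only, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path: str = "/var/lib/docker") -> int:
    """Free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    total = host_total_ram_bytes()
    print(f"total RAM:   {total / GIB:.1f} GiB")
    print(f"allowed RAM: {allowed_ram_bytes(total) / GIB:.1f} GiB")
```

For example, a node with 40 GiB of RAM would leave 32 GiB for the container under this rule.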

Simon-Harris-IBM commented 4 years ago

The following settings were tested by Adam. Run times in the table are for the model running directly on the machine (i.e., without Synapse):

VM	Memory (G)	CPUs	Shared Memory (G)	Run time
Todd	32	7	16	6m54
Todd	32	7	12	6m50
Todd	32	7	8	6m36
Todd	48	7	12	6m41
Todd	32	6	12	8m15
Simon	32	7	12	7m1
Simon	32	7	12	6m50
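As a rough aid for reading the table, the sketch below converts run times to seconds and compares the 6-CPU and 7-CPU runs on Todd (32G memory, 12G shared memory). The helper name is hypothetical, and it assumes a value like "7m1" means 7 minutes 1 second.

```python
import re


def to_seconds(run_time: str) -> int:
    """Convert a run time such as '6m54' or '10m30s' to seconds.

    Assumes the digits after 'm' are whole seconds ('7m1' -> 7 min 1 s).
    """
    match = re.fullmatch(r"(\d+)m(\d+)s?", run_time)
    if match is None:
        raise ValueError(f"unrecognised run time: {run_time!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# Both rows are from the table: Todd, 32G memory, 12G shared memory.
seven_cpus = to_seconds("6m50")
six_cpus = to_seconds("8m15")

extra = six_cpus - seven_cpus
slowdown = extra / seven_cpus
print(f"6 CPUs vs 7 CPUs: +{extra}s ({slowdown:.0%} slower)")
```

On these two rows, dropping from 7 to 6 CPUs costs far more time than any of the memory or shared-memory changes in the table.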

Based on this, we've decided to go with the following settings:

shm_size=12G
mem_limit=32G
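For reference, a minimal sketch of how these two settings could be passed through docker-py's `containers.run`, which accepts `shm_size` and `mem_limit` keyword arguments. The function names and the `detach=True` choice are assumptions for illustration; the thread does not show how run_docker.py actually passes them.

```python
def container_limits() -> dict:
    """Resource limits chosen in this thread, as docker-py keyword arguments."""
    return {
        "shm_size": "12g",   # size of /dev/shm inside the container
        "mem_limit": "32g",  # hard memory limit for the container
    }


def run_submission(image: str):
    """Start `image` with the limits above (hypothetical helper).

    Requires the docker-py package and a reachable Docker daemon, so the
    import is kept local to this function.
    """
    import docker

    client = docker.from_env()
    return client.containers.run(image, detach=True, **container_limits())
```

Keeping the limits in one dictionary makes it easy to log or adjust them per node without touching the call site.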

Attempts to limit CPU consumption to 87% have not worked using the arguments documented in the docker-py API docs: https://docker-py.readthedocs.io/en/stable/containers.html. I'm not too concerned about this, as CPU spikes to 100% only infrequently during a run.
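The thread does not say which arguments were tried, so the following is only a sketch of the arithmetic behind two documented docker-py options, `cpu_period`/`cpu_quota` and `nano_cpus`. It is untested against this setup. The 8-CPU host is an assumption (7 of 8 CPUs is 87.5%), and the helper names are hypothetical.

```python
def cfs_quota_kwargs(host_cpus: int, fraction: float,
                     period_us: int = 100_000) -> dict:
    """CFS period/quota pair capping the container at `fraction` of the host.

    The quota is the CPU time (microseconds) usable per period, summed
    over all cores, so it may exceed the period on multi-core hosts.
    """
    quota_us = int(host_cpus * fraction * period_us)
    return {"cpu_period": period_us, "cpu_quota": quota_us}


def nano_cpus_kwargs(host_cpus: int, fraction: float) -> dict:
    """Same cap expressed as nano_cpus (units of 1e-9 CPUs)."""
    return {"nano_cpus": int(host_cpus * fraction * 1_000_000_000)}


# Assumed 8-CPU host; 0.875 corresponds to 7 of 8 CPUs.
print(cfs_quota_kwargs(8, 0.875))
print(nano_cpus_kwargs(8, 0.875))
```

Only one of the two forms should be passed to `containers.run` at a time, since Docker treats them as conflicting ways of setting the same limit.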

With the above settings, inference using the example PyTorch model takes approximately 10m30s from submission to completion through Synapse.
