MatrixAI / Emergence

Distributed Infrastructure Orchestration
Apache License 2.0

GPU Containers #38

Open CMCDragonkai opened 5 years ago

CMCDragonkai commented 5 years ago

GPUs are quite a bit different, as they often involve proprietary drivers, and their integration into the container ecosystem is still somewhat bespoke. I've successfully run a TensorFlow application using CUDA and cuDNN inside a Docker container on an AWS virtual machine with an NVIDIA GPU attached. It was quite confusing.

However, NVIDIA provides some automation: https://devblogs.nvidia.com/gpu-containers-runtime/

It appears they created a hook into the runc system to provide a level of indirection when running containers that require access to GPUs. I have not yet tried this, but I think it adds the kind of real-world complexity that would stress-test such a use case on the Matrix system.
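From the blog post, the integration works by injecting a prestart hook into the container's OCI runtime spec before handing it to runc. A sketch of what such an injected hook entry could look like in an OCI `config.json` (the binary path and arguments here are illustrative, based on how NVIDIA documents `nvidia-container-runtime-hook`, not something I've verified on our setup):

```json
{
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": ["nvidia-container-runtime-hook", "prestart"]
      }
    ]
  }
}
```

The hook runs on the host before the container's entrypoint starts, and uses libnvidia-container to bind-mount the driver libraries and GPU device nodes into the container, so the image itself doesn't need to ship the proprietary driver.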

This is low priority for now, and should be in the backlog.

CMCDragonkai commented 5 years ago

Their hook into runc: https://github.com/NVIDIA/libnvidia-container

Docker also has a --runtime option; I want to explore what kind of flexibility this affords us in Matrix AI.
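For reference, custom runtimes are registered with the Docker daemon and then selected per-container via --runtime. A sketch of the daemon config, assuming the nvidia-container-runtime package is installed (runtime name and path as commonly documented by NVIDIA; adjust to whatever the package actually installs):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

With that in `/etc/docker/daemon.json` and the daemon restarted, something like `docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi` should see the attached GPU. The interesting part for Matrix AI is that --runtime is a generic extension point: any OCI-compatible runtime wrapper can be registered this way, not just NVIDIA's.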