cBio / cbio-cluster

MSKCC cBio cluster documentation
12 stars 2 forks

File management in docker #287

Closed LijieTu closed 9 years ago

LijieTu commented 9 years ago

Hello all,

I'm trying to run a Lua file in Docker but am confused about how it works.

  1. How can I create a private directory in the Docker container so that I can store some code in it? If I naively mkdir in /root, the files disappear after I log out and log back in.
  2. As for the image from #270 at https://registry.hub.docker.com/u/kaixhin/cuda-torch/ , do I have to install it on every node? Is it possible to have it available globally across all nodes? Otherwise I have to run my code on a fixed node instead of picking one at random each time.

Thank you!

tatarsky commented 9 years ago

Please note I am not a Docker expert, but some of the folks on the cluster may have other ideas besides what I say here.

For item 1, we would recommend you take a look at data volumes and host path access:

https://docs.docker.com/userguide/dockervolumes/
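As a minimal sketch of the host-path approach from that guide: mount a directory from the host (e.g. your GPFS home) into the container, and anything written there survives the container. The /gpfs/home/username/code path is an assumption, not a real cluster path; substitute your own.

```shell
# Mount an assumed host directory into the container at /workspace.
# Files written to /workspace from inside the container persist on the
# host after the container exits.
docker run -it \
  -v /gpfs/home/username/code:/workspace \
  kaixhin/cuda-torch \
  /bin/bash

# Inside the container you could then do e.g.:
#   cp mymodel.lua /workspace/
# and the file remains under /gpfs/home/username/code on the host.
```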

For item 2 I'm a bit confused about what you are asking. You can specify a path to the container and store the Docker image out on GPFS. But please note that by default the Docker runtime area (the -g argument) is set to /scratch/docker, just to prevent accidents. You may wish to reset that flag for your particular runs.

LijieTu commented 9 years ago

Sorry for the confusion. Let me try describing the situation this time.

In question 2, I mean: if I request a GPU node, say gpu-2-6, for the first time and then run the command from the link:

docker run -it --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidia0:/dev/nvidia0 kaixhin/cuda-torch

the terminal shows:

Unable to find image 'kaixhin/cuda-torch' locally
Pulling repository kaixhin/cuda-torch

and the download begins. After the download, the prompt changes to ~/torch#, which means the container is ready.

The next time I request the same gpu-2-6 and run Docker with the same command, there is no need to download again and I can jump straight to the ~/torch# prompt.

However, if I request a different GPU node, say gpu-2-11, I have to go through the download again before the container is ready. My question is: is there any way I can avoid downloading the image when I use a different node from last time?

I hope I have made it clear this time. Thanks!

tatarsky commented 9 years ago

I believe you need to save the image to GPFS and use the Docker arguments to reference that image location instead of the repo. There are some examples of this in past Docker-related requests, or I can Google around, but I'm not really available today due to some other matters. Let's see if the other Docker users chime in, or I'll take a further look next week.

LijieTu commented 9 years ago

Sure. I will see if I could get it done somehow. Thanks.

tatarsky commented 9 years ago

I do see that the kaixhin/cuda-torch image already exists in the default location on a number of nodes.

Do you consider it a "bad thing" to just load it when needed on all nodes and then attach a GPFS-based data volume for your persistent data?

There is plenty of room in the node /scratch areas for Docker images, so why are you trying to avoid downloading the image? Time?

LijieTu commented 9 years ago

I was thinking that if I want to get something else in the future, maybe I won't have to download it every time, just to save time.

tatarsky commented 9 years ago

Yeah, in theory the Docker image repository could be shared across all the nodes and live on GPFS, but I've never tried that and we'd have to communicate with the other Docker users before doing it.

I don't know enough about Docker repositories to tell you what's going to happen if I set the repository to some shared GPFS directory, however.

So for now I would just batch up a "docker pull kaixhin/cuda-torch" on the needed nodes, and perhaps another method will become evident.

It seems to only take a few minutes. It's certainly not a disk space issue.

If you want, I can just execute that for you on all nodes.

tatarsky commented 9 years ago

It looks like you could also use the import/export features to skip the download, but I've only glanced at a writeup on it. YMMV:

http://tuhrig.de/difference-between-save-and-export-in-docker/
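The short version of the distinction that writeup covers: save/load operates on an image and preserves its layers and metadata, while export/import operates on a container and flattens its filesystem into a single layer. A hedged sketch (the container name and tag below are placeholders, not anything on the cluster):

```shell
# save/load: works on an IMAGE, keeps layers and history intact
docker save -o torch-image.tar kaixhin/cuda-torch
docker load -i torch-image.tar

# export/import: works on a CONTAINER, flattens the filesystem.
# "mycontainer" is a placeholder for an existing container name or ID,
# and "mytorch:flattened" is an arbitrary tag for the imported image.
docker export mycontainer > torch-fs.tar
docker import torch-fs.tar mytorch:flattened
```

For moving an unmodified image between nodes, save/load is usually the right pair, since it keeps the image identical to the one from the registry.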

tatarsky commented 9 years ago

Or this one...

http://stackoverflow.com/questions/23935141/how-to-copy-docker-images-from-one-host-to-another-without-via-repository

tatarsky commented 9 years ago

I just performed a save/load using the method from that second URL and it seemed to go OK. It's a big file, but perhaps quicker than the repo download. Feel free to time it ;)

LijieTu commented 9 years ago

Many thanks!! I guess I need more time to get familiar with Docker's file management.

tatarsky commented 9 years ago

Here's what I did. It seemed a bit faster, and if you batched it up as a job it would probably be quicker still. On one node that already has the image via a fetch:

docker save -o dockerimages/torch.tar kaixhin/cuda-torch

Then, on every node where the image is not visible via docker images (script left to the reader):

docker load -i dockerimages/torch.tar
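A minimal sketch of what such a "script left to the reader" might look like. The node names (gpu-2-1 through gpu-2-12) and the shared GPFS tarball path are assumptions for illustration, not the cluster's actual layout:

```shell
#!/bin/sh
# After a one-time "docker save" on a node that has the image, load the
# tarball on each node that does not already have it.
IMAGE="kaixhin/cuda-torch"
TARBALL="/gpfs/home/username/dockerimages/torch.tar"  # assumed shared path

for node in gpu-2-1 gpu-2-2 gpu-2-3 gpu-2-4 gpu-2-5 gpu-2-6 \
            gpu-2-7 gpu-2-8 gpu-2-9 gpu-2-10 gpu-2-11 gpu-2-12; do
  # Only load where the image is missing from "docker images"
  ssh "$node" "docker images | grep -q '$IMAGE' || docker load -i '$TARBALL'"
done
```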

tatarsky commented 9 years ago

Also note that I will look into shared-filesystem Docker repositories, but not today.

tatarsky commented 9 years ago

I am going to open a separate issue about Docker image storage and generic Docker use. I believe you have what you need for the current configuration. If not, please re-open.