erturklab / delivr_cfos

End-to-end light-sheet image analysis for mouse brains, with DL training data generated in VR
https://www.discotechnologies.org/DELiVR/
MIT License

How can I run the DELiVR container on Docker? #1

Closed PedjaJJ closed 4 months ago

PedjaJJ commented 1 year ago

The issue I have been having is that I downloaded the delivr.12 container on Docker and tried to run it. When I opened the container's log, I found this message: "WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/". However, I have already installed the latest version of the NVIDIA Container Toolkit, which, according to NVIDIA, is supposed to install the NVIDIA driver as well. I also don't know how to start the delivr.12 container using the NVIDIA Container Toolkit.

The second problem I'm facing is that the following output appears after I run the delivr.12 container via a plugin in another app, specifically Fiji (ImageJ), and then the plugin stops working.

"Resource is injar:file:/C:/Users/Desktop/Fiji.app/plugins/delivr-gui-1.0.0-bioarxiv-ready.jar!/config.json Loaded following JSON: { "raw_location" : "/data/raw/", "output_location" : "/data/output/", "mask_detection" : { "ilastik_location" : "/delivr/ilastik/", "ilastik_model" : "./models/random_forest_weights.ilp", "teraconverter_location" : "/delivr/teraconverter/", "output_location" : "/data/output/01_mask_detection/output/", "downsample_steps" : { "original_um_x" : 1.62, "original_um_y" : 1.62, "original_um_z" : 6.0, "downsample_um_x" : 25.0, "downsample_um_y" : 25.0, "downsample_um_z" : 25.0 } }, "blob_detection" : { "input_location" : "/data/output/01_mask_detection/output/", "model_location" : "./models/inference_weights.tar", "crop_size" : [ 64, 64, 32 ], "sw_batch_size" : 42, "output_location" : "/data/output/02_blob_detection/output/" }, "postprocessing" : { "input_location" : "/data/output/02_blob_detection/output/", "output_location" : "/data/output/03_postprocessing/output/" }, "atlas_alignment" : { "input_location" : "/data/output/03_postprocessing/output/", "output_location" : "/data/output/04_atlas_alignment/output/", "mBrainAligner_location" : "/delivr/mbrainaligner/", "collection_folder" : "/data/output/04_atlas_alignment/collection/", "parallel_processing" : "True" }, "region_assignment" : { "input_location" : "/data/output/04_atlas_alignment/collection/", "CCF3_atlasfile" : "./models/CCF3_P56_annotation.tif", "CCF3_ontology" : "./models/AllenMouseCCFv3_ontology_22Feb2021.xml", "output_location" : "/data/output/05_region_assignment/" }, "visualization" : { "input_csv_location" : "/data/output/04_atlas_alignment/output/", "input_size_location" : "/data/output/03_postprocessing/output/", "input_prediction_location" : "/data/output/02_blob_detection/output/", "cache_location" : "/data/output/06_visualization/cache/", "output_location" : "/data/output/06_visualization/output/" }, "FLAGS" : { "ABSPATHS" : false, "TEST_TIME_AUGMENTATION" : true, "MASK_DOWNSAMPLE" : true, "BLOB_DETECTION" : true, "POSTPROCESSING" : true, "ATLAS_ALIGNMENT" : true, "REGION_ASSIGNMENT" : true, "VISUALIZATION" : true, "SAVE_MASK_OUTPUT" : true, "SAVE_NETWORK_OUTPUT" : true, "SAVE_POSTPROCESSING_OUTPUT" : true, "SAVE_ATLAS_OUTPUT" : true } } C:\Users\Desktop\New folder/config.json Running powershell.exe docker run --rm -i --gpus all -v C:\Users\Desktop\Pedja\TBXCNO\Brain6\C1/:/data/raw/ -v C:\Users\Desktop\New folder/:/data/output delivr:12 python3 main.py /data/output/config.json C:\Users\Desktop\New folder C:\Users\Desktop\New folder true ERROR ::::::docker: invalid reference format. ERROR ::::::See 'docker run --help'. Done! REGION PATH: C:\Users\Desktop\New folder/05_region_assignment/ Opening Heatmaps..."

neuronflow commented 1 year ago

Thanks for your interest in our work. Can you post the output of the following?

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

PedjaJJ commented 1 year ago

docker: Error response from daemon: unknown or invalid runtime name: nvidia.

neuronflow commented 1 year ago

The logs you posted indicate you are using a Windows machine?

I am not too experienced with Windows and I don't have a testing environment for it currently.

Let's see what @ramialmask and @MoritzNegwer will have to say.

This sounds worrying: https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#is-microsoft-windows-supported

Anyway, what happens if you try the following without the runtime?

docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

and also this one?

docker run --rm --gpus=all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

My current aim is to make sure that your docker installation is configured properly.

PedjaJJ commented 1 year ago

Yes, I am using Windows 11.

After trying the first command, I received the following:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GT 1030          On | 00000000:01:00.0  On |                  N/A |
| 35%   35C    P8              N/A /  30W |    789MiB /  2048MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                    N/A   |
|    0   N/A  N/A        26      G   /Xwayland                                    N/A   |
+---------------------------------------------------------------------------------------+

The second command resulted in the following output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.14       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GT 1030          On | 00000000:01:00.0  On |                  N/A |
| 35%   37C    P0              N/A /  30W |    794MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                    N/A   |
|    0   N/A  N/A        26      G   /Xwayland                                    N/A   |
+---------------------------------------------------------------------------------------+

I also get the following message from this command: docker run hello-world

Hello from Docker! This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
  3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

To try something more ambitious, you can run an Ubuntu container with: $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID: https://hub.docker.com/

For more examples and ideas, visit: https://docs.docker.com/get-started/

PedjaJJ commented 1 year ago

Based on the resource you provided, it seems that the NVIDIA Container Toolkit does not support Windows and might need to be run on Ubuntu instead(?), so my efforts to find a solution for Windows might be futile.

MoritzNegwer commented 1 year ago

Hi PedjaJJ, I too think that the nvidia-docker toolkit is indeed Linux-only. On Windows you won't need the nvidia-docker toolkit; Docker Desktop plus updated NVIDIA drivers should work. We tested with drivers down to 528.4 (older than yours, so that's not the issue).

The first error message was most likely because the Docker container was started without the --gpus=all flag. The nvidia-smi output you posted above indicates that, in theory, your Docker setup can see and communicate with the graphics card. This is important because DELiVR runs the inference on the GPU and requires as much graphics memory (and as many CUDA cores) as possible.
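As a quick sanity check for the DELiVR image itself (image tag taken from your log), the NVIDIA runtime should also make nvidia-smi available inside that container, so something like the following should print the same table as above:

docker run --rm --gpus all delivr:12 nvidia-smi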

Unfortunately, the nvidia-smi output you posted also shows that your PC is equipped with an NVIDIA GT 1030 with 2 GB VRAM. This is well below our hardware recommendations (see handbook section 2.1, "Hardware requirements"): we tested DELiVR to work with 2600+ CUDA cores and 8 GB VRAM or more (RTX 2070 Super or higher). Your PC's GT 1030 has 384 CUDA cores and 2 GB VRAM, which means that even if the card can run DELiVR, it will be very slow (think several weeks for the example dataset). If you have access to a different NVIDIA graphics card (e.g. a shared workstation, HPC cluster nodes, or even a mid-range gaming PC), we strongly recommend that you try to get the container to run there. In theory you could also rent CUDA-enabled cloud nodes from e.g. AWS; however, those are hard to get at the moment (at least in our availability zone, EU-Central) and can turn expensive fast.

That said, we do have some good news: at least the second issue has been fixed, so you should be able to test-run the container on your machine if you so desire. The second error was due to the FIJI plugin not correctly handling spaces in folder paths under Windows. We have updated the plugin (please see here for the Dropbox link) as well as the Docker container (please see here), and here for an overview. Thanks for pointing this out; we hope that the fixed plugin will help you run DELiVR in the future :)
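For reference, the "docker: invalid reference format" error in your first log is what Docker typically prints when an unquoted path containing a space (here C:\Users\Desktop\New folder) splits the -v argument, so the leftover text gets parsed as an image name. With the host paths quoted, the same call would look roughly like this (a sketch; the updated plugin handles the quoting for you):

docker run --rm -i --gpus all -v "C:\Users\Desktop\Pedja\TBXCNO\Brain6\C1:/data/raw/" -v "C:\Users\Desktop\New folder:/data/output" delivr:12 python3 main.py /data/output/config.json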

neuronflow commented 1 year ago

Perhaps we should improve the hardware requirements section in the documentation?

MoritzNegwer commented 1 year ago

@neuronflow, agreed. The documentation in the handbook has been updated and will be posted online shortly.

PedjaJJ commented 1 year ago

Unfortunately, Fiji gave me the same message after a test run of the container on Ubuntu, with Docker and NVIDIA CUDA installed on the OS:

Resource is injar:file:/home/Fiji.app/plugins/delivr-gui-1.0.4-bioarxiv.jar!/config.json Loaded following JSON: { "raw_location" : "/data/raw/", "output_location" : "/data/output/", "mask_detection" : { "ilastik_location" : "/delivr/ilastik/", "ilastik_model" : "./models/random_forest_weights.ilp", "teraconverter_location" : "/delivr/teraconverter/", "output_location" : "/data/output/01_mask_detection/output/", "downsample_steps" : { "original_um_x" : 1.62, "original_um_y" : 1.62, "original_um_z" : 6.0, "downsample_um_x" : 25.0, "downsample_um_y" : 25.0, "downsample_um_z" : 25.0 } }, "blob_detection" : { "input_location" : "/data/output/01_mask_detection/output/", "model_location" : "./models/inference_weights.tar", "crop_size" : [ 64, 64, 32 ], "sw_batch_size" : 42, "output_location" : "/data/output/02_blob_detection/output/" }, "postprocessing" : { "input_location" : "/data/output/02_blob_detection/output/", "output_location" : "/data/output/03_postprocessing/output/", "min_size" : -1, "max_size" : -1 }, "atlas_alignment" : { "input_location" : "/data/output/03_postprocessing/output/", "output_location" : "/data/output/04_atlas_alignment/output/", "mBrainAligner_location" : "/delivr/mbrainaligner/", "collection_folder" : "/data/output/04_atlas_alignment/collection/", "parallel_processing" : "True" }, "region_assignment" : { "input_location" : "/data/output/04_atlas_alignment/collection/", "CCF3_atlasfile" : "./models/CCF3_P56_annotation.tif", "CCF3_ontology" : "./models/AllenMouseCCFv3_ontology_22Feb2021.xml", "output_location" : "/data/output/05_region_assignment/" }, "visualization" : { "input_csv_location" : "/data/output/04_atlas_alignment/output/", "input_size_location" : "/data/output/03_postprocessing/output/", "input_prediction_location" : "/data/output/02_blob_detection/output/", "cache_location" : "/data/output/06_visualization/cache/", "output_location" : "/data/output/06_visualization/output/" }, "FLAGS" : { "ABSPATHS" : false, "TEST_TIME_AUGMENTATION" : true, "MASK_DOWNSAMPLE" : true, "BLOB_DETECTION" : true, "POSTPROCESSING" : true, "ATLAS_ALIGNMENT" : true, "REGION_ASSIGNMENT" : true, "VISUALIZATION" : true, "SAVE_MASK_OUTPUT" : true, "SAVE_NETWORK_OUTPUT" : true, "SAVE_POSTPROCESSING_OUTPUT" : true, "SAVE_ATLAS_OUTPUT" : true } } /home/Desktop/New Folder/config.json Running docker run --rm -i --runtime=nvidia -v /home/Desktop/Pedja test brain/:/data/raw/ -v /home/Desktop/New Folder/:/data/output delivr:12 python3 main.py /data/output/config.json /home/Desktop/New Folder /home/Desktop/New Folder true ERROR ::::::docker: Cannot connect to the Docker daemon at unix:///home/.docker/desktop/docker.sock. Is the docker daemon running?. ERROR ::::::See 'docker run --help'. Done! REGION PATH: /home/Desktop/New Folder/05_region_assignment/ Opening Heatmaps...

I'm not sure why the Docker daemon apparently continues to have issues, especially after receiving confirmation that Docker was properly installed and the daemon was running.

MoritzNegwer commented 1 year ago

Hi PedjaJJ,

Thanks for testing again. Just to check: have you re-downloaded the plugin (version 1.04) and container from here before installing? We have been adding bugfixes throughout the day.

Judging from your error message, it looks as if the docker container is not starting, likely because of Linux-specific account permission issues. Can you please go through the following steps and let us know the results?

Step 1: Just to make sure that the Docker daemon is correctly initialized, please run sudo systemctl start docker (requires admin privileges).

Step 2: On Linux, normal users are not allowed to run Docker commands by default; this requires administrator rights or membership in a specific group. Can you please run docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi and post the output?

Step 3: If you get a "cannot contact docker" error message, retry with sudo docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi. If that one works, your account currently does not have sufficient privileges to run Docker commands. Please follow the steps outlined here (a sketch of the typical commands follows after Step 4). Don't forget to log out and in again (or just reboot) at the end.

Step 4: Re-test with docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi (without sudo). If it works this time, you should now have sufficient user rights - feel free to run the FIJI plugin and let us know what you find.
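For reference, the post-install steps linked in Step 3 usually boil down to the following commands (a sketch; the exact group handling may differ on your distribution):

sudo groupadd docker            # create the docker group if it does not exist yet
sudo usermod -aG docker $USER   # add your current user to the group
newgrp docker                   # apply the new group now, or simply log out and back in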

PedjaJJ commented 1 year ago

Thanks MoritzNegwer. I tried step 1 but there was no output.

I then proceeded to step 2 and received the following output: docker: Error response from daemon: unknown or invalid runtime name: nvidia. See 'docker run --help'.

I moved on to step 3, which output the following information about the NVIDIA driver:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 35%   39C    P0    N/A /  30W |    408MiB /  1992MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I then followed the instructions at the link you provided and was able to add my user to the docker group. However, when I did step 4, I received the following message: docker: Error response from daemon: unknown or invalid runtime name: nvidia. See 'docker run --help'.

I also tried to run the docker FIJI plugin and received the same message that I received on Windows: Resource is injar:file:/home/Fiji.app/plugins/delivr-gui-1.0.4-bioarxiv.jar!/config.json Loaded following JSON: { "raw_location" : "/data/raw/", "output_location" : "/data/output/", "mask_detection" : { "ilastik_location" : "/delivr/ilastik/", "ilastik_model" : "./models/random_forest_weights.ilp", "teraconverter_location" : "/delivr/teraconverter/", "output_location" : "/data/output/01_mask_detection/output/", "downsample_steps" : { "original_um_x" : 1.62, "original_um_y" : 1.62, "original_um_z" : 6.0, "downsample_um_x" : 25.0, "downsample_um_y" : 25.0, "downsample_um_z" : 25.0 } }, "blob_detection" : { "input_location" : "/data/output/01_mask_detection/output/", "model_location" : "./models/inference_weights.tar", "crop_size" : [ 64, 64, 32 ], "sw_batch_size" : 42, "output_location" : "/data/output/02_blob_detection/output/" }, "postprocessing" : { "input_location" : "/data/output/02_blob_detection/output/", "output_location" : "/data/output/03_postprocessing/output/", "min_size" : -1, "max_size" : -1 }, "atlas_alignment" : { "input_location" : "/data/output/03_postprocessing/output/", "output_location" : "/data/output/04_atlas_alignment/output/", "mBrainAligner_location" : "/delivr/mbrainaligner/", "collection_folder" : "/data/output/04_atlas_alignment/collection/", "parallel_processing" : "True" }, "region_assignment" : { "input_location" : "/data/output/04_atlas_alignment/collection/", "CCF3_atlasfile" : "./models/CCF3_P56_annotation.tif", "CCF3_ontology" : "./models/AllenMouseCCFv3_ontology_22Feb2021.xml", "output_location" : "/data/output/05_region_assignment/" }, "visualization" : { "input_csv_location" : "/data/output/04_atlas_alignment/output/", "input_size_location" : "/data/output/03_postprocessing/output/", "input_prediction_location" : "/data/output/02_blob_detection/output/", "cache_location" : "/data/output/06_visualization/cache/", "output_location" : "/data/output/06_visualization/output/" }, "FLAGS" : { "ABSPATHS" : false, "TEST_TIME_AUGMENTATION" : true, "MASK_DOWNSAMPLE" : true, "BLOB_DETECTION" : true, "POSTPROCESSING" : true, "ATLAS_ALIGNMENT" : true, "REGION_ASSIGNMENT" : true, "VISUALIZATION" : true, "SAVE_MASK_OUTPUT" : true, "SAVE_NETWORK_OUTPUT" : true, "SAVE_POSTPROCESSING_OUTPUT" : true, "SAVE_ATLAS_OUTPUT" : true } } /home/Desktop/New Folder/config.json Running docker run --rm -i --runtime=nvidia -v /home/Desktop/Pedja test brain/:/data/raw/ -v /home/Desktop/New Folder/:/data/output delivr:12 python3 main.py /data/output/config.json /home/Desktop/New Folder /home/Desktop/New Folder true ERROR ::::::docker: Error response from daemon: unknown or invalid runtime name: nvidia. ERROR ::::::See 'docker run --help'. Done! REGION PATH: /home/Desktop/New Folder/05_region_assignment/ Opening Heatmaps...

MoritzNegwer commented 1 year ago

Hi PedjaJJ,

Thanks for getting back to us. We have updated our FIJI plugin and Docker container over the weekend with bugfixes and improvements; please re-download them from the links at discotechnologies.org and re-install.

However, based on your pasted output, it seems that you can only run Docker containers (at least with the nvidia runtime) as sudo, which implies that your normal user doesn't have the rights to do so yet.

To distinguish between an nvidia runtime issue and a Docker permissions issue, can you please do the following?

  1. Please follow the instructions outlined here, if you haven't done so already.

  2. Make sure you have rebooted at the end, restart Docker with sudo systemctl start docker, and check whether you can run docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi without issue.

  3. In order to rule out user-rights issues, can you please run docker run --rm hello-world and let us know whether this works?

If 3) works but 2) doesn't, you have the appropriate user rights to run a docker container, but not with the nvidia runtime.
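If that is the case, the "unknown or invalid runtime name: nvidia" errors above may simply mean the nvidia runtime is not registered with the Docker daemon. With a recent NVIDIA Container Toolkit, registering it and restarting Docker typically looks like this (a sketch, not something we have verified on your setup):

sudo nvidia-ctk runtime configure --runtime=docker   # adds the nvidia runtime to /etc/docker/daemon.json
sudo systemctl restart docker

Alternatively, --gpus all usually works without naming the runtime explicitly once the toolkit is installed.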

If, on the other hand, both don't work, you'll need to give your user the appropriate rights. As a workaround, you can copy the "docker run ..." command from the FIJI plugin's log output (near the beginning) and run it with "sudo " in front as a proof of concept.
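Based on the command visible in your pasted log, such a proof-of-concept run would look roughly like this (paths quoted here because they contain spaces; adjust them to your folders):

sudo docker run --rm -i --runtime=nvidia -v "/home/Desktop/Pedja test brain/:/data/raw/" -v "/home/Desktop/New Folder/:/data/output" delivr:12 python3 main.py /data/output/config.json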

A note of caution - even if it works, expect processing to be very slow. DELiVR requires lots of graphics card memory and the lowest we have tested with (and can confirm working) is 8 GB - still 4x the amount of your graphics card. In order to keep things manageable to weeks as opposed to months on your card, I'd recommend disabling "test-time augmentation" in the FIJI plugin. This will produce a somewhat less refined output, but will save you >90% of the GPU time.
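If you end up editing the generated config.json by hand rather than using the plugin option, the corresponding switch appears to be the TEST_TIME_AUGMENTATION entry in the FLAGS section of the config you pasted, with the other flags left unchanged, e.g.:

"FLAGS" : {
    ...
    "TEST_TIME_AUGMENTATION" : false,
    ...
}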

I hope this helps. Please feel free to check back in if you encounter any issues.

Tas-V commented 4 months ago

I am facing a similar issue right now on Ubuntu 22.04. After running docker run --rm --runtime=nvidia nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi, I get the following error:

Cannot connect to the Docker daemon at unix:///home/taz/.docker/desktop/docker.sock. Is the docker daemon running?.

I have added myself to the docker group, though, and restarted my PC to test it, but I still face the same issue. Please let me know if you are aware of a solution.

neuronflow commented 4 months ago

I opened a new issue for you @Tas-V, see: #4