KOKIAOKI / 3d_bbs

MIT License

Support docker and add description #10

Closed Taeyoung96 closed 9 months ago

Taeyoung96 commented 9 months ago

@KOKIAOKI
As I mentioned in #9,

I made a Dockerfile with a detailed description!
I hope this PR is helpful.

Thanks,

KOKIAOKI commented 9 months ago

@Taeyoung96

Thank you for contributing to 3D-BBS! Thanks to this PR, users will be able to test 3D-BBS easily without depending on their CUDA version. I'm building the Docker images now. Your kind introduction is also very helpful!

Taeyoung96 commented 9 months ago

@KOKIAOKI

Great!
Let me know if you run into any problems.

KOKIAOKI commented 9 months ago

Thank you!! Please let me ask you some questions!

  1. nvidia-docker: I think the nvidia-docker you linked in the .md file is deprecated now. I installed nvidia-container-toolkit by referring to https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html So, is the correct flow to tell users to install nvidia-container-toolkit?

  2. CUDA driver version error: The 3d-bbs container built and ran successfully! However, when I run gpu_test, the following error occurs in a CUDA variable or function.

    root@koki-GKF1060GF:~/workspace/test/build# ./gpu_test ../config/test.yaml 
    [YAML] Loading paths...
    [YAML] Loading 3D-BBS parameters...
    [YAML] Loading angular search range...
    [YAML] Loading score threshold percentage...
    [YAML] Loading downsample souce clouds parameters...
    [Setting] Loading target pcds...
    [Setting] Loading source pcds...
    [Setting] Create output folder with date...
    [Voxel map] Creating hierarchical voxel map...
    warning: cudaErrorInsufficientDriver
       : CUDA driver version is insufficient for CUDA runtime version
    terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
    what():  std::bad_alloc: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version
    Aborted (core dumped)

Ubuntu 22.04 with nvidia-driver 535.129.03 (>525) is installed on my host PC. I'll try running the Docker container on another PC, but can you guess what is causing this error?
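
For anyone hitting the same cudaErrorInsufficientDriver message, here is a minimal sketch (not part of the PR) of a driver check. It assumes the image ships a CUDA 12.0 runtime, as discussed later in this thread, and uses 525.60.13 as the minimum Linux driver, which is the value NVIDIA's release notes list for CUDA 12.0 as far as I know. The names version_ge and MIN_DRIVER are made up for illustration, and the comparison relies on GNU sort -V.

```shell
#!/bin/bash
# Assumed minimum Linux driver for the CUDA 12.0 runtime (check NVIDIA's release notes).
MIN_DRIVER="525.60.13"

# Return 0 when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Query the driver only when nvidia-smi is actually on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver satisfies the assumed minimum $MIN_DRIVER"
  else
    echo "driver '$driver' is older than the assumed minimum $MIN_DRIVER"
  fi
else
  echo "nvidia-smi not found: the GPU is probably not exposed to this environment"
fi
```

A driver such as 535.129.03 passes this comparison, so if the error still appears, a likelier cause is that the GPU is not exposed to the container at all.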

Taeyoung96 commented 9 months ago

@KOKIAOKI

Thanks for reporting!

  1. That's right, users need to install nvidia-container-toolkit. I hadn't seen that update.

  2. Did you try restarting Docker from a local terminal after installing nvidia-container-toolkit?

    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker

  3. What is the output when you run nvidia-smi and nvcc -V inside the Docker container?
     Does it correctly report the CUDA version as 12.0?
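
One way to confirm that the nvidia-ctk step took effect is to ask Docker which runtimes it knows about. The sketch below is only an illustration: has_nvidia_runtime is a hypothetical helper name, and it assumes that `docker info --format '{{json .Runtimes}}'` prints a JSON map keyed by runtime name.

```shell
#!/bin/bash
# Succeed when the JSON read from stdin mentions a runtime named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Only talk to Docker when the CLI is installed.
if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime not found: rerun nvidia-ctk and restart Docker"
  fi
fi
```

If the runtime is registered but nvidia-smi still fails inside the container, the flags passed to docker run are the next thing to inspect.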

Alternatively, my current guess is that the run command needs to change with the move from nvidia-docker to nvidia-container-toolkit.
Could you create a new container after replacing the contents of container_run.sh with the script below?

#!/bin/bash
# Author : Taeyoung Kim (https://github.com/Taeyoung96)

# Set the project directory (PROJECT_DIR) as the parent directory of the current working directory
PROJECT_DIR=$(dirname "$PWD")

# Move to the parent folder of the project directory
cd "$PROJECT_DIR"

# Print the current working directory to verify the change
echo "Current working directory: $PROJECT_DIR"

# Check that both the container name and the image name:tag are provided
if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <container_name> <image_name:tag>"
  exit 1
fi

# Assign the arguments to variables for clarity
CONTAINER_NAME="$1"
IMAGE_NAME="$2"

# Launch the nvidia-docker container with the provided image name and tag
docker run --privileged -it \
           --runtime=nvidia \
           --gpus all \
           -e NVIDIA_DRIVER_CAPABILITIES=all \
           -e NVIDIA_VISIBLE_DEVICES=all \
           --volume="$PROJECT_DIR:/root/workspace" \
           --volume=/tmp/.X11-unix:/tmp/.X11-unix:rw \
           --net=host \
           --ipc=host \
           --shm-size=4gb \
           --name="$CONTAINER_NAME" \
           --env="DISPLAY=$DISPLAY" \
           "$IMAGE_NAME" /bin/bash
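
To review the flags before actually creating a container, a dry-run variant of the script above can print the command instead of executing it. This is a sketch only: print_run_cmd is a hypothetical helper, and the container name 3d_bbs_test and image 3d_bbs:latest are placeholders rather than names taken from the repository.

```shell
#!/bin/bash
# Print the docker run command for the given container name, image, and project directory.
# Nothing is executed; the output is meant for inspection only.
print_run_cmd() {
  name="$1"
  image="$2"
  project_dir="$3"
  printf '%s ' docker run --privileged -it \
    --runtime=nvidia \
    --gpus all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    --volume="$project_dir:/root/workspace" \
    --volume=/tmp/.X11-unix:/tmp/.X11-unix:rw \
    --net=host --ipc=host --shm-size=4gb \
    --name="$name" \
    --env="DISPLAY=${DISPLAY:-}" \
    "$image" /bin/bash
  printf '\n'
}

# Placeholder names, for illustration only.
print_run_cmd 3d_bbs_test 3d_bbs:latest "$(dirname "$PWD")"
```

The printed line should contain both --runtime=nvidia and --gpus all, the two flags that route the container through the NVIDIA runtime.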

On my PC, everything worked fine after switching to nvidia-container-toolkit!

Terminal output :

-------------------------------
[Localize] pcd file name: 5
[Localize] Execution time: 3519.88[msec] 
[Localize] Score: 1060
-------------------------------
[Localize] pcd file name: 6
[Localize] Execution time: 1028.25[msec] 
[Localize] Score: 1122
-------------------------------
[Localize] pcd file name: 7
[Localize] Execution time: 2377.63[msec] 
[Localize] Score: 984
-------------------------------
[Localize] pcd file name: 8
[Localize] Execution time: 312.308[msec] 
[Localize] Score: 752
-------------------------------
[Localize] pcd file name: 9
[Localize] Execution time: 2625.28[msec] 
[Localize] Score: 901
[Localize] Average time: 794[msec] per frame
root@multirobot1-CILAB:~/workspace/test/build#

KOKIAOKI commented 9 months ago

Thank you for your advice! gpu_test with test_data ran successfully in the Docker container after I changed container_run.sh as you showed!

(Yesterday, the Docker container probably wasn't working properly, because the nvidia-smi output was not shown inside it.)

Could you update the PR to reflect the switch to nvidia-container-toolkit?

Taeyoung96 commented 9 months ago

@KOKIAOKI

I'm glad to hear that your docker environment is now working properly.

I've addressed all of your requests in a new commit. Please check it out.

KOKIAOKI commented 9 months ago

@Taeyoung96 Thank you very much for the Docker support PR!

As for the issue thread, I'll keep it open until the reporter responds.

Within a month, I plan to add the CPU version and a ROS2 online test. So, visit 3d_bbs again if you'd like!