MaverickPeter / MR_SLAM

[IEEE T-RO 2023] A modularized multi-robot SLAM system with elevation mapping and a costmap converter for easy navigation. Different odometry and loop closure algorithms can be easily integrated into the system.
MIT License

docker #12

Closed diamonazreal closed 7 months ago

diamonazreal commented 1 year ago

Hello, and thank you to you and your team for sharing this work. When I used docker to verify the algorithm, I found that the image contains no RING++ module. Could you please provide an image, or build instructions, that include RING++? Thank you again for sharing.

MaverickPeter commented 1 year ago

Hi @diamonazreal, I'm sorry about the missing RING++ module in the Docker Hub image. I'll update the image ASAP.

diamonazreal commented 1 year ago

Thanks

diamonazreal commented 1 year ago

Meanwhile, I have the following error when using the docker image you provided:

```
[ERROR] [1697637462.094589, 1648732982.415934]: bad callback: <function callback3 at 0x7fd3517eb040>
Traceback (most recent call last):
  File "/opt/ros/noetic/lib/python3/dist-packages/rospy/topics.py", line 750, in _invoke_callback
    cb(msg)
  File "main.py", line 361, in callback3
    pc_bev, pc_RING, pc_TIRING, _ = generate_RING(pc_normalized)
  File "/home/LoopDetection/src/RING_ros/util.py", line 296, in generate_RING
    pc_RING_normalized = fn.normalize(pc_RING, mean=pc_RING.mean(), std=pc_RING.std())
RuntimeError: CUDA error: no kernel image is available for execution on the device

[ERROR] [1697637462.360664, 1648732982.677983]: bad callback: <function callback1 at 0x7fd3517e5ee0>
Traceback (most recent call last):
  File "/opt/ros/noetic/lib/python3/dist-packages/rospy/topics.py", line 750, in _invoke_callback
    cb(msg)
  File "main.py", line 262, in callback1
    pc_bev, pc_RING, pc_TIRING, _ = generate_RING(pc_normalized)
  File "/home/LoopDetection/src/RING_ros/util.py", line 296, in generate_RING
    pc_RING_normalized = fn.normalize(pc_RING, mean=pc_RING.mean(), std=pc_RING.std())
RuntimeError: CUDA error: no kernel image is available for execution on the device
```

Can you provide some suggestions to solve it? Thank you again!

MaverickPeter commented 1 year ago

@diamonazreal, this issue is related to a mismatch between the GPU driver and the CUDA version. Can you provide detailed information about your GPU model and driver version? Also, you may need to use the run.bash script in the docker folder to start a container that exposes the host GPU to the container.
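As a quick sanity check for this kind of driver/CUDA mismatch, the comparison can be sketched as below. The minimum-driver table is an illustrative assumption drawn from NVIDIA's release notes (not part of this repo), and `driver_supports` is a hypothetical helper, so verify the numbers against NVIDIA's compatibility documentation for your setup:

```python
# Illustrative minimum Linux driver versions per CUDA toolkit release.
# These values are assumptions based on NVIDIA release notes; double-check
# them against the official CUDA compatibility table.
MIN_DRIVER = {
    "11.1": (455, 23),
    "11.3": (465, 19),
    "11.7": (515, 43),
    "12.1": (530, 30),
}

def driver_supports(cuda_version, driver_version):
    """Return True if the installed driver meets the minimum for this CUDA toolkit."""
    # Compare only the first two numeric components, e.g. "470.182.03" -> (470, 182).
    major, minor = (int(x) for x in driver_version.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda_version]

print(driver_supports("11.3", "470.182"))  # True: 470 is new enough for CUDA 11.3
print(driver_supports("11.7", "470.182"))  # False: CUDA 11.7 wants a 515+ driver
```

The driver version itself comes from `nvidia-smi` on the host; inside the container the same driver is visible when the GPU is passed through correctly.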

diamonazreal commented 1 year ago

My GPU is an RTX 3060, and I hit the above problem with both driver versions 515 and 470. The docker container was built using run.sh, and the image is MaverickPeter/MR_SLAM.

MaverickPeter commented 1 year ago

@diamonazreal, I've updated the docker image on Docker Hub with the latest update of this repo. Could you kindly check whether the CUDA problem persists with the new version?

diamonazreal commented 1 year ago

Hello, I ran the Quick Demo test with the newly provided image, but the problem above still exists @MaverickPeter

MaverickPeter commented 1 year ago

Based on this answer, I suggest force-reinstalling PyTorch by running `pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113`

diamonazreal commented 1 year ago

@MaverickPeter Hello, I ran the command inside docker as you suggested, but the error still exists. Meanwhile, I tested other CUDA test code found online and did not hit this problem.

MaverickPeter commented 1 year ago

> @MaverickPeter Hello, I ran the command inside docker as you suggested, but the error still exists. Meanwhile, I tested other CUDA test code found online and did not hit this problem.

@diamonazreal It seems the problem may be caused by torch. Have you tried upgrading torch to 1.12.1? You can use the command below: `pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113`

diamonazreal commented 1 year ago

> @MaverickPeter Hello, I ran the command inside docker as you suggested, but the error still exists. Meanwhile, I tested other CUDA test code found online and did not hit this problem.

> @diamonazreal It seems the problem may be caused by torch. Have you tried upgrading torch to 1.12.1? You can use the command below: `pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113`

I tried an RTX 2060 GPU and replicated the demo successfully. Can't this algorithm run on a 30-series GPU?

MaverickPeter commented 1 year ago

It might be a compatibility problem between CUDA and PyTorch. I'll try to replicate this issue on a 30-series GPU and find a solution.

MaverickPeter commented 1 year ago

@diamonazreal, I've reproduced this issue on a 30-series GPU. The problem is caused by the open-sourced radon transform implementation, which may have a compatibility issue. I'll dig into it and try to fix it.

diamonazreal commented 1 year ago

@MaverickPeter Thank you for trying

MaverickPeter commented 1 year ago

@diamonazreal I downgraded the environment in the docker image to CUDA 11.1.1 with PyTorch 1.10.1 and everything went well. I've already updated the docker image, and you can now download it via docker pull.

Joosoo1 commented 1 year ago

@MaverickPeter Hi, I'm experiencing the same issue with the latest docker image. My GPU is an RTX 4060 and the driver version is 535.129.03. Do you have any suggestions to fix this issue?

Joosoo1 commented 1 year ago

When I run it locally, torch 1.12.1+cu113 works fine, but with the latest docker image I encounter the same error; changing the torch inside the docker image to 1.12.1+cu113 still gives the same error.

MaverickPeter commented 1 year ago

@Joosoo1 You can try upgrading CUDA to 12.x and using the corresponding torch version. The compute capability of the 4060 is 8.9 (sm_89), and CUDA 11.x only supports <= sm_86. But I have no idea why the code works fine locally.
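The "no kernel image" error above can be pictured as a lookup against the architectures a prebuilt wheel ships binaries for. In the sketch below, `CU113_ARCH_LIST` and `has_kernel_image` are illustrative names (the real list comes from `torch.cuda.get_arch_list()` on your install), and the sketch deliberately ignores PTX JIT fallback:

```python
# Architectures a cu113 PyTorch wheel typically embeds binary kernels for.
# This list is an assumption mirroring torch.cuda.get_arch_list() on such a
# build; query your own install for the authoritative list.
CU113_ARCH_LIST = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]

def has_kernel_image(capability, arch_list):
    """True if the wheel ships a binary kernel for a device of this compute
    capability (simplified: ignores PTX forward-compatibility JIT)."""
    sm = "sm_%d%d" % capability
    return sm in arch_list

print(has_kernel_image((8, 6), CU113_ARCH_LIST))  # True: RTX 30-series (sm_86)
print(has_kernel_image((8, 9), CU113_ARCH_LIST))  # False: RTX 4060 (sm_89)
```

This is consistent with the thread: the 2060 (sm_75) and 3060 (sm_86) are covered by cu113 wheels, while the 4060 (sm_89) needs a build targeting CUDA 11.8+ or 12.x.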

Joosoo1 commented 1 year ago

Good news! When I docker pull the image directly, RING++ works fine, whereas when I build the image from the dockerfile and run RING++, I encounter the above error.

MaverickPeter commented 1 year ago

Cool. There were several minor changes when I built the docker image from the dockerfile; I'll check it later.

1466758326 commented 11 months ago

```
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    import gputransform
ModuleNotFoundError: No module named 'gputransform'
```