Question about cuda 11.0 and pytorch 1.7.0

alexklwong / calibrated-backprojection-network

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Question about cuda 11.0 and pytorch 1.7.0 #7

Closed: zhangguanghui1 closed this issue 2 years ago

zhangguanghui1 commented 2 years ago

Thank you for your excellent work.

I would like to ask whether you used CUDA 11.0 when testing on Ubuntu 20.04. When I train the network with CUDA 11.0 + pytorch 1.7 on an RTX 3090, the loss does not decrease as expected, and I cannot find the reason. Could you help me?

alexklwong commented 2 years ago

I've tried torch 1.2 and 1.3 on Ubuntu 20.04. For those I think you would need CUDA 10.1 or 10.2. I don't think the CUDA version is the issue so long as it is compatible with your torch version.

I also just ran it with torch 1.4 for a bit and it looks like it is working as well. Can you try lowering your torch version?
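When comparing setups like these, it can help to print exactly which torch build, CUDA build, and GPU a run is using. The following is a minimal sketch, not part of the repository; the helper name `describe_environment` is hypothetical, and it only reports versions rather than deciding whether a combination is compatible.

```python
def describe_environment():
    """Collect the torch / torchvision / CUDA / GPU details relevant to this issue."""
    report = {
        "torch": None,
        "torchvision": None,
        "cuda_build": None,
        "gpu": None,
        "capability": None,
    }
    try:
        import torch
    except ImportError:
        return report  # torch is not installed in this environment

    report["torch"] = torch.__version__
    # CUDA version the torch binary was built against (None for CPU-only builds)
    report["cuda_build"] = torch.version.cuda

    try:
        import torchvision
        report["torchvision"] = torchvision.__version__
    except ImportError:
        pass  # torchvision is optional for this report

    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        # Compute capability as a (major, minor) tuple, e.g. (8, 6) on an RTX 3090
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Posting this output alongside a loss curve makes it easier to tell whether two runs differ only in the torch version or also in the CUDA build and torchvision version.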

zhangguanghui1 commented 2 years ago

Thank you very much for your kind reply!

I have tried CUDA 9.0 + torch 1.3 on Ubuntu 18.04 with an RTX 2080 Ti, and it works well. When I tried CUDA 11.0 + torch 1.7 on Ubuntu 18.04 with an RTX 3090, it runs, but the loss rises (see the figure below). Since the RTX 3090 does not support CUDA versions below 11.0, I have to use torch ≥ 1.7.

[Figure: the loss suddenly rises]

alexklwong commented 2 years ago

Ah yeah, looks like it diverged between 4k and 6k steps. This is strange; there must have been some backward-incompatible change in 1.7. I can investigate this.
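To pin down where a run like this starts to diverge, one option is to scan the logged loss values for the first step where a moving average climbs well above its best value so far. This is only a rough sketch: the function name `find_divergence_step`, the window of 100 steps, and the factor of 2.0 are all assumptions for illustration, not anything from the repository's training code.

```python
import math
from collections import deque


def find_divergence_step(losses, window=100, factor=2.0):
    """Return the first step where the moving-average loss exceeds
    `factor` times the lowest moving average seen so far.

    Also returns the step of the first non-finite loss (NaN or inf).
    Returns None if neither condition is ever met.
    """
    recent = deque(maxlen=window)  # holds the last `window` loss values
    best_avg = float("inf")
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step  # NaN / inf counts as divergence
        recent.append(loss)
        if len(recent) < window:
            continue  # wait until the window is full
        avg = sum(recent) / window
        if avg > factor * best_avg:
            return step
        best_avg = min(best_avg, avg)
    return None


if __name__ == "__main__":
    # Synthetic curve: slow decrease for 5000 steps, then a steady rise.
    curve = [1.0 - 0.0001 * i for i in range(5000)]
    curve += [0.5 + 0.01 * i for i in range(1000)]
    print(find_divergence_step(curve))
```

Running the same check on logs from two torch versions would show whether both runs track each other up to the reported step and only then separate.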

zhangguanghui1 commented 2 years ago

Okay, I will investigate the differences between torch 1.7 and 1.3 to try to find the reason.

zhangguanghui1 commented 2 years ago

I have tried pytorch 1.7.0 + torchvision 0.8.0, as well as pytorch 1.8.0, on the RTX 3090, and it works fine. @alexklwong