ThibaultGROUEIX / ChamferDistancePytorch

Chamfer Distance in Pytorch with f-score
MIT License

error in nnd updateOutput: invalid device function #15

Closed: wen-yuan-zhang closed this issue 3 years ago

wen-yuan-zhang commented 3 years ago

I ran into a problem while trying to run unit_test.py.

If I do not compile manually, all tests pass, and the output is as follows:

Jitting Chamfer 2D
Loaded JIT 2D CUDA chamfer distance
Jitting Chamfer 3D
Loaded JIT 3D CUDA chamfer distance
Jitting Chamfer 5D
Loaded JIT 5D CUDA chamfer distance
testing Chamfer 2D
fscore : (tensor([0.3527, 0.3912, 0.3908, 0.3707], device='cuda:0'), tensor([0.4500, 0.5300, 0.4900, 0.5300], device='cuda:0'), tensor([0.2900, 0.3100, 0.3250, 0.2850], device='cuda:0'))
Unit test passed
Timings : Start CUDA version
Ellapsed time forward backward is 0.1624271082878113 seconds.
Timings : Start Pythonic version
Ellapsed time  forward backward  is 0.026266088485717775 seconds.
testing Chamfer 3D
fscore : (tensor([0.0200, 0.0267, 0.0133, 0.0133], device='cuda:0'), tensor([0.0300, 0.0400, 0.0200, 0.0200], device='cuda:0'), tensor([0.0150, 0.0200, 0.0100, 0.0100], device='cuda:0'))
Unit test passed
Timings : Start CUDA version
Ellapsed time forward backward is 0.15791781902313232 seconds.
Timings : Start Pythonic version
Ellapsed time  forward backward  is 0.0261559534072876 seconds.
testing Chamfer 5D
fscore : (tensor([0., 0., 0., 0.], device='cuda:0'), tensor([0., 0., 0., 0.], device='cuda:0'), tensor([0., 0., 0., 0.], device='cuda:0'))
Unit test passed
Timings : Start CUDA version
Ellapsed time forward backward is 0.16233629941940309 seconds.
Timings : Start Pythonic version
Ellapsed time  forward backward  is 0.026321511268615722 seconds.

However, if I compile Chamfer3D manually (by running python chamfer3D/setup.py install) and then run unit_test.py, the 3D test fails. The output is as follows:

Jitting Chamfer 2D
Loaded JIT 2D CUDA chamfer distance
Loaded compiled 3D CUDA chamfer distance
Jitting Chamfer 5D
Loaded JIT 5D CUDA chamfer distance
testing Chamfer 2D
fscore : (tensor([0.3377, 0.3523, 0.3442, 0.3389], device='cuda:0'), tensor([0.5000, 0.4900, 0.4600, 0.4700], device='cuda:0'), tensor([0.2550, 0.2750, 0.2750, 0.2650], device='cuda:0'))
Unit test passed
Timings : Start CUDA version
Ellapsed time forward backward is 0.16943942308425902 seconds.
Timings : Start Pythonic version
Ellapsed time  forward backward  is 0.026366772651672362 seconds.
testing Chamfer 3D
error in nnd updateOutput: invalid device function
error in nnd get grad: invalid device function
Traceback (most recent call last):
  File "unit_test.py", line 68, in <module>
    test_chamfer(cham, dims[i])
  File "unit_test.py", line 27, in test_chamfer
    ), "chamfer cuda and chamfer normal are not giving the same results"
AssertionError: chamfer cuda and chamfer normal are not giving the same results
Segmentation fault (core dumped)

I don't know how to solve this problem. It seems that something went wrong when compiling the extension manually. I found the same issue among the closed issues, but I checked my CUDA setup and found no problem with it. Environment: Python 3.6, PyTorch 1.6.0, CUDA 10.0, Linux.
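The thread never identifies the cause of this error. In general, CUDA reports "invalid device function" when the loaded binary contains no kernel image for the GPU's architecture, so one plausible but unconfirmed explanation is that the manually built extension was compiled for a different compute capability or CUDA toolkit than the one PyTorch uses at run time. The sketch below only prints the values worth comparing. The helper names `arch_string` and `report` are hypothetical; the PyTorch calls are standard ones.

```python
import os


def arch_string(capability):
    """Format a (major, minor) compute capability tuple as 'major.minor'."""
    major, minor = capability
    return f"{major}.{minor}"


def report():
    """Print the CUDA details relevant to an architecture mismatch (diagnostic only)."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
        return
    # CUDA toolkit version that this PyTorch build was compiled against.
    print("PyTorch built with CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No CUDA device visible.")
        return
    # Compute capability of the first visible GPU, e.g. (7, 5).
    capability = torch.cuda.get_device_capability(0)
    print("GPU compute capability:", arch_string(capability))
    # Architectures requested for extension builds, if the variable is set.
    print("TORCH_CUDA_ARCH_LIST:", os.environ.get("TORCH_CUDA_ARCH_LIST", "<unset>"))


if __name__ == "__main__":
    report()
```

If the printed capability does not match what the extension was built for, rebuilding with TORCH_CUDA_ARCH_LIST set to that value may help. This was not tried in the thread, so treat it as a hypothesis rather than a confirmed fix.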

ThibaultGROUEIX commented 3 years ago

Hi @zParquet, strange issue indeed! I don't know where this could come from. In the meantime, I suggest using the JIT version. Best regards, Thibault

wen-yuan-zhang commented 3 years ago

I now use the JIT version directly instead of compiling the C++ extension manually, and it works normally. Thanks a lot! @ThibaultGROUEIX
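For the 3D module, the two logs differ in a single loader line: "Jitting Chamfer 3D" in the passing run and "Loaded compiled 3D CUDA chamfer distance" in the failing one. This suggests the loader prefers an installed extension whenever it can import one. A minimal sketch for checking whether such a package is still importable follows. The module name `chamfer_3D` is an assumption inferred from the log and the setup path, not something this thread confirms, and the helper `compiled_extension_installed` is hypothetical.

```python
import importlib.util


def compiled_extension_installed(module_name="chamfer_3D"):
    """Return True if a top-level module with this name can be found by the import system."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if compiled_extension_installed():
        print("A compiled chamfer_3D package is installed and may take precedence over JIT.")
    else:
        print("No compiled chamfer_3D package found; the JIT path would be used.")
```

If the check reports an installed package, removing it from the environment should let the JIT path run again, assuming the loader behaves as the logs suggest.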