CVMI-Lab / PAConv

(CVPR 2021) PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
Apache License 2.0

loss.backward!! #12

Closed. swzaaaaaaa closed this issue 3 years ago

swzaaaaaaa commented 3 years ago

Has anyone run into loss.backward() getting stuck? When I debug to this point, the program just hangs and never moves!

mutianxu commented 3 years ago

Could you please provide a screenshot or a log, so that I can help you check where the program is stuck?

swzaaaaaaa commented 3 years ago

Does your program have a logging feature? It seems to be stuck in the part of the code that calls CUDA, but I don't know how to check the logs.

swzaaaaaaa commented 3 years ago

For the classification task, PointNet runs fine, but DGCNN gets stuck. Ctrl+C cannot terminate it; I can only kill the process.

mutianxu commented 3 years ago

OK, I think the problem is caused by the limited number of parallel kernels on your GPU. Could you please tell me what hardware you are running our code on?

mutianxu commented 3 years ago

Also, we do have a log; it is saved in the checkpoint folder and is generated once the program starts running.

swzaaaaaaa commented 3 years ago

My setup is Ubuntu 16.04 with a single GTX 1080 Ti (11 GB of memory), CUDA 10.1, PyTorch 1.5, and Python 3.7.

swzaaaaaaa commented 3 years ago

Regarding the log saved in the checkpoint folder: is this the file you mean, events.out.tfevents?

mutianxu commented 3 years ago

OK, I see. Our CUDA kernel runs with a multi-threaded parallel strategy. A single 1080 Ti may not be able to serve the thread requirements of DGCNN_GPC, although it can serve PointNet (since our CUDA kernel parallelizes across channels, DGCNN needs more threads than PointNet). Please also note that there is no relationship between GPU memory and GPU threads, so it is quite normal for memory usage to look fine while the program is stuck. For now I would recommend trying to run on two 1080 Ti cards. I can also confirm that our code is correct, since I have received many emails from people who have successfully reproduced both DGCNN_GPC and PointNet_GPC.

As for the logs, events.out.tfevents lets you check the loss in TensorBoard; you can find the instructions in our README. The log is saved accordingly.
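
To make the thread argument above concrete, here is a minimal back-of-the-envelope sketch. It assumes the custom kernel launches roughly one CUDA thread per (batch, point, neighbor, channel) output element; the function name and all of the shape numbers are illustrative assumptions, not values taken from this repository's cuda_lib or configs.

```python
def estimated_thread_demand(batch, points, k, channels):
    """Rough thread-demand estimate, assuming one CUDA thread is needed
    per (batch, point, neighbor, channel) output element."""
    return batch * points * k * channels

# Illustrative numbers only (not this repository's actual configs):
# a DGCNN-style layer uses wide channels and a k-NN neighborhood,
# while a PointNet-style layer works point-wise (k = 1).
print(estimated_thread_demand(32, 1024, 20, 64))  # DGCNN-like, batch 32
print(estimated_thread_demand(16, 1024, 20, 64))  # DGCNN-like, batch 16: half the demand
print(estimated_thread_demand(32, 1024, 1, 64))   # PointNet-like, batch 32
```

Under this model the demand scales linearly with the batch size and the number of neighbors, which is why a DGCNN-style backbone asks for far more threads than a PointNet-style one, and why shrinking batch_size reduces the load.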

swzaaaaaaa commented 3 years ago

Is it possible to reduce the number of threads the program requires? Where is that part of the code located?

swzaaaaaaa commented 3 years ago

Solved it: reducing batch_size did the trick, so it was indeed the thread count; the CUDA thread requirement was where it got stuck!

swzaaaaaaa commented 3 years ago

Thanks for the reply.

mutianxu commented 3 years ago

Great!

We tried decreasing the number of threads there before, but it did not work.

Note that the number of threads is determined by the number of channels, the batch_size, the number of KNN neighbors, and other parameters shown in our cuda_lib.

I would still recommend trying to run on more GPUs, to avoid a similar problem in the part_seg or scene_seg tasks.

BTW, you are very welcome to submit issues or pull requests if you find other ways to solve this problem.

Hope this is helpful!
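
For anyone who hits the same limit, one generic PyTorch pattern (not part of this repository) that keeps the effective batch size while shrinking each forward/backward pass is gradient accumulation. The sketch below assumes a hypothetical model, loader, criterion, and optimizer, with the loader already yielding the smaller micro-batches.

```python
def train_epoch_grad_accum(model, loader, criterion, optimizer, accum_steps=2):
    """Accumulate gradients over several micro-batches, so each kernel launch
    sees a smaller batch while the optimizer still updates on the full
    effective batch size (micro-batch size * accum_steps)."""
    model.train()
    optimizer.zero_grad()
    for step, (points, labels) in enumerate(loader):
        points, labels = points.cuda(), labels.cuda()
        loss = criterion(model(points), labels) / accum_steps  # keep the average scale
        loss.backward()
        if (step + 1) % accum_steps == 0:  # update once per effective batch
            optimizer.step()
            optimizer.zero_grad()
    if (step + 1) % accum_steps != 0:  # flush any leftover gradients
        optimizer.step()
        optimizer.zero_grad()
```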

swzaaaaaaa commented 3 years ago

Thanks. If you have a non-CUDA version of the code, it would be great to release it for everyone to look at, since I am not very familiar with the CUDA version (wry smile).

mutianxu commented 3 years ago

We do not release the non-CUDA version since it needs 14 GB of memory. You can try to reproduce it yourself by following our paper.
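
For readers who still want to try the non-CUDA route, below is a rough, deliberately naive PyTorch sketch of the score-weighted kernel assembling described in the paper. It is not the released implementation; the function name, tensor layout, and all shape numbers are assumptions for illustration. Materializing one assembled kernel per (point, neighbor) pair is exactly what makes a pure-PyTorch version so memory hungry.

```python
import torch

def paconv_naive(neighbor_feats, scores, weight_bank):
    """Naive (non-CUDA-kernel) sketch of position-adaptive convolution:
      neighbor_feats: (B, N, K, C_in)   grouped neighbor features
      scores:         (B, N, K, M)      ScoreNet outputs (softmax over M)
      weight_bank:    (M, C_in, C_out)  learnable weight matrices
    Returns aggregated features of shape (B, N, C_out)."""
    # Assemble one kernel per (point, neighbor): shape (B, N, K, C_in, C_out).
    # This large intermediate tensor is the main memory cost.
    kernels = torch.einsum('bnkm,mio->bnkio', scores, weight_bank)
    # Apply each assembled kernel to its neighbor feature.
    out = torch.einsum('bnki,bnkio->bnko', neighbor_feats, kernels)
    # Aggregate over the K neighbors (max pooling here, for illustration).
    return out.max(dim=2).values

# Illustrative shapes only (not this repository's configs):
B, N, K, C_in, C_out, M = 2, 1024, 20, 64, 64, 8
feats = torch.randn(B, N, K, C_in)
scores = torch.softmax(torch.randn(B, N, K, M), dim=-1)
bank = torch.randn(M, C_in, C_out)
print(paconv_naive(feats, scores, bank).shape)  # torch.Size([2, 1024, 64])
```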

swzaaaaaaa commented 3 years ago

OK, thanks.