PRBonn / rangenet_lib

Inference module for RangeNet++ (milioto2019iros, chen2019iros)
MIT License

Offering help for porting to GPU #21

Closed akouri-dd closed 4 years ago

akouri-dd commented 4 years ago

Hi,

Thank you very much for publishing this code and paper. I noticed that a lot of the preprocessing/projection work in netTensorRT is being done on the CPU. I think the total inference time could be dramatically reduced by porting many of these functions to CUDA kernels.

I would like to help with this; what's the best format for me to contribute? It will break the existing API. I could commit the kernels I've written, and then someone from your team could integrate them?

Chen-Xieyuanli commented 4 years ago

Hey @akouri-dd, thank you very much for offering help.

You could create another branch and open a pull request. We will later merge it into master.

akouri-dd commented 4 years ago

OK, thank you. By the way, why do you need to sort the points in doProjection?

tano297 commented 4 years ago

Hi @akouri-dd,

Sorting in decreasing range ensures that each pixel always contains the closest point to the sensor among all points falling into the same pixel frustum; it acts as a kind of z-buffer. This is definitely not the most efficient approach in C++, but in Python it was easier and faster to vectorize these operations than to implement a real z-buffer, and the code was ported basically one-to-one to C++. In C++ it is very likely faster to avoid sorting and instead check, inside the projection loop, whether each point's range is smaller than the one already stored in the range image. I hope this helps.
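
The sort-free variant described above could be sketched roughly like this. Note this is a minimal illustration, not the library's actual `doProjection` code: the `Point` struct, function name, and spherical-projection constants below are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative point type; the real library works on raw float buffers.
struct Point { float x, y, z; };

// Project points into a W x H range image without pre-sorting.
// Each pixel keeps the smallest range seen so far (a per-pixel z-buffer test),
// so the closest point per frustum wins regardless of input order.
std::vector<float> projectZBuffer(const std::vector<Point>& points,
                                  int W, int H,
                                  float fov_up_rad, float fov_down_rad) {
  const float kPi = 3.14159265358979f;
  const float fov = std::fabs(fov_up_rad) + std::fabs(fov_down_rad);
  std::vector<float> range_img(W * H, std::numeric_limits<float>::infinity());
  for (const Point& p : points) {
    float range = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
    if (range < 1e-6f) continue;                 // skip degenerate points
    float yaw = -std::atan2(p.y, p.x);           // in [-pi, pi]
    float pitch = std::asin(p.z / range);        // in [-pi/2, pi/2]
    float u = 0.5f * (yaw / kPi + 1.0f) * W;     // horizontal pixel coord
    float v = (1.0f - (pitch + std::fabs(fov_down_rad)) / fov) * H;
    int uu = std::min(W - 1, std::max(0, static_cast<int>(u)));
    int vv = std::min(H - 1, std::max(0, static_cast<int>(v)));
    // z-buffer test: overwrite only if this point is closer.
    if (range < range_img[vv * W + uu]) range_img[vv * W + uu] = range;
  }
  return range_img;
}
```

A per-pixel comparison like this is O(n) versus O(n log n) for sorting, and it also maps naturally onto a CUDA kernel (one thread per point, with an atomic minimum on the pixel), which is relevant to the porting effort discussed in this issue.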

akouri-dd commented 4 years ago

Thank you, yes, this is the design decision I made as well. Just checking that it won't affect anything down the line.