Cuda acceleration - Githubissues

OctoMap / octomap

An Efficient Probabilistic 3D Mapping Framework Based on Octrees. Contains the main OctoMap library, the viewer octovis, and dynamicEDT3D.

http://octomap.github.io


Cuda acceleration #112

Open andre-nguyen opened 8 years ago

andre-nguyen commented 8 years ago

Hi @ahornung

I saw issue #29 and wasn't interested in the GPU-voxel approach. It is clear that many ROS applications use octomap as a standard, and we would gain from parallelizing octomap. The advent of embedded GPUs such as the NVIDIA TK1 and TX1 is making this much more interesting for mobile robotics.

I would like to develop this incrementally by speeding up small parts of the code.

How feasible do you think this is and do you have any pointers on where to start?

ahornung commented 8 years ago

Great to hear that you're interested in improving the performance! That definitely sounds feasible, and incrementally taking care of parts is probably the best way forward.

The critical functions would be computeUpdate(...) in OccupancyOcTreeBase and computeRayKeys(...) in OcTreeBaseImpl. You'll find that there are already conditional OpenMP parallelizations in place; these could give you some hints for a start.
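To illustrate the kind of conditional OpenMP parallelization mentioned above, here is a minimal, self-contained sketch. It is not OctoMap's code: `Point`, `VoxelKey`, `rayKeys` and `traversedCells` are hypothetical stand-ins, and the ray traversal is a simple half-voxel sampling rather than an exact 3D DDA. It only shows the pattern: one ray per loop iteration, with the shared result set protected by a critical section.

```cpp
#include <cmath>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for illustration; these are not OctoMap types.
struct Point { double x, y, z; };
using VoxelKey = std::tuple<int, int, int>;

// Map a coordinate to an integer voxel index at the given resolution.
static int toIndex(double v, double res) {
    return static_cast<int>(std::floor(v / res));
}

// Collect the voxels along origin->end by sampling at half-voxel steps.
// This is a simplification; an exact traversal would use a 3D DDA.
static std::vector<VoxelKey> rayKeys(const Point& o, const Point& e, double res) {
    std::vector<VoxelKey> keys;
    const double dx = e.x - o.x, dy = e.y - o.y, dz = e.z - o.z;
    const double len = std::sqrt(dx * dx + dy * dy + dz * dz);
    const int steps = static_cast<int>(len / (0.5 * res)) + 1;
    for (int i = 0; i < steps; ++i) {
        const double t = static_cast<double>(i) / steps;
        const VoxelKey k{toIndex(o.x + t * dx, res),
                         toIndex(o.y + t * dy, res),
                         toIndex(o.z + t * dz, res)};
        if (keys.empty() || keys.back() != k) keys.push_back(k);
    }
    return keys;
}

// Gather the voxels touched by all rays of one scan.
// Rays are independent, so each loop iteration can run on its own thread;
// only the merge into the shared set needs to be serialized.
std::set<VoxelKey> traversedCells(const Point& origin,
                                  const std::vector<Point>& scan,
                                  double res) {
    std::set<VoxelKey> cells;
#ifdef _OPENMP
#pragma omp parallel for schedule(guided)
#endif
    for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
        const std::vector<VoxelKey> keys = rayKeys(origin, scan[i], res);
#ifdef _OPENMP
#pragma omp critical(cell_insert)
#endif
        {
            cells.insert(keys.begin(), keys.end());
        }
    }
    return cells;
}
```

Without OpenMP enabled (for example, no `-fopenmp` flag on GCC or Clang) the same code compiles and runs serially. The per-ray work is also the part that would map most naturally onto one GPU thread per ray.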

andre-nguyen commented 8 years ago

Thanks, time for me to learn Cuda then :D

ahornung commented 8 years ago

Just in case you're generally looking for speedups and are not yet committed to CUDA: it's probably worth having a look at SIMD intrinsics (SSE) as well. These changes could be less intrusive than switching certain parts to CUDA.

andre-nguyen commented 8 years ago

Thanks for the tip, and sorry for the late response. Unfortunately, I only recently received my hardware, but SSE would certainly be interesting: that way I could work from home without needing the TK1.

Please don't count on this too much, though: if it is ever ready, it will be by the end of the summer.

gsp-27 commented 8 years ago

Hi, can you point me to some resources that would help me understand octrees more intuitively? I understand segment trees and am also familiar with lazy updates in 1D segment trees. Octrees are a 3-dimensional version of segment trees, but it is difficult for me to imagine lazy updates in them. I would like to contribute. I am writing this comment because I also plan on parallelising, if that is even possible. Your help would be greatly appreciated.

ahornung commented 8 years ago

The best documentation will be Wikipedia, the OctoMap AuRo journal paper, and the code, in order of increasing depth into the topic.
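For readers coming from segment trees, a small sketch may help build intuition. Where a 1D segment tree halves an interval at each level, an octree halves each of the three axes, so one bit per axis selects one of eight children. The code below is a generic illustration under that assumption. `Node`, `kDepth`, `childIndex` and `descend` are hypothetical names and do not reproduce OctoMap's classes.

```cpp
#include <array>
#include <cstdint>
#include <memory>

// Hypothetical minimal octree node; not an OctoMap class.
struct Node {
    float value = 0.0f;
    std::array<std::unique_ptr<Node>, 8> children;
};

// Tree depth chosen for illustration: 2^4 = 16 voxels per axis.
constexpr int kDepth = 4;

// Pick the child at `level` (0 = root) from one bit of each coordinate:
// bit 0 of the result comes from x, bit 1 from y, bit 2 from z.
inline int childIndex(std::uint32_t x, std::uint32_t y, std::uint32_t z, int level) {
    const int bit = kDepth - 1 - level;
    return static_cast<int>(((x >> bit) & 1u) |
                            (((y >> bit) & 1u) << 1) |
                            (((z >> bit) & 1u) << 2));
}

// Walk from the root to the leaf for voxel (x, y, z), creating nodes on demand.
Node* descend(Node& root, std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    Node* n = &root;
    for (int level = 0; level < kDepth; ++level) {
        std::unique_ptr<Node>& child = n->children[childIndex(x, y, z, level)];
        if (!child) child = std::make_unique<Node>();
        n = child.get();
    }
    return n;
}
```

Because children are only allocated along paths that are actually visited, large unobserved regions cost nothing, which is the main reason octrees suit sparse 3D maps.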

dblanm commented 7 years ago

Hi @andre-nguyen ,

How is the CUDA implementation for Octomap going? I am also planning on implementing CUDA in Octomap. Maybe I could try to help you.

gsp-27 commented 7 years ago

If you guys plan some specific tasks I would also love to help.


andre-nguyen commented 7 years ago

@dblanm @gsp-27 Like many projects, other tasks got out of hand and I didn't have time to get to this :sob: :sob: :sob:

sbaktha commented 5 years ago

Hi, is there any update on the status of the CUDA implementation?

saifullah3396 commented 4 years ago

Hi @ahornung, I have developed a CUDA-based replacement for computeUpdate() and computeRayKeys(). Could you please look at my fork https://github.com/saifullah3396/octomap and tell me if it's good enough for a pull request? For now it does not conflict with the basic implementation. I'd really like further development on this to be done in this repository.

The implementation can be tested by building the cuda-devel branch (add the cmake parameter -D__CUDA_SUPPORT__=ON) and running graph2tree as follows:

../bin/graph2tree -i ../octomap/share/data/spherical_scan.graph -o out.bt

I am still facing a few issues with speeding up the process. Right now a lot of data has to be copied to the GPU before a scan update. Maybe it would be better to copy the tree to the GPU once and keep using it there, or to create the tree on the GPU directly? In any case, copying the tree to the GPU takes a lot of time.

ahornung commented 4 years ago

Thanks for your contribution @saifullah3396, that sounds really useful!

Do you have a first indication about processing times, ideally on the same benchmark data as used in the paper?

Unfortunately, I won't have time for an in-depth review, so the best option would be a cleaned-up pull request that can be iteratively discussed and improved by the community.

saifullah3396 commented 4 years ago

@ahornung Well, in basic usage the current implementation is definitely faster, but before I produce results on the benchmark data, I will work on the implementation a bit more to make it even faster. It might take me some time to add a CUDA-based hashmap, but it will definitely increase performance. I will share the benchmark results once I'm finished with it and send a PR! :)