MiguelMonteiro / permutohedral_lattice

Permutohedral Lattice C++/CUDA implementation + TensorFlow Op (CPU/GPU)

Cuda Debug informations #5

Closed Fettpet closed 6 years ago

Fettpet commented 6 years ago

Hello,

inside the memory allocator, lines 96, 102 and 106 contain naked CUDA calls. They don't deliver any debug information. For example, if the GPU runs out of memory, the program only exits later, when the result of the `createLattice` function is checked, which makes the error hard to track down. A simple solution would be to integrate some checks after the CUDA calls.
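For example, a small wrapper macro along these lines would surface the error at the failing call (just a sketch of the idea; the macro name and the exit behaviour are illustrative, not part of the repository):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative check: print file, line and the CUDA error string,
// then abort, instead of letting the failure surface later.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",            \
                    __FILE__, __LINE__, cudaGetErrorString(err));   \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    float *buffer = nullptr;
    // If the GPU is out of memory, this reports it here, at the
    // allocation site, rather than later in createLattice.
    CUDA_CHECK(cudaMalloc(&buffer, (1 << 20) * sizeof(float)));
    CUDA_CHECK(cudaFree(buffer));
    return 0;
}
```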

Greetings, Sebastian

MiguelMonteiro commented 6 years ago

Thanks for pointing it out. If you are using the code within TensorFlow, this is not a problem because the memory allocation is handled by TensorFlow; it is only a problem if you are using this as a standalone package. I will fix it when I get the time, or if you would like to help me out, please submit a pull request.

Best,

Miguel

Fettpet commented 6 years ago

Hey Miguel, I'd like to help you. I'll create a pull request in the next few days. I am using the Permutohedral Lattice for a PyTorch backend.

Greetings, Sebastian

MiguelMonteiro commented 6 years ago

Thanks,

I don't know how the PyTorch GPU memory allocation works. Does it have its own allocator or does it rely on C++ allocation? In TensorFlow, the TensorFlow memory allocator allocates most of the GPU memory (around 95%) at the start, independently of whether it's needed or not. Because of this, when I was using C++ allocation instead of the TensorFlow allocator I would always run out of memory. You should check whether this is the case in PyTorch.
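One framework-independent way to check is to query the device after the framework has initialized its allocator, e.g. with `cudaMemGetInfo` (a minimal sketch; how much memory gets reserved depends on the framework and its settings):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    // Reports how much device memory is still available to a plain
    // cudaMalloc. If the framework's allocator has reserved most of
    // the GPU up front, free_bytes will be a small fraction of
    // total_bytes.
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MiB, total: %zu MiB\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```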

Best,

Miguel