MiguelMonteiro / permutohedral_lattice

Permutohedral Lattice C++/CUDA implementation + TensorFlow Op (CPU/GPU)

Limitation of the input_channels #3

Closed ShuaiChenBIGR closed 6 years ago

ShuaiChenBIGR commented 6 years ago

Thanks for the CUDA implementation! It works well in my case, but there seems to be a limit on the number of input channels: when I set it above 30, I get an error related to the HashTable when building. My GPU is an Nvidia 1080. Do you have the same problem?

MiguelMonteiro commented 6 years ago

Hi, can you specify the error you are getting?

ShuaiChenBIGR commented 6 years ago

Thanks for your reply. Here is the error log on Linux:

```
rm: cannot remove 'lattice_filter.so': No such file or directory
-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/g++-5
-- Check for working CXX compiler: /usr/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /hdd2/PythonCodes//Modules/Networks/CRF/CRFasRNN_tensorflow/CRFasRNNLayer/build_dir
Scanning dependencies of target lattice_filter
[ 25%] Building CXX object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o
[ 50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr __host__ function("imag") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr __host__ function("imag") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(208): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(213): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(63): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("bfloat16") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(63): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(66): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("bfloat16") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(66): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(157): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(161): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(57): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(304): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(305): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(57): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(304): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(305): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/generated_message_reflection.h(685): warning: variable "unused" was set but never used
[ 75%] Linking CUDA device code CMakeFiles/lattice_filter.dir/cmake_device_link.o
nvlink error : Entry function 'nv_static_6754_tmpxft_00000a82_00000000_6_LatticeFilterKernel_cpp1_ii_46244a3b__Z10splatCacheIdLi3ELi33EEviPKT_P11MatrixEntryIS0_E12HashTableGPUIS0_XT0_EXT1_EE' uses too much shared data (0x10c00 bytes, 0xc000 max)
nvlink error : Entry function 'nv_static_6754_tmpxft_00000a82_00000000_6_LatticeFilterKernel_cpp1_ii_46244a3b__Z10splatCacheIdLi4ELi33EEviPKT_P11MatrixEntryIS0_E12HashTableGPUIS0_XT0_EXT1_EE' uses too much shared data (0x10c00 bytes, 0xc000 max)
CMakeFiles/lattice_filter.dir/build.make:98: recipe for target 'CMakeFiles/lattice_filter.dir/cmake_device_link.o' failed
make[2]: *** [CMakeFiles/lattice_filter.dir/cmake_device_link.o] Error 255
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/lattice_filter.dir/all' failed
make[1]: *** [CMakeFiles/lattice_filter.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
cp: cannot stat 'lattice_filter.so': No such file or directory
```

MiguelMonteiro commented 6 years ago

So it's a compilation error you get but only when you set INPUT_CHANNELS greater than 30?

ShuaiChenBIGR commented 6 years ago

Yes. I haven't checked the exact cutoff between 20 and 30, but when I set 3 or 20 it works fine. Values like 26-30 give compilation errors.

MiguelMonteiro commented 6 years ago

From the error you posted it seems that you are running out of shared memory: nvlink says each thread block would need 0x10c00 (68,608) bytes of shared data, well over the 0xc000 (48 KB) hardware limit. There is no easy fix for this, as it requires deep changes to the program. Perhaps the "easy way" would be to try a better GPU if you have one around, but even then, if you kept increasing the channels you would run into the same problem again.
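To make the numbers concrete: the failing entry points demangle to `splatCache<double, 3, 33>`, so the kernel's static shared arrays are sized at compile time by the value dimension (33 here), which tracks the channel count. Below is a minimal CUDA sketch of that pattern; it is not the repository's actual kernel, and the 256-thread block size is an assumption, but it reproduces the failure mode:

```cuda
// Hypothetical sketch, not the repository's kernel: static shared
// memory scales linearly with the compile-time value dimension VD,
// so nvlink rejects large instantiations at device-link time.
template <typename T, int VD>
__global__ void splat_sketch(const T* values, T* out, int n) {
    constexpr int BLOCK_SIZE = 256;  // assumed launch configuration
    // For T = double and VD = 33: 256 * 33 * 8 = 67,584 bytes,
    // most of the 0x10c00 nvlink reports and well over 0xc000 (48 KB).
    __shared__ T cache[BLOCK_SIZE * VD];

    const int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage this thread's value vector in shared memory.
    if (idx < n)
        for (int c = 0; c < VD; ++c)
            cache[threadIdx.x * VD + c] = values[idx * VD + c];
    __syncthreads();

    // The real kernel would scatter these values onto lattice
    // vertices here; copying back is enough for the sketch.
    if (idx < n)
        for (int c = 0; c < VD; ++c)
            out[idx * VD + c] = cache[threadIdx.x * VD + c];
}
```

With 256 threads per block and double values, the cache crosses the 48 KB limit around a value dimension of 24, which lines up with 20 channels compiling fine while 26-30 do not.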

ShuaiChenBIGR commented 6 years ago

I had the same feeling that it's a GPU memory limitation, so maybe I could try the CPU version and see whether I can use more input channels. You said we could change the CMakeLists.txt to compile only the CPU version, but I don't have much experience with C++. Would you mind showing me how to modify it?

MiguelMonteiro commented 6 years ago

It has nothing to do with C++; it is the CMakeLists.txt that has to be modified, and that is a meta-language that tells the compiler what to do. That being said, it is not necessary to change anything. The CPU version works for any number of input and reference channels, regardless of how the GPU code is compiled; the flag INPUT_CHANNELS only influences the GPU version. As a result, you can compile the GPU version for a small number of channels and still use the CPU version with 30 channels. However, if you use 30 channels with the CPU version, it is likely going to take forever...
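If you do want to rebuild the GPU kernels for a small channel count, the idea is a one-line change at configure time. A sketch, assuming INPUT_CHANNELS (and a matching reference-channel flag, whose exact name you should check against the repository's CMakeLists.txt) is consumed as a preprocessor definition:

```cmake
# Sketch only: bake a small channel count into the GPU kernel
# template instantiations; the CPU path ignores these values.
add_definitions(-DINPUT_CHANNELS=3)
add_definitions(-DREFERENCE_CHANNELS=3)
```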

ShuaiChenBIGR commented 6 years ago

Does that mean the CPU version is applied automatically as long as we don't import lattice_filter_op_loader.module, just like what you did in the first test.py in the CRFasRNN layer?

MiguelMonteiro commented 6 years ago

No, it is not automatic. TensorFlow uses the GPU by default; you have to tell it to use the CPU explicitly. Something like:

```python
with tf.device('/cpu:0'):
    # code here
```
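A slightly fuller sketch (TF1-style; the import mirrors the test script you mentioned, but the op's exact name and signature are assumptions, so check the README):

```python
import tensorflow as tf
import lattice_filter_op_loader  # loads the compiled lattice_filter.so

module = lattice_filter_op_loader.module

# Placeholder shapes are illustrative only.
input_tensor = tf.placeholder(tf.float32, shape=[1, 64, 64, 30])     # 30 input channels
reference_tensor = tf.placeholder(tf.float32, shape=[1, 64, 64, 3])  # e.g. an RGB image

# Only the ops created inside this scope are pinned to the CPU;
# the rest of the graph still runs on the GPU as usual.
with tf.device('/cpu:0'):
    output = module.lattice_filter(input_tensor, reference_tensor)  # signature assumed
```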
ShuaiChenBIGR commented 6 years ago

Oh, that will make the deep learning super slow... But thanks for the explanation. I will try not to use too many input channels.