cfernandezlab / CFL

Tensorflow implementation of our end-to-end model to recover 3D layouts. Also with equirectangular convolutions!
GNU General Public License v3.0

test deform conv fails #1

Closed Morpheus3000 closed 5 years ago

Morpheus3000 commented 5 years ago

Hello,

Let me start by thanking you for the wonderful work and for making the code publicly available! Interested in your work, I have been trying to set up the pipeline. As mentioned in the readme, I installed all the Python dependencies and compiled the deform_conv_layer. But when I try to run test_deform_conv.py, it fails with the following error:

tensorflow.python.framework.errors_impl.NotFoundError: ~/CFLE2E/CFL/Models/deform_conv_layer/deform_conv.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

From previous experience, I recognize this as the result of a mismatch between the gcc used to compile tensorflow and the gcc used to compile the custom layer. I installed tensorflow from the conda repositories, so I don't know exactly which gcc was used to build the binaries. I am using Arch, which comes with gcc 8.3, and CUDA 10. I tried compiling the custom layer with both gcc 4.9 (as mentioned) and gcc 5.5, but both result in the same error. It would be a great help if you could point me to a solution.
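One common cause of this kind of undefined symbol is a libstdc++ ABI mismatch: the `B5cxx11` fragment in the mangled name is the `cxx11` ABI tag, which gcc 5 and later emit by default. The sketch below is a heuristic check, not part of the repository: it looks for that tag in the symbol and, if tensorflow is importable, prints the `_GLIBCXX_USE_CXX11_ABI` value reported by `tf.sysconfig.get_compile_flags()`. The helper names `needs_cxx11_abi` and `cxx11_abi_setting` are made up for illustration.

```python
def needs_cxx11_abi(mangled_symbol):
    """Heuristic: True if the mangled name carries libstdc++'s 'cxx11' ABI tag."""
    return "B5cxx11" in mangled_symbol


def cxx11_abi_setting(compile_flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI from a list of flags, or None if absent."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


if __name__ == "__main__":
    # Symbol copied from the error message above.
    symbol = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
    print("missing symbol uses the cxx11 ABI tag:", needs_cxx11_abi(symbol))
    try:
        import tensorflow as tf
        flags = tf.sysconfig.get_compile_flags()
        print("tensorflow reports _GLIBCXX_USE_CXX11_ABI =", cxx11_abi_setting(flags))
    except ImportError:
        print("tensorflow is not importable in this environment")
```

If the symbol carries the tag while tensorflow reports a value of 0, the custom layer was probably built with a compiler (or flags) using the newer ABI, and every compile and link step needs to agree with the tensorflow binary.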

jmfacil commented 5 years ago

Hi @Morpheus3000 Thanks for your nice words about our work.
Just to make sure: check that the conda environment is active when you compile deform_conv_layer, so that the compilation is linked against the same version of TensorFlow that you will use in your code. These are the first two lines of make.sh:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

Also, if you change the gcc version, make sure to comment out the gcc4.9 lines and uncomment the gcc-5.0 lines in make.sh.

Morpheus3000 commented 5 years ago

Hi!

Thank you for the quick reply! I only have tensorflow installed in this environment, and the shell script picks it up properly, as confirmed by the configuration variables the script lists before starting the compilation.

Configuration variables:
Tensorflow Include directory: ~/anaconda3/envs/corner_pred/lib/python3.6/site-packages/tensorflow/include
Tensorflow Library directory: ~/anaconda3/envs/corner_pred/lib/python3.6/site-packages/tensorflow
Nvidia Arch: sm_75

I have a 2080 Ti, so I changed the architecture to sm_75 (compute capability 7.5).

As for gcc 4.9 and 5, I did use the respective lines in the script and commented out the other ones. However, I used gcc 5.5 for the gcc 5 build; the lowest gcc 5 release I could find in my distribution's repos is 5.3. Since it is within the same major version, I doubt that is the reason.

cfernandezlab commented 5 years ago

Hi @Morpheus3000,

We have tested test_CFL.py with a 2080 Ti (test_deform_conv.py is not part of CFL, but of TF_Deformable_Net).

Here is the configuration we used:
Nvidia driver: 410.104
CUDA Version: 10.0
GeForce RTX 2080Ti 54C P8 22W / 250W 10986MiB

Here are the changes in the make.sh file:
ARCH=sm_75
g++-4.8 (in both lines, 53 and 60)

Can you please try this and confirm if it works? Thank you :)
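Since the compiler has to be changed on both lines, a quick scan of the build script can confirm that every step uses the same one. This is only a sketch under assumptions: `compilers_by_line` and `has_mixed_compilers` are hypothetical helpers, the sample script is invented rather than copied from make.sh, and the regular expression only recognizes `g++`/`gcc` with an optional version suffix.

```python
import re
import sys

# Matches g++ or gcc, optionally followed by a version suffix such as -4.8 or -5.
COMPILER_RE = re.compile(r"(?<![\w+-])(g\+\+|gcc)(-\d+(?:\.\d+)*)?")


def compilers_by_line(script_text):
    """Map line number -> first compiler found on each non-comment line that invokes g++/gcc."""
    found = {}
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out alternatives
        match = COMPILER_RE.search(stripped)
        if match:
            found[number] = match.group(0)
    return found


def has_mixed_compilers(script_text):
    """True if the script invokes more than one distinct compiler command."""
    return len(set(compilers_by_line(script_text).values())) > 1


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python check_compilers.py make.sh
        with open(sys.argv[1]) as handle:
            text = handle.read()
    else:
        # Invented sample, NOT the real make.sh: one step left on the default g++.
        text = "g++ -std=c++11 -c op.cc -o op.o\ng++-4.8 -shared op.o -o op.so\n"
    for number, compiler in sorted(compilers_by_line(text).items()):
        print("line", number, "uses", compiler)
    if has_mixed_compilers(text):
        print("warning: more than one compiler is used")
```

A plain `g++` on one line and `g++-4.8` on another means the object files and the final shared library may be built with different ABIs.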

Morpheus3000 commented 5 years ago

Thank you, that worked! I had missed the g++ on line 53, which is why it was compiling with two different compilers and giving me the undefined symbol error. :D

I ran the test scripts and it works perfectly!

Along the same lines, would it be possible to provide the training script/configurations for the network?

cfernandezlab commented 5 years ago

I'm glad that it worked!
Regarding the training script, we plan to provide it in the coming months; the work is still under review ;) I am closing the issue.

Morpheus3000 commented 5 years ago

I see, thanks!