Zardinality / TF_Deformable_Net

Deformable convolution net on Tensorflow
MIT License

A question when training #3

Open xiaowenhe opened 7 years ago

xiaowenhe commented 7 years ago

@Zardinality, thanks for your answer. But the error still occurs, even after I changed the flag to -arch=sm_37 (K80) in make.sh and setup.py and reran make.

paulcx commented 7 years ago

Another error when training:

kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test kittivoc_train kittivoc_val kittivoc_trainval kittivoc_test nthu_71 nthu_370
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 30, in <module>
    from lib.networks.factory import get_network
  File "/root/tf_deformable_frcnn/lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/root/tf_deformable_frcnn/lib/networks/VGGnet_train.py", line 2, in <module>
    from .network import Network
  File "/root/tf_deformable_frcnn/lib/networks/network.py", line 13, in <module>
    from ..deform_conv_layer import deform_conv_op as deform_conv_op
  File "/root/tf_deformable_frcnn/lib/deform_conv_layer/deform_conv_op.py", line 8, in <module>
    _deform_conv_module = tf.load_op_library(filename)
  File "/usr/local/lib/python/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /root/tf_deformable_frcnn/lib/deform_conv_layer/deform_conv.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

Any thoughts?

Zardinality commented 7 years ago

@xiaowenhe I am aware of your problem; please don't open duplicate issues.

Zardinality commented 7 years ago

@paulcx It might be related to the gcc version and certain flags. I will add some lines to make.sh and the FAQ. For now, if you want to fix it right away, check this issue.
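For context, the `B5cxx11` fragment in the undefined symbol is how GCC mangles the `[abi:cxx11]` tag. Its presence suggests the op library was compiled against the newer libstdc++ ABI, while the installed TensorFlow binary does not export that variant of the symbol. A minimal sketch of that check follows; the helper name `has_cxx11_abi_tag` is hypothetical and not part of this repo.

```python
# "B5cxx11" is the mangled form of GCC's [abi:cxx11] tag
# ("B" marks an ABI tag, "5cxx11" is the length-prefixed name).
CXX11_ABI_TAG = "B5cxx11"


def has_cxx11_abi_tag(mangled_symbol):
    """Return True if a mangled C++ symbol carries the cxx11 ABI tag."""
    return CXX11_ABI_TAG in mangled_symbol


# The symbol reported in the NotFoundError from this thread.
symbol = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
print(has_cxx11_abi_tag(symbol))  # True
```

If the tag is present, the usual suspect is a mismatch in `_GLIBCXX_USE_CXX11_ABI` between the custom op and the TensorFlow build.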

paulcx commented 7 years ago

@Zardinality I have tried both solutions, but neither works so far; I get the same error.

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
    roi_pooling_op.cu.o -I $TF_INC -fPIC -D GOOGLE_CUDA -lcudart -L $CUDA_HOME/lib64

or

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=1 -o roi_pooling.so roi_pooling_op.cc \
    roi_pooling_op.cu.o -I $TF_INC -fPIC -D GOOGLE_CUDA -lcudart -L $CUDA_HOME/lib64

Am I right about the solution?

Zardinality commented 7 years ago

@paulcx Make sure you are using the recompiled version, or try removing the related flag altogether. Which version of g++ are you using?
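One way to decide which value of the flag to use is to read it from the compile flags TensorFlow itself reports. The sketch below only parses a list of flags; the function name `abi_setting` and the example flag list are illustrative assumptions, not output from a real installation.

```python
import re


def abi_setting(compile_flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value found in a list of flags, or None."""
    for flag in compile_flags:
        match = re.fullmatch(r"-D\s*_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


# Illustrative flag list (the include path is a placeholder).
example_flags = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
print(abi_setting(example_flags))  # 0
```

In TensorFlow versions that provide `tf.sysconfig.get_compile_flags()`, its return value could be passed to `abi_setting` directly; older versions may not have that function, in which case the flag has to be chosen by hand.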

paulcx commented 7 years ago

@Zardinality What do you mean by using the recompiled version? g++ is 5.4.0.

Zardinality commented 7 years ago

@paulcx I mean manually remove the previously generated files, such as the .o and .so files, then recompile. Also, since you use g++ 5 (which I haven't had a chance to test), you should compile with -D_GLIBCXX_USE_CXX11_ABI=0.
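The cleanup step described here could be scripted roughly as follows. This is a sketch under assumptions: the helper name `remove_stale_artifacts` is hypothetical, and it deletes every `.o` and `.so` file under the directory it is given, so it should only be pointed at a directory that contains nothing but build outputs you can regenerate.

```python
from pathlib import Path

# Suffixes of build outputs to delete before recompiling.
STALE_SUFFIXES = {".o", ".so"}


def remove_stale_artifacts(root):
    """Delete all .o and .so files under root and return the removed paths."""
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in STALE_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed


# Example (assumes the ops live under ./lib, as the traceback paths suggest):
# for p in remove_stale_artifacts("lib"):
#     print("removed", p)
```

After the cleanup, rerunning make.sh with the chosen ABI flag should rebuild every object from scratch, which rules out a stale `.cu.o` or `.so` being linked in.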

Zardinality commented 7 years ago

@paulcx Hi, have you worked out where the problem is? I have updated the README to include a workaround that others posted in another issue, which solves the same problem.

paulcx commented 7 years ago

@Zardinality Not yet. The solution does not work with g++ 5.4, at least.

selkerdawy commented 6 years ago

@paulcx check out this; it solved a similar undefined symbol problem for me.