Building with the official Caffe2 merged to Pytorch

jinwchoi commented 6 years ago

Hi @xiaolonw,

I built Non-local NN with the official Caffe2, which has been merged into PyTorch. When I run the training code, I get this error:

resnet_video_test
Traceback (most recent call last):
  File "../tools/train_net_video.py", line 264, in <module>
    main()
  File "../tools/train_net_video.py", line 260, in main
    train(args)
  File "../tools/train_net_video.py", line 101, in train
    test_model, test_timer, test_meter = create_wrapper(is_train=False)
  File "../tools/train_net_video.py", line 63, in create_wrapper
    model.build_model()
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 116, in build_model
    train=self.train, force_fw_only=self.force_fw_only
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
    use_nccl=not cfg.DEBUG, # org: True
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
    Parallelize(*args, **kwargs)
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 219, in Parallelize
    input_builder_fun(model_helper_obj)
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 194, in add_video_input
    batch_size=batch_size,
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 159, in AddVideoInput
    data, label = model.net.CustomizedVideoInput(
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/core.py", line 2171, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []

I think the custom video ops were not built, for some reason I don't know.

According to https://github.com/facebookresearch/video-nonlocal-net/issues/3, I should set USE_FFMPEG to ON in caffe2/CMakeLists.txt. However, the new Caffe2 repository does not have the 'option(USE_FFMPEG "Use ffmpeg" ON)' line in caffe2/CMakeLists.txt. That line only appears in the CMakeLists.txt in the Caffe2 root directory.
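
To confirm where the option is defined in a given checkout, a minimal sketch along these lines could help. It is not part of either repository: find_ffmpeg_option is a hypothetical helper, and it assumes the option is declared with a plain option(USE_FFMPEG ... ON|OFF) line like the one quoted above.

```python
import os
import re

# Matches lines such as: option(USE_FFMPEG "Use ffmpeg" ON)
OPTION_RE = re.compile(
    r'^\s*option\s*\(\s*USE_FFMPEG\b.*\b(ON|OFF)\s*\)', re.IGNORECASE
)


def find_ffmpeg_option(source_root):
    """Return (path, line_number, default) for every USE_FFMPEG option() line.

    Hypothetical helper: walks source_root and inspects each CMakeLists.txt.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        if "CMakeLists.txt" not in filenames:
            continue
        path = os.path.join(dirpath, "CMakeLists.txt")
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                match = OPTION_RE.match(line)
                if match:
                    hits.append((path, number, match.group(1).upper()))
    return hits
```

Calling find_ffmpeg_option on the root of the Caffe2 checkout lists each file that declares the option together with its default. Since option() defines a CMake cache variable, it can normally also be set at configure time with -DUSE_FFMPEG=ON instead of editing the file, though whether that alone builds the video ops here is not confirmed in this thread.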

Can you take a look, and tell me how to deal with this issue?

Btw, your Caffe2 repository cannot be cloned because some of its dependencies have broken links, so I am using the official Caffe2 repository.

LossNAN commented 6 years ago

@jinwchoi I ran into the same problem. It bothered me for two days, and I solved it as follows:

Make sure your official Caffe2 works.
Make sure you export the paths as described in INSTALL.MD when you build caffe2-nonlocal-net.
caffe2-nonlocal-net has some updates on top of the official Caffe2 in caffe2-nonlocal-net/caffe2/video, which is why you got "AttributeError: Method CustomizedVideoInput is not a registered operator".
So the reason it doesn't work after building caffe2-nonlocal-net is that your Ubuntu system is still using the official caffe2/video.
Check /usr/local (there is a caffe2 there, the official Caffe2).
Check /usr/local/include (there is also a caffe2 there, the new Caffe2).
It worked for me after deleting /usr/local/caffe2. I have also built a Docker image for non_local_net; if you can use Docker, contact me.
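
The two "Check" steps above can be sketched as a small script. This is only an illustration: find_caffe2_copies and imported_caffe2_location are hypothetical helpers, and the default directories are simply the two mentioned in this comment.

```python
import importlib.util
import os

# Locations named in the comment above; adjust them for your system.
DEFAULT_ROOTS = ("/usr/local", "/usr/local/include")


def find_caffe2_copies(roots=DEFAULT_ROOTS):
    """Return every existing '<root>/caffe2' directory (hypothetical helper)."""
    found = []
    for root in roots:
        candidate = os.path.join(root, "caffe2")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def imported_caffe2_location():
    """Return the directory Python would import caffe2 from, or None."""
    spec = importlib.util.find_spec("caffe2")
    if spec is None:
        return None  # caffe2 is not importable from this interpreter
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin
```

If find_caffe2_copies() reports more than one directory, or imported_caffe2_location() points at a build other than the non-local one, a stale installation is probably shadowing the new build, which matches the situation described here.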

jinwchoi commented 6 years ago

Thanks @LossNAN,

Did you use the new Caffe2 merged into PyTorch? With the old Caffe2, some submodules do not work, e.g. ATen and eigen. I cannot clone them.

When I build the new Caffe2 with the Non-local NN custom ops, I don't see any compiling or linking of the custom ops. For example, my build log has no "Building CXX object caffe2/CMakeFiles/caffe2.dir/video/customized_video_input_op.cc" line. Do you have this line in your build log?
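
A quick way to run the same check for every custom op at once is sketched below. It assumes the build log names each compiled source file, as in the "Building CXX object ..." line quoted above; sources_missing_from_log is a hypothetical helper, not something shipped with Caffe2.

```python
import os


def sources_missing_from_log(video_dir, log_text):
    """List .cc files in video_dir whose file name never appears in log_text.

    Hypothetical helper: a file listed here was probably never compiled.
    """
    missing = []
    for name in sorted(os.listdir(video_dir)):
        if name.endswith(".cc") and name not in log_text:
            missing.append(name)
    return missing
```

Here video_dir would be the caffe2/video directory of the source tree and log_text the contents of the saved build output. A non-empty result suggests the video directory was skipped by the build configuration.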

Could you send me the Docker image for Non-local NN?

LossNAN commented 6 years ago

@jinwchoi Sure. Some submodules do not work, and I have already changed the links in .gitmodules to fix that, so you can clone from my fork: caffe2. The Docker image for non_local_net: docker pull zzhikun/non_local_net:v1.0. If you have any problems, please contact me.
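
To see which URLs a checkout's .gitmodules currently points at, a minimal sketch such as the following may be useful. submodule_urls is a hypothetical helper, and it assumes the usual .gitmodules layout of [submodule "name"] sections with path and url keys.

```python
import configparser


def submodule_urls(gitmodules_text):
    """Map each submodule path to its URL, given the text of a .gitmodules file."""
    # .gitmodules indents its keys; strip each line so configparser does not
    # treat an indented key as a continuation of the previous value.
    cleaned = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cleaned)
    urls = {}
    for section in parser.sections():
        if section.startswith("submodule "):
            path = parser.get(section, "path", fallback=None)
            url = parser.get(section, "url", fallback=None)
            if path and url:
                urls[path] = url
    return urls
```

After editing a URL in .gitmodules, git submodule sync copies the new value into the local configuration of an existing clone.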

jinwchoi commented 6 years ago

Hi @LossNAN, thanks for the pointers.

I got this error when cloning from your fork:

error: no such remote ref c80d6f7a924b53942b569b45278517565ea43d82
Fetched in submodule path 'third_party/aten', but it did not contain c80d6f7a924b53942b569b45278517565ea43d82. Direct fetching of that commit failed.

Could you take a look?

LossNAN commented 6 years ago

@jinwchoi I am not sure why you got this error. It works for me with: git clone --recursive https://github.com/LossNAN/caffe2.git

Anyway, even if you get this code and build it with CMake, you will run into other problems. The biggest one is 'third_party/eigen', so you'd better use the Docker image for your research. Once the Docker container is set up, just clone https://github.com/facebookresearch/video-nonlocal-net.git and you can do your own research.

Best wishes,
lin

jinwchoi commented 6 years ago

I cannot use Docker because I do not have root or su privileges.

Regarding installation from source, I am still getting an error when cloning. When I run git clone --recursive https://github.com/LossNAN/caffe2.git, I get this error: 'Fetched in submodule path 'third_party/aten', but it did not contain c80d6f7a924b53942b569b45278517565ea43d82. Direct fetching of that commit failed.' The third_party/aten directory is empty except for a .git file.

LossNAN commented 6 years ago

@jinwchoi Try this fork: caffe2. As for Docker: I recommend asking your administrator to install it and add you to the docker group, so that you can use Docker even though you are not root.

jinwchoi commented 6 years ago

@LossNAN I tried your Docker image, but it gives me an error when I test the Caffe2 installation.

from caffe2.python import core
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
Segmentation fault (core dumped)

When I inspect the libcuda.so.1 file, it points to a zero-byte file.

root@9439821d5235:/usr/lib/x86_64-linux-gnu# ll libcuda.so*
lrwxrwxrwx 1 root root 18 Sep 10 11:17 libcuda.so -> libcuda.so.384.130
lrwxrwxrwx 1 root root 18 Sep 10 11:17 libcuda.so.1 -> libcuda.so.384.130
-rw-r--r-- 1 root root  0 Sep 10 11:17 libcuda.so.384.130

Do you have any idea on this?
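
The same inspection can be scripted, which helps when comparing several containers. This is a minimal sketch using only the Python standard library; inspect_libs and empty_libs are hypothetical helpers, and the glob pattern in the usage note is just the path from the listing above.

```python
import glob
import os


def inspect_libs(pattern):
    """Return (path, resolved_target, size_in_bytes) for each matching file."""
    report = []
    for path in sorted(glob.glob(pattern)):
        target = os.path.realpath(path)  # follow symlinks to the real file
        size = os.path.getsize(target) if os.path.exists(target) else None
        report.append((path, target, size))
    return report


def empty_libs(pattern):
    """Return the paths whose final target exists but is zero bytes long."""
    return [path for path, _target, size in inspect_libs(pattern) if size == 0]
```

For the listing above, empty_libs("/usr/lib/x86_64-linux-gnu/libcuda.so*") would report all three names, since both symlinks resolve to the empty libcuda.so.384.130. An empty library explains the "file too short" message from the dynamic loader.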

LossNAN commented 6 years ago

@jinwchoi Make sure you use nvidia-docker, rather than plain docker, to build your own environment.

InstantWindy commented 5 years ago

Hi! Is there a PyTorch implementation? Thanks

SunGaofeng commented 5 years ago

@jinwchoi I ran into the same problem. It bothered me for two days, and I solved it as follows:

Make sure your official Caffe2 works.

Make sure you export the paths as described in INSTALL.MD when you build caffe2-nonlocal-net.

caffe2-nonlocal-net has some updates on top of the official Caffe2 in caffe2-nonlocal-net/caffe2/video, which is why you got "AttributeError: Method CustomizedVideoInput is not a registered operator".

So the reason it doesn't work after building caffe2-nonlocal-net is that your Ubuntu system is still using the official caffe2/video.

Check /usr/local (there is a caffe2 there, the official Caffe2).

Check /usr/local/include (there is also a caffe2 there, the new Caffe2).

It worked for me after deleting /usr/local/caffe2. I have also built a Docker image for non_local_net; if you can use Docker, contact me.

@LossNAN Can you give me a copy of your Docker image?

Sorry, I didn't see your reply just below that. I'll get it with docker pull zzhikun/non_local_net:v1.0

hukkai commented 5 years ago

@LossNAN Hi, I used your Docker image, but I cannot run from caffe2.python import core.

The message:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
Segmentation fault (core dumped)

Could you look into this? Thanks!

facebookresearch / video-nonlocal-net

Building with the official Caffe2 merged to Pytorch #52