Open jinwchoi opened 6 years ago
Thanks @LossNAN,
Did you use the new Caffe2 that was merged into PyTorch? With the old Caffe2, some submodules do not work, e.g. ATen and eigen. I cannot clone them.
When I am building the new Caffe2 with the Non-local NN custom ops, I don't see any compiling or linking happening regarding the custom ops. e.g. I don't have "Building CXX object caffe2/CMakeFiles/caffe2.dir/video/customized_video_input_op.cc" line in my build log. Do you have this line in your build log?
Could you send me the docker for Non-local NN?
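A minimal sketch for checking a saved build log for the custom-op line mentioned above. It assumes the build output was captured to text; the function name and the sample log are made up for illustration.

```python
def find_custom_op_lines(log_text, needle="customized_video_input_op"):
    """Return the build-log lines that mention the given source file name."""
    return [line for line in log_text.splitlines() if needle in line]


if __name__ == "__main__":
    # Hypothetical log excerpt; a real log would be read from a file instead.
    sample = "\n".join([
        "Building CXX object caffe2/CMakeFiles/caffe2.dir/core/net.cc",
        "Building CXX object caffe2/CMakeFiles/caffe2.dir/video/customized_video_input_op.cc",
    ])
    hits = find_custom_op_lines(sample)
    if hits:
        print("custom op was compiled:")
        for line in hits:
            print("  " + line)
    else:
        print("no custom-op lines found; the video ops were probably not built")
```

An empty result would suggest the video ops were never compiled.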
@jinwchoi Sure. Some submodules do not work, and I have already changed the links in .gitmodules to fix that, so you can clone from my fork: caffe2
The Docker image for non_local_net:
docker pull zzhikun/non_local_net:v1.0
If you have any problems, please contact me.
Hi @LossNAN, thanks for the pointers.
I got this error when cloning from your fork:
error: no such remote ref c80d6f7a924b53942b569b45278517565ea43d82
Fetched in submodule path 'third_party/aten', but it did not contain c80d6f7a924b53942b569b45278517565ea43d82. Direct fetching of that commit failed.
Could you take a look?
@jinwchoi I am not sure why you got this error.
It works for me with:
git clone --recursive https://github.com/LossNAN/caffe2.git
Anyway, if you build this code with cmake, other problems come up as well. The biggest one is 'third_party/eigen', so you are better off using the Docker image for your research.
Once the Docker container is set up, just clone
https://github.com/facebookresearch/video-nonlocal-net.git
and you can do your own research.
Best wishes
lin
I cannot use Docker because I do not have root or su privileges.
Regarding the installation from source, I am still getting an error when cloning.
When I run
git clone --recursive https://github.com/LossNAN/caffe2.git
I get this error:
'Fetched in submodule path 'third_party/aten', but it did not contain c80d6f7a924b53942b569b45278517565ea43d82. Direct fetching of that commit failed.'
And the third_party/aten directory is empty except for a .git file.
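A minimal sketch for spotting submodules that failed to populate after a recursive clone, using the "empty except for a .git file" symptom described above. The helper names are hypothetical, and the two submodule paths are just the ones named in this thread.

```python
import os


def is_unpopulated(path):
    """True if the directory is missing or holds nothing besides a .git entry."""
    if not os.path.isdir(path):
        return True
    return set(os.listdir(path)) <= {".git"}


def unpopulated_submodules(root, names):
    """Return the submodule paths under root that look unpopulated."""
    return [name for name in names if is_unpopulated(os.path.join(root, name))]


if __name__ == "__main__":
    # Run from the top of the cloned repository.
    print(unpopulated_submodules(".", ["third_party/aten", "third_party/eigen"]))
```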
@jinwchoi Try this fork: caffe2
For Docker: I recommend asking your administrator to install it and add you to the docker group, so that you can use Docker even though you are not root.
@LossNAN I tried your docker image, but it gives me an error when I test the Caffe2 installation.
from caffe2.python import core
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
Segmentation fault (core dumped)
And when I look at the libcuda.so.1 file, it points to a zero-byte file.
root@9439821d5235:/usr/lib/x86_64-linux-gnu# ll libcuda.so*
lrwxrwxrwx 1 root root 18 Sep 10 11:17 libcuda.so -> libcuda.so.384.130
lrwxrwxrwx 1 root root 18 Sep 10 11:17 libcuda.so.1 -> libcuda.so.384.130
-rw-r--r-- 1 root root 0 Sep 10 11:17 libcuda.so.384.130
Do you have any idea on this?
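A minimal sketch of the same check done from Python: follow the symlink chain and report the size of the file it ends at. The function name is hypothetical; the library path is the one from the listing above.

```python
import os


def inspect_library(path):
    """Follow symlinks and return (resolved_path, size_in_bytes).

    The size is None when the final target is not a regular file.
    """
    resolved = os.path.realpath(path)
    size = os.path.getsize(resolved) if os.path.isfile(resolved) else None
    return resolved, size


if __name__ == "__main__":
    resolved, size = inspect_library("/usr/lib/x86_64-linux-gnu/libcuda.so.1")
    if size is None:
        print("target is missing:", resolved)
    elif size == 0:
        print("target is a zero-byte placeholder:", resolved)
    else:
        print("target looks like a real file:", resolved, size, "bytes")
```

A zero-byte target matches the "file too short" message, since the loader has nothing to read.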
@jinwchoi
Either:
- Make sure you are using nvidia-docker, or
- Build your own environment rather than using Docker.
Hi! Is there a PyTorch implementation? Thanks
@jinwchoi I had the same problem you described. It bothered me for two days, and I solved it as follows:
- Make sure your official caffe2 works.
- Make sure you export the paths as described in INSTALL.MD when you build caffe2-nonlocal-net.
- caffe2-nonlocal-net has some updates on top of the official caffe2 in caffe2-nonlocal-net/caffe2/video; that is why you got "AttributeError: Method CustomizedVideoInput is not a registered operator".
- So the reason it does not work after building caffe2-nonlocal-net is that your Ubuntu is still using the official caffe2/video.
- Check /usr/local (caffe2 is there: the official caffe2).
- Check /usr/local/include (caffe2 is there too: the new caffe2).
- It worked for me after deleting /usr/local/caffe2.
I have also built a Docker image for non_local_net; if you can use Docker, contact me.
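A minimal sketch for confirming which caffe2 installation Python actually picks up, and whether the custom operator is registered in it. The helper names are hypothetical. The sketch also assumes that `workspace.RegisteredOperators()` is available in the installed Caffe2 build, which may not hold for every version.

```python
import importlib.util


def locate_package(name):
    """Return where a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations is not None:
        return list(spec.submodule_search_locations)
    return [spec.origin]


def operator_is_registered(op_name):
    """Ask the imported Caffe2 build whether an operator name is registered."""
    # Assumption: this Caffe2 build exposes workspace.RegisteredOperators().
    from caffe2.python import workspace
    return op_name in workspace.RegisteredOperators()


if __name__ == "__main__":
    where = locate_package("caffe2")
    print("caffe2 resolves to:", where)
    if where is not None:
        print("CustomizedVideoInput registered:",
              operator_is_registered("CustomizedVideoInput"))
```

If the reported location is not the caffe2-nonlocal-net build, a stale install is shadowing it, which is consistent with the fix described in the list above.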
@LossNAN can you give me a copy of your docker image?
Sorry, I didn't see your reply just below that. I'll download it from docker pull zzhikun/non_local_net:v1.0
@LossNAN Hi, I used your docker but I cannot run from caffe2.python import core.
The message:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short
Segmentation fault (core dumped)
Could you look into this? Thanks!
Hi @xiaolonw,
I built Non-local NN with the official Caffe2 that is merged into PyTorch. When I run the training code, I get this error:
resnet_video_test
Traceback (most recent call last):
  File "../tools/train_net_video.py", line 264, in <module>
    main()
  File "../tools/train_net_video.py", line 260, in main
    train(args)
  File "../tools/train_net_video.py", line 101, in train
    test_model, test_timer, test_meter = create_wrapper(is_train=False)
  File "../tools/train_net_video.py", line 63, in create_wrapper
    model.build_model()
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 116, in build_model
    train=self.train, force_fw_only=self.force_fw_only
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
    use_nccl=not cfg.DEBUG, # org: True
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
    Parallelize(*args, **kwargs)
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 219, in Parallelize
    input_builder_fun(model_helper_obj)
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 194, in add_video_input
    batch_size=batch_size,
  File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 159, in AddVideoInput
    data, label = model.net.CustomizedVideoInput(
  File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/core.py", line 2171, in __getattr__
    ",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
I think the custom video ops are not being built, for some reason I don't know.
According to https://github.com/facebookresearch/video-nonlocal-net/issues/3, I should set USE_FFMPEG to ON in caffe2/CMakeLists.txt. However, the new Caffe2 repository does not have the 'option(USE_FFMPEG "Use ffmpeg" ON)' line in caffe2/CMakeLists.txt. It only has this line in the CMakeLists.txt file in the Caffe2 root directory. Can you take a look and tell me how to deal with this issue?
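A minimal sketch for checking how a CMakeLists.txt declares the USE_FFMPEG option. It only handles the simple `option(NAME "doc" ON|OFF)` form quoted above; the regular expression and function name are made up for illustration. As a general CMake point, a value declared with option() can usually be overridden at configure time with `-DNAME=ON`, but whether that is enough to build the video ops here has not been verified.

```python
import re

# Matches the simple form: option(NAME "description" ON|OFF)
OPTION_RE = re.compile(r'option\(\s*(\w+)\s+"([^"]*)"\s+(ON|OFF)\s*\)')


def find_option(cmake_text, name):
    """Return 'ON' or 'OFF' for the first matching option() line, else None."""
    for opt, _doc, default in OPTION_RE.findall(cmake_text):
        if opt == name:
            return default
    return None


if __name__ == "__main__":
    # The line quoted in this thread; a real check would read CMakeLists.txt from disk.
    text = 'option(USE_FFMPEG "Use ffmpeg" ON)'
    print("USE_FFMPEG default:", find_option(text, "USE_FFMPEG"))
```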
Btw, your Caffe2 repository cannot be cloned because some dependencies have broken links, so I am using the official Caffe2 repository.