LuoweiZhou / detectron-vlp

Detectron for image/video region feature extraction, inspired by Xinlei's repo

commit version for both Detectron and Caffe2 #2

Closed kracwarlock closed 4 years ago

kracwarlock commented 4 years ago

Could you specify the exact commits you used? Without them, my setup fails with the error described in https://github.com/facebookresearch/pythia/issues/179.

LuoweiZhou commented 4 years ago

@kracwarlock I have tested with PyTorch 1.1, as in the GVD repo. (Not recommended) If you want to build a stand-alone Caffe2 env as described here, everything still functioned well the last time I checked (Feb. 2019).

LuoweiZhou commented 4 years ago

VLP uses the same PyTorch version (1.1): https://github.com/LuoweiZhou/VLP/blob/master/misc/vlp.yml#L64

kracwarlock commented 4 years ago

The same error appeared with PyTorch 1.1, but then I figured it out: https://github.com/LuoweiZhou/detectron-vlp/blob/5fce9b298975fa0746ccb6f8c6fa05338a324a96/tools/extract_features.py#L44-L64

All the imports there are resolved from the lib/ directory of this repo, but they should be read from the installed detectron library. Adding the detectron. prefix to all the relative import lines solved it.
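
The change described above amounts to pointing the script's top-level imports at the installed detectron package. A minimal sketch of that rewrite follows. The package names in `TOP_LEVEL_PACKAGES` and the helper `add_detectron_prefix` are assumptions made for illustration, not something taken from the repo, so compare them with the actual import lines before relying on it.

```python
import re

# Assumed top-level packages that exist under lib/ in this repo and under
# the `detectron` package of an installed Detectron (hypothetical list).
TOP_LEVEL_PACKAGES = ("core", "utils", "modeling", "datasets")

# Match `from <pkg>...` or `import <pkg>...` only when <pkg> is a whole name.
_IMPORT_RE = re.compile(
    r"^(\s*)(from|import)\s+(%s)(?=[.\s]|$)" % "|".join(TOP_LEVEL_PACKAGES)
)


def add_detectron_prefix(source):
    """Return `source` with `detectron.` prepended to the matching imports."""
    rewritten = []
    for line in source.splitlines():
        rewritten.append(_IMPORT_RE.sub(r"\1\2 detectron.\3", line))
    return "\n".join(rewritten)


example = "\n".join([
    "from core.config import cfg",
    "import utils.c2 as c2_utils",
    "import numpy as np",
])
print(add_detectron_prefix(example))
```

Lines that already start with `detectron.` or that import unrelated packages are left untouched, so running the helper twice gives the same result as running it once.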

LuoweiZhou commented 4 years ago

@kracwarlock Sorry I'm a little confused. Could you elaborate a little more on your solution? Are you using any Detectron docker? The library files under the current repo should suffice for the feature extraction. lib/libcaffe2_detectron_ops_gpu.so in this repo will be removed as it depends on your torch/caffe2 build.

LuoweiZhou commented 4 years ago

You need to add these paths to PYTHONPATH, in case you haven't done so already:
export PYTHONPATH="[...]/detectron-release:$PYTHONPATH"
export PYTHONPATH="[...]/detectron-release/lib:$PYTHONPATH"

kracwarlock commented 4 years ago

I was using some files from the lib/ directory of this repo and some from detectron/build/lib/ that I built. The two were conflicting so I switched to using just the detectron ones.
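
When two copies of the same library are on the path, it helps to see which file each import would actually resolve to. The sketch below uses the standard `importlib.util.find_spec` for that. The helper `module_origin` and the module names in the loop are illustrative assumptions, so substitute the names your script imports. Note that looking up a dotted name imports its parent package as a side effect.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return None
    return spec.origin if spec is not None else None


# Example module names only; replace them with the ones your script imports.
for name in ("detectron", "utils.c2", "core.config"):
    print(name, "->", module_origin(name))
```

If the printed paths mix this repo's lib/ directory with a separately built Detectron, the two copies are probably shadowing each other.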

LuoweiZhou commented 4 years ago

I see. Just make sure that you can reproduce the feature files we provided in VLP or GVD. Previously, people have had trouble doing so because of discrepancies in frame sampling, Caffe2, Detectron, etc.

shrutijpalaskar commented 3 years ago

Hi Luowei,

This seems to be an issue for me too. Even with the PYTHONPATH entries suggested above, I still need to add the paths to detectron and detectron/lib to my PYTHONPATH. If I don't, I get the following error:

/data/ASR1/spalaska/anaconda3/envs/detectronPy3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "tools/extract_features.py", line 47, in <module>
    c2_utils.import_detectron_ops()
  File "/data/ASR5/spalaska/pytorch-projects/detectron-main-folder/detectron-vlp/lib/utils/c2.py", line 41, in import_detectron_ops
    detectron_ops_lib = envu.get_detectron_ops_lib()
  File "/data/ASR5/spalaska/pytorch-projects/detectron-main-folder/detectron-vlp/lib/utils/env.py", line 71, in get_detectron_ops_lib
    ('Detectron ops lib not found; make sure that your Caffe2 '
AssertionError: Detectron ops lib not found; make sure that your Caffe2 version includes Detectron module

And if I do, I get another error about the NUM_CLASSES argument not being read in:

CUDA_VISIBLE_DEVICES=1 ./extract_feat_vcr.sh
/data/ASR1/spalaska/anaconda3/envs/detectronPy3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Traceback (most recent call last):
  File "tools/extract_features.py", line 283, in <module>
Found Detectron ops lib: /data/ASR1/spalaska/anaconda3/envs/detectronPy3/lib/python3.6/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so
Found Detectron ops lib: /data/ASR1/spalaska/anaconda3/envs/detectronPy3/lib/python3.6/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so
-1
    main(args)
  File "tools/extract_features.py", line 226, in main
    model = infer_engine.initialize_model_from_cfg(args.weights)
  File "/data/ASR5/spalaska/pytorch-projects/detectron-main-folder/detectron/detectron/core/test_engine.py", line 327, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/data/ASR5/spalaska/pytorch-projects/detectron-main-folder/detectron/detectron/modeling/model_builder.py", line 120, in create
    init_params=train
  File "/data/ASR5/spalaska/pytorch-projects/detectron-main-folder/detectron/detectron/modeling/detector.py", line 50, in __init__
    assert self.num_classes > 0, 'num_classes must be > 0'
AssertionError: num_classes must be > 0

This seems like a PYTHONPATH issue, but I have already tried different combinations without success. I am new to Detectron and I am trying to extract features for a different dataset (the Visual Commonsense Reasoning dataset). If you have some experience with this issue, please let me know!

Thank you, Shruti
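
One possible explanation for the `num_classes must be > 0` failure above, not confirmed anywhere in this thread, is that the config is set on one copy of a config module while the model builder reads another copy. Python treats the same code imported under two different module names as two separate module objects, each with its own globals. The sketch below demonstrates that behavior with a throwaway package. The names `demo_pkg` and `demo_cfg` and the value 81 are made up for illustration and have nothing to do with Detectron's real files.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package demo_pkg/ containing demo_cfg.py (hypothetical names).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "demo_cfg.py"), "w") as f:
    f.write("cfg = {'NUM_CLASSES': -1}\n")

# Put both the parent directory and the package directory on sys.path,
# similar to a PYTHONPATH that lists both a repo root and its lib/ folder.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

as_top_level = importlib.import_module("demo_cfg")          # found via pkg_dir
as_packaged = importlib.import_module("demo_pkg.demo_cfg")  # found via root

as_top_level.cfg["NUM_CLASSES"] = 81  # arbitrary example value

print(as_top_level is as_packaged)     # False: two distinct module objects
print(as_packaged.cfg["NUM_CLASSES"])  # -1: the update never reached this copy
```

If this is what is happening, making every import resolve to a single copy of the library should make the setting visible to the model builder.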

LuoweiZhou commented 3 years ago

@shrutijpalaskar Just to check, have you copied the Caffe2 .so file (e.g., libcaffe2_detectron_ops_gpu.so) to detectron-vlp/lib, as mentioned in 6a here? Besides, that issue thread (https://github.com/LuoweiZhou/detectron-vlp/issues/5) should be helpful overall.
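
The copy step mentioned above can be scripted. This is only a sketch: the helpers `find_ops_lib` and `copy_ops_lib` are hypothetical, and the directories you pass in are assumptions about your own setup (the log earlier in this thread shows the library under site-packages/torch/lib, but your build may place it elsewhere).

```python
import glob
import os
import shutil

OPS_LIB_NAME = "libcaffe2_detectron_ops_gpu.so"


def find_ops_lib(search_roots):
    """Return the first path to the Detectron ops library under the given roots."""
    for root in search_roots:
        pattern = os.path.join(root, "**", OPS_LIB_NAME)
        matches = sorted(glob.glob(pattern, recursive=True))
        if matches:
            return matches[0]
    return None


def copy_ops_lib(search_roots, dest_dir):
    """Copy the ops library into dest_dir; return the new path, or None if absent."""
    src = find_ops_lib(search_roots)
    if src is None:
        return None
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(src, dest_dir)
```

A call such as `copy_ops_lib(["/path/to/site-packages/torch"], "detectron-vlp/lib")` would then place the library next to the repo's other lib files. Both paths are placeholders.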

shrutijpalaskar commented 3 years ago

Thanks a lot, Luowei! Issue thread #5 helped with running detectron-vlp as a standalone repo.

For future reference, copying the Caffe2 .so file and the cython*.so files into this repo helps resolve dependency conflicts with the original Detectron repository. I am using Python 3 as well, so I solved the porting issues with the import changes stated in #5, along with the Python 3 patch from https://github.com/facebookresearch/Detectron/pull/110, which fixes the bytes/string read errors that appear when moving from Python 2 to Python 3.
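
For readers who hit the same bytes/string read errors, one common way to load a pickle written by Python 2 under Python 3 is to pass `encoding="latin1"` to the loader. This is a general pattern and not necessarily what the linked PR does. The helper `load_py2_pickle` and the hand-written sample bytes are made up for the example.

```python
import pickle

# Bytes as Python 2 would write them for {'a': 'caf\xe9'} with protocol 0,
# hand-written here so the example runs without Python 2.
PY2_PICKLE = b"(dp0\nS'a'\np1\nS'caf\\xe9'\np2\ns."


def load_py2_pickle(data):
    """Load a Python 2 pickle under Python 3, mapping each byte to one character."""
    return pickle.loads(data, encoding="latin1")


print(load_py2_pickle(PY2_PICKLE))
```

With the default encoding (ASCII), the non-ASCII byte in the second string makes the load fail with a UnicodeDecodeError. Passing `encoding="bytes"` instead keeps such strings as raw bytes objects.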

Happy to add a Python3 PR if you still need it! :)

LuoweiZhou commented 3 years ago

Hi Shruti, glad to hear you figured it out! A Python3 PR would be great :) Thanks

MarcusNerva commented 3 years ago

@shrutijpalaskar Hi there! I am happy to see that you can extract features using detectron-vlp. Would you please share your environment requirements, or a Dockerfile? I planned to extract ROI features with Faster-RCNN-X101, whose pretrained model is provided. However, I ran into a runtime error:

[E net_async_base.cc:377] [enforce fail at math_gpu.cu:569] status == CUBLAS_STATUS_SUCCESS. 13 vs 0. Error at: /opt/conda/conda-bld/pytorch_1556653000816/work/caffe2/utils/math_gpu.cu:569: CUBLAS_STATUS_EXECUTION_FAILED
Error from operator:
input: "gpu_0/fc7" input: "gpu_0/bbox_pred_w" input: "gpu_0/bbox_pred_b" output: "gpu_0/bbox_pred" name: "" type: "FC" arg { name: "use_cudnn" i: 1 } arg { name: "cudnn_exhaustive_search" i: 0 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x59 (0x7f65a9797409 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: void caffe2::math::Gemm<float, caffe2::CUDAContext, caffe2::DefaultEngine>(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const, float const, float, float, caffe2::CUDAContext*, caffe2::TensorProto_DataType) + 0x6da (0x7f65ac6f399a in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x14e3730 (0x7f65ab326730 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x14d8e88 (0x7f65ab31be88 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x13cb0b5 (0x7f65ab20e0b5 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f65cdd1cb94 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x168f009 (0x7f65cdd23009 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7f65a97912f3 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: + 0xb8678 (0x7f65e4a5b678 in /home/marcusnerva/anaconda3/envs/gvd_pt1_1/bin/../lib/libstdc++.so.6)
frame #9: + 0x9609 (0x7f65eb05d609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x43 (0x7f65eae2a293 in /lib/x86_64-linux-gnu/libc.so.6)
, op FC

OS: Ubuntu 20.04
Anaconda environment: same as GVD

Could you help me out? I am so confused now. Looking forward to your reply! ^ _ ^