LuoweiZhou / detectron-vlp

Detectron for image/video region feature extraction, inspired by Xinlei's repo

colab install for single video run. Missing utils module #5

Closed. nikky4D closed this issue 4 years ago.

nikky4D commented 4 years ago

I'm working on creating a colab notebook to extract features. I have two questions:

First, following the instructions, I've done the following in colab, and at the end I run into an error about a missing utils module in extract_feature_gvd.py. Could you tell me if I'm running this correctly and adding the right directories to the path?

1. Clone detectron-vlp into the root dir:

!git clone https://github.com/LuoweiZhou/detectron-vlp  # get detectron-vlp for Region-wise/Frame-wise features for VLP and GVD

2. Check caffe2 imports:

from caffe2.python import core       # check if we can import already in colab
from caffe2.python import workspace  # check if Caffe2 GPU is available in colab
print(workspace.NumCudaDevices())    # prints num > 0 in colab

3. Install cocoapi; clone into the root dir:

!git clone https://github.com/cocodataset/cocoapi.git

4. make install in the PythonAPI folder:

%cd cocoapi/PythonAPI
!make install

5. Return to the root dir:

%cd ../..

6. Get and build Detectron. Clone detectron into the root dir:

!git clone https://github.com/facebookresearch/detectron.git  # get detectron -- old version

Install environment modules (most should already be available in colab):

!pip install -r detectron/requirements.txt

navigate into detectron folder for install

%cd detectron

build and install

!make

Run checks

!python detectron/tests/test_spatial_narrow_as_op.py #Run tests to check if installed ok.

# Add detectron to path
import sys
sys.path.append('/content/detectron/detectron')

Check

print(sys.path)

7. Detectron installed OK. Return to detectron-vlp to finish:

%cd /content/detectron-vlp

8. Download the model config .yaml and checkpoint .pkl. They should be placed in the detectron-vlp directory:

!wget -O e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.yaml http://dl.fbaipublicfiles.com/ActivityNet-Entities/ActivityNet-Entities/e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml
!wget -O e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.pkl http://dl.fbaipublicfiles.com/ActivityNet-Entities/ActivityNet-Entities/e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl

9. Return to the root directory:

%cd ..

10. Create a dir for images from your video. DATA_ROOT should be where the video frames are:

!mkdir data_imgs

11. Extract video frames and store them in DATA_ROOT.

Uses code from the GVD repository for sampling video frames:

import os
import numpy as np

sample_frm = 10
e_t = 30
s_t = 10
vid_path = "vid3.mp4"
segment_path = "data_imgs/"
itvs = np.linspace(s_t, e_t, sample_frm+1)+(e_t-s_t)/sample_frm/2.
for i in range(sample_frm):
    os.system(' '.join(('ffmpeg', '-loglevel', 'panic', '-ss', str(itvs[i]), '-i', vid_path, '-vframes', '1', '-vf', 'scale=720:-1', os.path.join(segment_path, str(i+1).zfill(2)+'.jpg'))))

12. Return to the detectron-vlp directory:

%cd detectron-vlp

13. Modify extract_feat_gvd_anet.sh to set the data root, then run. The error occurs on this step.

Set data root as appropriate

!./extract_feat_gvd_anet.sh

The error says the utils module is missing. I attempted to fix it by changing the imports in extract_features_gvd_anet.py to the following, replacing any references to utils/core with detectron.utils and detectron.core:

from detectron.core.config import assert_and_infer_cfg
from detectron.core.config import cfg
from detectron.core.config import merge_cfg_from_file
from detectron.utils.timer import Timer
import detectron.core.test_engine as infer_engine
import detectron.datasets.dummy_datasets as dummy_datasets
import detectron.utils.c2 as c2_utils
import detectron.utils.logging
import detectron.utils.vis as vis_utils
from detectron.utils.boxes import nms

Though this now runs, am I referencing the correct modules?

Second, I'm confused about how to modify extract_feat_gvd_anet.sh for a single video. I changed the data root but did not change any of the other default options. Is this correct, or should I be specifying other options? In particular, I don't understand these options: --list_of_ids $DATA_ROOT/dic_anet.json and $DATA_ROOT/frames_1_10frm. What do I do with them?

Thank you for your time. I really appreciate it.

LuoweiZhou commented 4 years ago

@nikky4D Thanks for the effort. Please consider submitting a PR on the notebook to make it a part of the repo.

i) This repo alone has the necessary modules for VLP/GVD feature extraction. You need to refer to the modules under lib/ for utils and core (add $YOUR_DETECTRON_ROOT/lib to $PYTHONPATH) rather than the ones in the full Detectron. Also, make sure your libcaffe2_detectron_ops_gpu.so from caffe2 compilation is copied to lib/.
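In a notebook, that path addition amounts to something like this (a minimal sketch; paths assumed from this thread):

import sys
# make this repo's lib/ (with utils/ and core/) importable ahead of any full Detectron install
sys.path.insert(0, '/content/detectron-vlp/lib')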

ii) The way data is stored in GVD/NBT is a little messy... The feature extraction has to follow the exact video ID ordering in dic_anet.json (which can be downloaded from here under the data section) to ensure correct data loading (compare here and here). You may want to refer to dic_anet.json for its structure. $DATA_ROOT/frames_1_10frm is just the directory of the sampled images. Note that our other project, VLP, does not have this issue, as its features are stored in a more organized fashion.
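For reference, a minimal sketch of reading that ordering (assuming dic_anet.json sits in the current directory, with the structure shown later in this thread):

import json

with open('dic_anet.json') as f:
    dic = json.load(f)

# feature extraction must follow this exact video ID ordering
for seg in dic['videos']:
    print(seg['vid_id'], seg['id'])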

nikky4D commented 4 years ago

I'll be happy to once I get it working.

Thank you for the response. I've updated my code by doing the following:

  1. I've copied the caffe2 .so in colab to detectron-vlp/lib.
  2. I've also added detectron-vlp/lib to my python path.
  3. I've also downloaded dic_anet.json and put it in $DATA_ROOT/
  4. I've put the sample frames from the video into $DATA_ROOT/frames_1_10frm

However, when I run ./extract_feat_gvd_anet.sh, I run into the following error:

Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
Traceback (most recent call last):
  File "tools/extract_features_gvd_anet.py", line 65, in <module>
    import core.test_engine as infer_engine
  File "/content/detectron-vlp/lib/core/test_engine.py", line 36, in <module>
    from core.rpn_generator import generate_rpn_on_dataset
  File "/content/detectron-vlp/lib/core/rpn_generator.py", line 42, in <module>
    from datasets import task_evaluation
  File "/content/detectron-vlp/lib/datasets/task_evaluation.py", line 47, in <module>
    import datasets.json_dataset_evaluator as json_dataset_evaluator
  File "/content/detectron-vlp/lib/datasets/json_dataset_evaluator.py", line 33, in <module>
    import utils.boxes as box_utils
  File "/content/detectron-vlp/lib/utils/boxes.py", line 51, in <module>
    import utils.cython_bbox as cython_bbox
ImportError: dynamic module does not define module export function (PyInit_cython_bbox)

I think the error may come from having built something improperly. My first question: is my install order (above, in my earlier comment) correct? Second question: am I supposed to run make on detectron-vlp as well as on the official Detectron? Or, as you said, am I supposed to only use detectron-vlp?

Thanks again for any help you can give.

LuoweiZhou commented 4 years ago

@nikky4D Just to double check, have you removed the following lines:

# Add detectron to path
import sys
sys.path.append('/content/detectron/detectron')

Remove the dependencies on the original Detectron to avoid confusion. In fact, this repo should work stand-alone (a caffe2 installation is required, but a fresh Detectron is not).

nikky4D commented 4 years ago

Yes, I have. The dependency is set to detectron-vlp. I've redone the steps according to your last statement. It appears it was looking for the cython builds; I copied the cython*.so files (for bbox and nms, as shown in my steps below), and the error disappeared, though new ones occurred.

Also, as I'm using colab and python3, I've been converting detectron-vlp to python3. This involved modifying imports in various files: I changed mentions of urllib2, Queue, and cPickle to urllib.request, queue, and pickle, as sketched below.
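For reference, those substitutions look roughly like this (a sketch; a try/except shim keeps a file importable under both versions):

try:  # python 2
    import cPickle as pickle
    import urllib2 as request
    import Queue as queue
except ImportError:  # python 3
    import pickle
    import queue
    from urllib import request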

So currently, these are my steps:

1. Clone detectron-vlp into the root dir:

!git clone https://github.com/LuoweiZhou/detectron-vlp  # get detectron-vlp for Region-wise/Frame-wise features for VLP and GVD

2. Check caffe imports

from caffe2.python import core #check if we can import already in colab
from caffe2.python import workspace #Check if Caffe2 GPU is available in colab
print(workspace.NumCudaDevices()) #prints num>0 in colab

3. Install cocoapi; clone into the root dir:

!git clone https://github.com/cocodataset/cocoapi.git

4. make install in the PythonAPI folder:

%cd cocoapi/PythonAPI
!make install

5. Return to the root dir:

%cd ../..

6. Get and build Detectron. Clone detectron into the root dir:

!git clone https://github.com/facebookresearch/detectron.git  # get detectron -- old version

Install environment modules (most should already be available in colab):

!pip install -r detectron/requirements.txt

navigate into detectron folder for install

%cd detectron

build and install

!make

Run checks

!python detectron/tests/test_spatial_narrow_as_op.py #Run tests to check if installed ok.

6a. Copy detectron .so files to detectron-vlp/lib:

!cp detectron/utils/cython_nms.cpython-36m-x86_64-linux-gnu.so "../detectron-vlp/lib/utils/"
!cp detectron/utils/cython_bbox.cpython-36m-x86_64-linux-gnu.so "../detectron-vlp/lib/utils/"
!cp /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_detectron_ops_gpu.so  "../detectron-vlp/lib/"

6b. Add detectron-vlp to the path:

import sys
sys.path.append('/content/detectron-vlp')
sys.path.append('/content/detectron-vlp/lib')

Check

print(sys.path)

7. Detectron installed OK. Return to detectron-vlp to finish:

%cd /content/detectron-vlp

8. Download the model config .yaml and checkpoint .pkl. They should be placed in the detectron-vlp directory:

!wget -O e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.yaml http://dl.fbaipublicfiles.com/ActivityNet-Entities/ActivityNet-Entities/e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml
!wget -O e2e_faster_rcnn_X-101-64x4d-FPN_2x-gvd.pkl http://dl.fbaipublicfiles.com/ActivityNet-Entities/ActivityNet-Entities/e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl

9. Return to the root directory:

%cd ..

10. Create a dir for images from your video. DATA_ROOT should be where the video frames are:

!mkdir data_imgs
!mkdir data_imgs/frames_1_10frm

11. Extract video frames and store them in DATA_ROOT:

#Uses code from GVD repository for sampling video frames
import os
import numpy as np
sample_frm = 10
e_t = 30
s_t = 10
vid_path = "vid3.mp4"
segment_path = "data_imgs/frames_1_10frm"
itvs = np.linspace(s_t, e_t, sample_frm+1)+(e_t-s_t)/sample_frm/2.
for i in range(sample_frm):
    os.system(' '.join(('ffmpeg', '-loglevel', 'panic', '-ss', str(itvs[i]), '-i', vid_path, '-vframes', '1', '-vf','scale=720:-1', os.path.join(segment_path, str(i+1).zfill(2)+'.jpg'))))

11b. Copy dic_anet.json from the anet directory and put it in the data_imgs folder. Modify the json to keep all the initial labels and to rename the vid_id to vid3, as below; delete all but one of the vid_id entries. The last line of the json file should now look like this (a sketch of that edit follows):

..."baked": "baked", "In": "in"}, "videos": [{"vid_id": "vid3", "seg_id": "0", "split": "testing", "id": "vid3_segment_00"}]}
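A minimal sketch of that json edit (file locations as in my steps; the script itself is just illustrative):

import json

with open('data_imgs/dic_anet.json') as f:
    dic = json.load(f)

# keep all the initial labels; replace the video list with the single test entry
dic['videos'] = [{'vid_id': 'vid3', 'seg_id': '0',
                  'split': 'testing', 'id': 'vid3_segment_00'}]

with open('data_imgs/dic_anet.json', 'w') as f:
    json.dump(dic, f)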

12. Return to the detectron-vlp directory:

%cd detectron-vlp

13. Modify extract_feat_gvd_anet.sh to set the data root, then run. The error occurs on this step.

Set data root as appropriate

I called !./extract_feat_gvd_anet.sh with the following input:

DATA_ROOT=data_imgs

python tools/extract_features_gvd_anet.py \
  --output-dir $DATA_ROOT/fc6_feat_100rois \
  --det-output-file $DATA_ROOT/anet_detection_vg_fc6_feat_100rois.h5 \
  --max_bboxes 100 --min_bboxes 100 \
  --feat_name fc6 $DATA_ROOT/frames_1_10frm \
  | tee log/log_extract_features_vg_100rois_gvd_anet

The error states:

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.

ERROR model_builder.py: 167: Failed to find function: b''
Traceback (most recent call last):
  File "tools/extract_features_gvd_anet.py", line 293, in <module>
    main(args)
  File "tools/extract_features_gvd_anet.py", line 203, in main
    model = infer_engine.initialize_model_from_cfg(args.weights)
  File "/content/detectron-vlp/lib/core/test_engine.py", line 351, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 138, in create
    return get_func(model_type_func)(model)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 89, in generalized_rcnn
    add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 158, in get_func
    parts = func_name.split('.')
TypeError: a bytes-like object is required, not 'str'

When I print out the func_names from get_func() in model_builder.py, I get:

generalized_rcnn
FPN.add_fpn_ResNet101_conv5_body
fast_rcnn_heads.add_roi_2mlp_head
b''

When I run the VLP extraction using !./extract_feat_flickr30k.sh, I get the same error, with input:

DATA_ROOT=data_imgs

python tools/extract_features.py \
    --featcls-output-dir $DATA_ROOT/region_feat_gvd_wo_bgd/feat_cls_1000 \
    --box-output-dir $DATA_ROOT/region_feat_gvd_wo_bgd/raw_bbox \
    --output-file-prefix flickr30k_detection_vg_100dets_vlp_checkpoint_trainval \
    --max_bboxes 100 --min_bboxes 100 \
    --cfg e2e_faster_rcnn_X-101-64x4d-FPN_2x-vlp.yaml \
    --wts e2e_faster_rcnn_X-101-64x4d-FPN_2x-vlp.pkl \
    --proc_split 00 --dataset Flickr30k \
    $DATA_ROOT/images \
    | tee log/log_extract_features_vg_100dets_flickr30k_"$1"

The error states:

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
Namespace(box_output_dir='data_imgs/region_feat_gvd_wo_bgd/raw_bbox', cfg='e2e_faster_rcnn_X-101-64x4d-FPN_2x-vlp.yaml', data_type='float32', dataset='Flickr30k', feat_name='gpu_0/fc6', featcls_output_dir='data_imgs/region_feat_gvd_wo_bgd/feat_cls_1000', im_or_folder='data_imgs/images', image_ext='jpg', max_bboxes=100, min_bboxes=100, output_file_prefix='flickr30k_detection_vg_100dets_vlp_checkpoint_trainval', proc_split='00', weights='e2e_faster_rcnn_X-101-64x4d-FPN_2x-vlp.pkl')
WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
ERROR model_builder.py: 167: Failed to find function: b''
Traceback (most recent call last):
  File "tools/extract_features.py", line 284, in <module>
    main(args)
  File "tools/extract_features.py", line 226, in main
    model = infer_engine.initialize_model_from_cfg(args.weights)
  File "/content/detectron-vlp/lib/core/test_engine.py", line 351, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 138, in create
    return get_func(model_type_func)(model)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 89, in generalized_rcnn
    add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 158, in get_func
    parts = func_name.split('.')
TypeError: a bytes-like object is required, not 'str'

Again, when printing out func_name, I get:

generalized_rcnn
FPN.add_fpn_ResNet101_conv5_body
fast_rcnn_heads.add_roi_2mlp_head
b''
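That b'' matches what plain python3 does with a bytes value (a minimal repro, independent of the repo):

value = b''                  # what cfg.MRCNN.ROI_MASK_HEAD apparently holds here
try:
    value.split('.')         # bytes.split needs a bytes separator in python3
except TypeError as e:
    print(e)                 # a bytes-like object is required, not 'str'
print(value.decode('utf-8').split('.'))   # [''] once decoded to str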

Any ideas as to what is causing this? What is b'' supposed to be? Do you have any idea where I may be going wrong? Perhaps in my input to the .sh scripts, or someplace else?

Thanks again for all your help.

nikky4D commented 4 years ago

Digging a little deeper, in lib/modeling/model_builder.py (line 158, in get_func), I explicitly converted func_name to str, changing parts = func_name.split('.') to parts = str(func_name).split('.'). In addition, in the generalized_rcnn(model) function, I put print statements to pull out the model parts like so:

def generalized_rcnn(model):
    """This model type handles:
      - Fast R-CNN
      - RPN only (not integrated with Fast R-CNN)
      - Faster R-CNN (stagewise training from NIPS paper)
      - Faster R-CNN (end-to-end joint training)
      - Mask R-CNN (stagewise training from NIPS paper)
      - Mask R-CNN (end-to-end joint training)
    """
    print("IN GENERALIZED_RCNN ", "CONV_BODY", cfg.MODEL.CONV_BODY)
    print("IN GENERALIZED_RCNN ", "ROI_BOX_HEAD", cfg.FAST_RCNN.ROI_BOX_HEAD)
    print("IN GENERALIZED_RCNN ", "HEAD FUNC", cfg.MRCNN.ROI_MASK_HEAD)
    print("IN GENERALIZED_RCNN ", "KEYPOINT_BODY", cfg.KRCNN.ROI_KEYPOINTS_HEAD)

    return build_generic_detection_model(
        model,
        get_func(cfg.MODEL.CONV_BODY),
        add_roi_box_head_func=get_func(cfg.FAST_RCNN.ROI_BOX_HEAD),
        add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
        add_roi_keypoint_head_func=get_func(cfg.KRCNN.ROI_KEYPOINTS_HEAD),
        freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
    )

The printout gives me this:

IN GENERALIZED_RCNN  CONV_BODY FPN.add_fpn_ResNet101_conv5_body
IN GENERALIZED_RCNN  ROI_BOX_HEAD fast_rcnn_heads.add_roi_2mlp_head
IN GENERALIZED_RCNN  HEAD FUNC b''
IN GENERALIZED_RCNN  KEYPOINT_BODY b''

With the explicit conversion, the new error is:

ERROR model_builder.py: 178: Failed to find function: b''
Traceback (most recent call last):
  File "tools/extract_features_gvd_anet.py", line 293, in <module>
    main(args)
  File "tools/extract_features_gvd_anet.py", line 203, in main
    model = infer_engine.initialize_model_from_cfg(args.weights)
  File "/content/detectron-vlp/lib/core/test_engine.py", line 352, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 146, in create
    return get_func(model_type_func)(model)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 94, in generalized_rcnn
    add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 172, in get_func
    return globals()[parts[0]]
KeyError: "b''"

If b'' is correct, then perhaps the issue is in func_name.split('.'). I'll keep checking. Please let me know if you have any ideas.

nikky4D commented 4 years ago

Hi, once again,

I've modified the install process above slightly to only focus on working with detectron-vlp. I do the following:

1. Clone detectron-vlp into the root dir:

!git clone https://github.com/LuoweiZhou/detectron-vlp  # get detectron-vlp for Region-wise/Frame-wise features for VLP and GVD

2. Check caffe imports

from caffe2.python import core #check if we can import already in colab
from caffe2.python import workspace #Check if Caffe2 GPU is available in colab
print(workspace.NumCudaDevices()) #prints num>0 in colab

3. Install cocoapi; clone into the root dir:

!git clone https://github.com/cocodataset/cocoapi.git

4. make install in the PythonAPI folder:

%cd cocoapi/PythonAPI
!make install

5. Return to the root dir:

%cd ../..

6. Build the cython detectron-vlp nms and bbox modules.

6a. Copy caffe2 and detectron-vlp .so files to detectron-vlp/lib:

!cp detectron-vlp/build/lib/utils/cython_nms.cpython-36m-x86_64-linux-gnu.so "../detectron-vlp/lib/utils/"
!cp detectron-vlp/build/lib/utils/cython_bbox.cpython-36m-x86_64-linux-gnu.so "../detectron-vlp/lib/utils/"
!cp /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_detectron_ops_gpu.so  "../detectron-vlp/lib/"

6b. Add detectron-vlp to the path:

import sys
sys.path.append('/content/detectron-vlp')
sys.path.append('/content/detectron-vlp/lib')

The remaining install process steps stay the same. I also modified various files for python3 as I said earlier.

When calling the flickr30k .sh script, I get the same error as before:

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
/content/detectron-vlp/lib/core/config.py:1207: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = AttrDict(yaml.load(f))
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
Found Detectron ops lib: /content/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
ERROR model_builder.py: 165: Failed to find function: b''
Traceback (most recent call last):
  File "tools/extract_features.py", line 283, in <module>
    main(args)
  File "tools/extract_features.py", line 226, in main
    model = infer_engine.initialize_model_from_cfg(args.weights)
  File "/content/detectron-vlp/lib/core/test_engine.py", line 351, in initialize_model_from_cfg
    model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 138, in create
    return get_func(model_type_func)(model)
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 89, in generalized_rcnn
    add_roi_mask_head_func=get_func(cfg.MRCNN.ROI_MASK_HEAD),
  File "/content/detectron-vlp/lib/modeling/model_builder.py", line 156, in get_func
    parts = func_name.split('.')
TypeError: a bytes-like object is required, not 'str'

Am I downloading the models properly or is there something I'm missing in the code?

LuoweiZhou commented 4 years ago

@nikky4D Sorry for the delay. It appears to me to be a checkpoint loading issue, if you look at this line in the Traceback: "model = infer_engine.initialize_model_from_cfg(args.weights)". You mentioned "[...] as I'm using colab, and python3, I've been converting detectron-vlp to python3". Make sure your python3 code can still load the *.pkl checkpoint file. I have never seen this type of error when using python2 (which the code is based on).

nikky4D commented 4 years ago

Thanks very much. That helped narrow down the focus. I figured out that it is a bytes-to-string conversion issue in python3; it happened at several points in the codebase. Working from a python3 fork of the old Detectron, I was able to solve that issue (roughly as sketched below). Everything works now, I believe.
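Roughly, the fix amounts to decoding cfg values before use, e.g. in get_func (a sketch; the helper name is mine):

def to_str(name):
    # cfg values loaded from a python2 pickle can come back as bytes in python3
    return name.decode('utf-8') if isinstance(name, bytes) else name

# then, at the top of get_func:
# func_name = to_str(func_name)
# parts = func_name.split('.')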

Just to clarify: detectron-vlp only extracts the region features for "Grounded Video Description", not the frame-wise features. Is that correct?

LuoweiZhou commented 4 years ago

@nikky4D Great to hear that! It would be great if you could fork this repo and include your python3 changes and the colab notebook. After that, please submit a PR which will be reviewed then merged.

Yes, the repo for frame-wise features is here. The caveat, however, is that that repo is quite old (the code is based on Caffe and has dependencies on deprecated repos). As an alternative, you can run the GVD model with region features only by setting att_input_mode to region: https://github.com/facebookresearch/grounded-video-description/blob/master/opts.py#L58. There will be some degradation in performance though, as shown in Tab. 8.

nikky4D commented 4 years ago

I'm back after a long hiatus. Thank you again for everything.

This may be more suited for GVD repo. Please let me know if I should move it.

So I'm at the stage of running inference on my features from a single video file. What would be my input for this? I'm a little confused about what is expected by all the configs. I'm using this to run GVD:

!python main.py --path_opt cfgs/anet_res101_vg_feat_10x100prop.yml --batch_size 100 --cuda \
    --num_workers 6 --max_epoch 50 --inference_only --start_from save/$ID --id $ID \
    --image_path /content/grounded-video-description/data/anet/anet_detection_vg_fc6_feat_100rois.h5 --att_input_mode region\
    --val_split $val_split --densecap_references $dc_references --densecap_verbose --seq_length 20 \
    | tee log/eval-$val_split-$ID-beam$beam_size-standard-inference

I extracted my features using detectron-vlp as earlier said. My detectron-vlp folder looks like this (after running the region features for VLP and GVD):

(screenshot of the detectron-vlp folder contents)

I've copied anet_detection_vg_fc6_feat_100rois.h5 to data/anet. I'm not sure what to do now. Can you clarify the inputs a little and what would be the next steps?

After running the code above, I get this error:

dataset = DataLoader(opt, split=opt.train_split, seq_per_img=opt.seq_per_img)
  File "/content/grounded-video-description/misc/dataloader_anet.py", line 134, in __init__
    vid_id, seg_idx = seg_id.split('_segment_')
ValueError: not enough values to unpack (expected 2, got 1)

Is there an example of specifying segments for a single video? Right now, my dic_anet.json looks like this:

...[A lot of other stuff]... "craft", "compete": "compete", "rural": "rural", "yell": "yell", "baked": "baked", "In": "in"}, "videos": [{"vid_id": "vid3", "seg_id": "0", "split": "testing", "id": "vid3_01"}]
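For what it's worth, the traceback points at the split in dataloader_anet.py, so the id field needs the '_segment_' infix used earlier in this thread (a minimal check):

seg_id = 'vid3_01'
# vid_id, seg_idx = seg_id.split('_segment_')   # ValueError: not enough values to unpack
seg_id = 'vid3_segment_00'
vid_id, seg_idx = seg_id.split('_segment_')     # ('vid3', '00')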

Thanks.

LuoweiZhou commented 4 years ago

@nikky4D You will need to refer to the file skeleton in this data-preparation section, for example for dic_anet.json.

Once you have generated dic_anet.json, you may proceed to the feature extraction like noted here.

nikky4D commented 4 years ago

Thanks.

LuoweiZhou commented 4 years ago

@nikky4D It would be great if you can submit a PR to include your notebook once it's finished. Thanks.

nikky4D commented 4 years ago

I intend to do so. Thank you again for your help.

MarcusNerva commented 3 years ago

Excuse me, have you extracted features from a given video successfully? I run into a RuntimeError: status == CUBLAS_STATUS_SUCCESS. 13 vs 0.

victorup commented 2 years ago

> Thanks very much. That helped narrow down the focus. I figured out that it is a bytes-to-string conversion issue in python3; it happened at several points in the codebase. Working from a python3 fork of the old Detectron, I was able to solve that issue. Everything works now, I believe.
>
> Just to clarify: detectron-vlp only extracts the region features for "Grounded Video Description", not the frame-wise features. Is that correct?

Hi @nikky4D, I met the same issue: a bytes-like object is required, not 'str'. How did you solve it? Thank you!