fpv-iplab / rulstm

Code for the Paper: Antonino Furnari and Giovanni Maria Farinella. What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conference on Computer Vision, 2019.
http://iplab.dmi.unict.it/rulstm

Some questions about the features #3

Closed naykun closed 5 years ago

naykun commented 5 years ago

Thanks for your great work! I'm trying to apply your method to another dataset, so I need to generate the three types of features myself. Here are my questions:

  1. Which layer of the TSN model do you take the features from? I couldn't find this information in your paper or documentation.
  2. There seems to be a gap between the raw outputs of the feature-extraction models and the inputs expected by your RU-LSTM model. Could you share some sample code for TSN and Faster R-CNN feature extraction?
antoninofurnari commented 5 years ago

Hello, thank you for your words of appreciation. Yes, I can share code for those feature extraction steps, but I'll need a few days as I'm traveling now.

For the moment:

  1. Concerning TSN features: we use the 1024-dimensional features right after global average pooling in BNInception. That is, we get rid of the last FC layer, which computes the class scores.
  2. Concerning Faster R-CNN: we use it to detect bounding boxes, then for each frame we discard the bounding box coordinates and accumulate the detection scores of each object class, thus obtaining a 352-dimensional representation. For instance, if the detector has detected 3 objects of class 8 with scores 0.1, 0.6 and 0.2, the 8th unit of the representation will contain the value 0.9 (a rough sketch of this step is shown below).

I'll share code snippets to obtain such results soon.
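A minimal sketch of the score-accumulation step from point 2 above, assuming per-frame detections are available as (class_id, score) pairs over a 352-class object vocabulary (the names here are illustrative, not the repository's actual code):

```python
import numpy as np

NUM_CLASSES = 352  # size of the detector's object vocabulary

def detections_to_vector(detections, num_classes=NUM_CLASSES):
    """Accumulate per-frame detection scores into a fixed-size representation.

    detections: iterable of (class_id, score) pairs for a single frame;
    bounding-box coordinates are discarded, only the scores are summed.
    """
    vec = np.zeros(num_classes, dtype=np.float32)
    for class_id, score in detections:
        vec[class_id] += score  # sum all scores belonging to the same class
    return vec

# Example from the comment above: three detections of class 8
# with scores 0.1, 0.6 and 0.2 -> unit 8 holds 0.9.
print(detections_to_vector([(8, 0.1), (8, 0.6), (8, 0.2)])[8])  # 0.9
```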

naykun commented 5 years ago

Much appreciated!

antoninofurnari commented 5 years ago

I finally added a few example scripts on feature extraction in https://github.com/fpv-iplab/rulstm/commit/cbdb3dd47ed0ef04186d7911b97f2adbd9542063. Hope this helps :)

naykun commented 5 years ago

> I finally added a few example scripts on feature extraction in cbdb3dd. Hope this helps :)

Thanks a lot!

tianyu-su commented 4 years ago

Hello, I want to know how to train TSN on this task in order to obtain "TSN-flow.pth.tar" and "TSN-rgb.pth.tar". In your extraction code I can only find the forward pass used to extract the features. Could you tell me about the training process, e.g. the inputs and outputs? Thanks a lot!

antoninofurnari commented 4 years ago

Hello, we trained TSN using the PyTorch implementation provided by the authors, which can be found here: https://github.com/yjxiong/tsn-pytorch.

I see the authors have released a new toolbox at https://github.com/open-mmlab/mmaction and suggest switching to it. However, I'm not familiar with the latter and I don't know whether its output is compatible with the rest of the code here.

Using the code in https://github.com/yjxiong/tsn-pytorch to train on your own dataset should only require formatting the data as suggested by the authors (maybe have a look at the original Caffe implementation here: https://github.com/yjxiong/temporal-segment-networks).

After training, you should get the checkpoints for the RGB and Flow branches, which you should be able to use to extract features.
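As a hedged illustration of the data formatting step mentioned above (the helper name and the exact list-file format should be double-checked against the TSN repositories; this is an assumption, not the repository's code), the TSN training scripts typically consume a list file with one line per video giving the frame directory, the number of frames and the class label:

```python
import os

def write_tsn_list(videos, out_path):
    """Write a TSN-style list file: one '<frame_dir> <num_frames> <label>' line per video.

    videos: list of (frame_dir, label) pairs, where frame_dir contains the
    extracted jpg frames of one video segment.
    """
    with open(out_path, 'w') as f:
        for frame_dir, label in videos:
            n_frames = len([n for n in os.listdir(frame_dir) if n.endswith('.jpg')])
            f.write(f'{frame_dir} {n_frames} {label}\n')

# write_tsn_list([('frames/video_001', 12), ('frames/video_002', 7)], 'train_list.txt')
```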

Best, Antonino

tianyu-su commented 4 years ago

OK! I'll give it a try. I really appreciate it.

tianyu-su commented 4 years ago

Hello, I have a question about running the code in "Faster Rcnn -> detect_video.py". I found that cv2.VideoCapture doesn't work very well: the number of frames it reads is usually smaller than the total number of frames in the video. Did you run into this problem in your experiments?

antoninofurnari commented 4 years ago

Hello, I suppose this depends on the format you are using. In my case, I don't recall this happening. However, I had previously re-encoded all videos at a fixed framerate of 30fps using ffmpeg. I used this command:

ffmpeg -i input.mp4 -c:v libx264 -crf 22 -r 30 -vsync cfr -an output.mp4

Could you try that and see if this solves the issue?

Otherwise, you could extract all frames to jpgs and hack the detect_video.py file to read frames from jpeg files.
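As a rough sketch of that fallback (the names are illustrative; this is not the repository's detect_video.py), one could dump the frames with ffmpeg and then iterate over the jpg files instead of using cv2.VideoCapture:

```python
import glob
import cv2

# Assumes the frames were previously extracted, e.g. with:
#   ffmpeg -i output.mp4 frames/frame_%010d.jpg
frame_paths = sorted(glob.glob('frames/frame_*.jpg'))

for path in frame_paths:
    frame = cv2.imread(path)   # BGR image, as returned by VideoCapture.read()
    if frame is None:
        continue
    # run_detector(frame)      # hypothetical hook: feed the frame to Faster R-CNN
```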

Antonino

tianyu-su commented 4 years ago

Cool! The ffmpeg command solved it perfectly. Thanks a lot!

tianyu-su commented 4 years ago

Hello, I'm confused about how to obtain the subset of frames at about 4fps mentioned in the README. Since you converted all videos to a fixed framerate of 30fps, I don't know how to resample the frames to obtain that subset. Could you give me some details? Thank you!

antoninofurnari commented 4 years ago

Hello, depending on the parameters used for training/testing, the model will only look for a subset of the frames. To avoid making people download the full set of features, we provided only that subset. This was done by copying only the needed features and skipping the others, but it is not a required step to make the system work.

In general, it is sufficient to extract all features from the dataset and skip the subset part. This will also provide more flexibility to tune the parameters (encoding/anticipation steps).

Antonino

tianyu-su commented 4 years ago

OK! So, if I understand correctly, you extract the features of every frame offline, but you only store the needed features in the lmdb file so that it matches the dataset construction in dataset.py?

antoninofurnari commented 4 years ago

That's correct: I just stored the needed features and I did not modify dataset.py.
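For reference, a minimal sketch of what storing per-frame features in an LMDB database might look like (purely illustrative; the key naming and serialization must match what dataset.py in this repository expects, so check the provided feature files before relying on this):

```python
import lmdb
import numpy as np

# Hypothetical input: a dict mapping a per-frame key (e.g. the frame's file
# name) to its feature vector (e.g. the 1024-d TSN descriptor).
features = {'video_01_frame_0000000001.jpg': np.random.rand(1024).astype(np.float32)}

env = lmdb.open('rgb_features.lmdb', map_size=1 << 40)  # generous upper bound on db size
with env.begin(write=True) as txn:
    for key, feat in features.items():
        txn.put(key.encode('utf-8'), feat.tobytes())
env.close()
```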

tianyu-su commented 4 years ago

OK! Thanks s a lot!

agupt013 commented 4 years ago

Hi, can you please tell me what configuration you are using for TSN? I'm using the model from https://github.com/yjxiong/tsn-pytorch with num_class = 2513, mode='RGB', num_segment = 1 and base_model='BNInception'. I'm able to load the pretrained weights you provided with this configuration; however, I get the following error.

RuntimeError: size mismatch, m1: [1 x 16384], m2: [1024 x 2513] at /opt/conda/conda-bld/pytorch_1573049301898/work/aten/src/TH/generic/THTensorMath.cpp:197

Thanks!

Note: I had to change the input size to 224x224 and it resolved the issue. Is that correct?

antoninofurnari commented 4 years ago

Hi,

While we started from the PyTorch TSN implementation you mentioned, we ended up modifying it heavily. Hence, the checkpoints might not be 100% compatible. One of the changes was the adoption of an updated model definition for the BNInception backbone from the pretrainedmodels python package (https://github.com/Cadene/pretrained-models.pytorch).

You can find an example of how the model can be used for feature extraction here: https://github.com/fpv-iplab/rulstm/blob/master/FEATEXT/extract_example_rgb.py

You should be able to load the provided checkpoint if you define the backbone as shown in the example above.
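As a hedged sketch of that kind of setup (assuming the pretrainedmodels package, whose BNInception exposes a last_linear classification layer; not necessarily identical to extract_example_rgb.py, and the checkpoint-loading details are an assumption):

```python
import torch
import torch.nn as nn
import pretrainedmodels

# BNInception backbone from the pretrainedmodels package; a 224x224 input
# yields a 1024-d vector after global average pooling.
model = pretrainedmodels.bninception(pretrained=None)
model.last_linear = nn.Identity()  # drop the classifier, keep the 1024-d features

# Hypothetical checkpoint loading; the real checkpoint may store extra keys
# (e.g. the original FC weights), hence strict=False.
state = torch.load('TSN-rgb.pth.tar', map_location='cpu')
model.load_state_dict(state.get('state_dict', state), strict=False)
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 224, 224))  # -> tensor of shape [1, 1024]
```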

Best, Antonino

agupt013 commented 4 years ago

Thank you! It works.