fpv-iplab / rulstm

Code for the Paper: Antonino Furnari and Giovanni Maria Farinella. What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conference on Computer Vision, 2019.
http://iplab.dmi.unict.it/rulstm

How can I use my own dataset for training and prediction? How should I process my own dataset? #24

Closed 1848739617 closed 2 years ago

antoninofurnari commented 2 years ago

Hello,

You can refer to our papers (http://arxiv.org/pdf/2005.02190.pdf and https://arxiv.org/pdf/1905.09035.pdf) for details on the whole training & evaluation process.

In short:

  1. Train a TSN model (other, more modern, backbones should work as well) for action recognition on your dataset;
  2. Use the TSN model to extract features from the video frames;
  3. Train & test RULSTM on top of the features using the code provided in this repo.
  4. If you also want to include an optical flow branch, you should extract optical flow from your frames and train a TSN model on top of that as well.
  5. Similarly, for object-based features, you should train an object detector and then use it to extract bag-of-words-like object features.

Please consider that this code covers only steps 2 and 3.

For step 1, we used this implementation: https://github.com/yjxiong/tsn-pytorch

For step 4, you can use this code for optical flow extraction: https://github.com/feichtenhofer/gpu_flow

For step 5, you can use detectron2: https://github.com/facebookresearch/detectron2
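As a rough illustration of the object-based features in the list above, here is a minimal sketch of one way to build a bag-of-words-like vector from detector output: accumulate the confidence scores of all detections per object class, which is how I read the description in the papers. The function name `bag_of_objects`, the `(class_id, score)` input format, and the `min_score` threshold are assumptions made for this example and are not part of this repo or of detectron2.

```python
def bag_of_objects(detections, num_classes, min_score=0.0):
    """Turn the detections of one frame into a fixed-length feature vector.

    detections: iterable of (class_id, score) pairs, one per detected box
                (hypothetical format; adapt it to your detector's output).
    num_classes: number of object classes the detector was trained on.
    min_score: optional threshold; detections scoring below it are ignored.
    """
    features = [0.0] * num_classes
    for class_id, score in detections:
        if score >= min_score:
            # Accumulate the confidence of every box predicted for this class.
            features[class_id] += score
    return features


# Example: two boxes of class 0 and one box of class 2 in a 3-class setup.
frame_detections = [(0, 0.5), (2, 0.25), (0, 0.25)]
print(bag_of_objects(frame_detections, num_classes=3))
```

One such vector per frame can then be stored alongside the RGB and flow features, so that all branches share the same frame indexing.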

Best, Antonino

Bra1nsen commented 2 years ago

Is it possible to get a future image? E.g., what would the image look like in 5 s:

Pixelbalance(t0=0)

Optical Flow Estimation

Pixelbalance(t_future=t0+t=5)

antoninofurnari commented 2 years ago

Hello,

Our method does not predict future images, but only makes semantic predictions on the actions which may take place in the future. Please refer to our paper for more information: http://arxiv.org/pdf/2005.02190.pdf

Best, Antonino

Bra1nsen commented 2 years ago

Hey Antonino, can you recommend some sources for predicting future frames based on optical flow?

I use a sky imager and am trying to predict solar radiation; I work in solar technology.

antoninofurnari commented 2 years ago

Hi, I'm not sure I fully understand what your data/task/goal is, but this review on video prediction may be a good starting point: https://arxiv.org/pdf/2004.05214.pdf

Antonino

1848739617 commented 2 years ago

Hello, I would like to use your model on a dataset of assembly-line operations recorded in a workshop, in order to predict a worker's next action. Do you think this idea is feasible? And how should I label my dataset?

antoninofurnari commented 2 years ago

Hello,

Yes, I think you could use this code for that. Please refer to our papers for more information. You can find more information on the correct format for labels here: https://github.com/fpv-iplab/rulstm/tree/master/RULSTM/data/ek55
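
For a custom dataset such as the assembly-line case, a minimal sketch of writing segment annotations to CSV could look like the following. The column layout (id, video, start frame, end frame, verb, noun, action) and the absence of a header row are assumptions here, so please check them against the files in the linked directory before training. The video names, frame numbers, and class ids are made-up placeholders, and `build_action_index` and `write_annotations` are hypothetical helpers, not functions from this repo.

```python
import csv
import io

# Hypothetical annotated segments: one entry per labeled action.
segments = [
    {"video": "line01_cam0", "start": 30, "end": 150, "verb": 0, "noun": 2},
    {"video": "line01_cam0", "start": 160, "end": 300, "verb": 1, "noun": 0},
    {"video": "line01_cam1", "start": 10, "end": 90, "verb": 0, "noun": 2},
]


def build_action_index(segments):
    """Give each distinct (verb, noun) pair an integer action id,
    numbered in order of first appearance."""
    index = {}
    for seg in segments:
        index.setdefault((seg["verb"], seg["noun"]), len(index))
    return index


def write_annotations(segments, action_index, fh):
    """Write one headerless CSV row per segment (assumed column order:
    id, video, start_frame, end_frame, verb, noun, action)."""
    writer = csv.writer(fh, lineterminator="\n")
    for seg_id, seg in enumerate(segments):
        action = action_index[(seg["verb"], seg["noun"])]
        writer.writerow([seg_id, seg["video"], seg["start"], seg["end"],
                         seg["verb"], seg["noun"], action])


# Usage: write to an in-memory buffer and show the result.
buffer = io.StringIO()
write_annotations(segments, build_action_index(segments), buffer)
print(buffer.getvalue(), end="")
```

To produce a real file, pass an object opened with `open(path, "w", newline="")` instead of the in-memory buffer.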