facebookresearch / VMZ

VMZ: Model Zoo for Video Modeling
Apache License 2.0

Can we use the features extracted by the R(2+1)D-34 caffe model as input for a classifier built in Keras? #86

Open ThatIndianCoder opened 4 years ago

ThatIndianCoder commented 4 years ago

For one of my projects, I would like to build a simple feed-forward neural network in Keras. I was wondering if it is possible to use the features extracted by the given Caffe2 script as input for a Keras model.
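For concreteness, one possible shape of such a classifier (a minimal sketch, not from the thread: the feature dimension, class count, and the idea that features were dumped as a `(num_clips, feat_dim)` NumPy array are all placeholder assumptions):

```python
# Sketch: a small feed-forward classifier in Keras over pre-extracted
# R(2+1)D features. Assumes features were dumped as a (num_clips, feat_dim)
# NumPy array; feat_dim=512 and num_classes=10 are placeholder values.
import numpy as np
import tensorflow as tf

feat_dim, num_classes = 512, 10

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(feat_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stand-in for features/labels loaded from disk, e.g. np.load("feats.npy").
features = np.random.rand(32, feat_dim).astype("float32")
labels = np.random.randint(0, num_classes, size=32)
model.fit(features, labels, epochs=1, batch_size=8, verbose=0)
probs = model.predict(features, verbose=0)  # shape (32, num_classes)
```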

bjuncek commented 4 years ago

I'm sure it is - you'd simply need to write a data loader that feeds those features into Keras. Unfortunately, I can't really help with that :(


I've done something similar for PyTorch, and it worked.

ThatIndianCoder commented 4 years ago

Thank you for the tip. I will look into it.

rsomani95 commented 4 years ago

> I've done something similar for PyTorch, and it worked.

@bjuncek do you intend on sharing the code you wrote for doing this in PyTorch? I think a lot of people would find that very helpful.

bjuncek commented 4 years ago

Thanks for your interest @rsomani95 - @daniel-j-h wrote a nifty little script here:

> Hey folks, we just did that for the R(2+1)D 34-layer model here: https://github.com/moabitcoin/ig65m-pytorch

I'll make my repo public as well tonight, as it might have a bit more context for other models :)

Best, Bruno

rsomani95 commented 4 years ago

@bjuncek this repo might be exactly what I'm looking for! Thank you for the reference :)

To summarise:

  1. VMZ defined and trained these models in Caffe2 and shared the weights as .pkl files. To use them in PyTorch, you need to define the same architecture in PyTorch and copy the weights over layer by layer from the .pkl file -- which is what https://github.com/moabitcoin/ig65m-pytorch does.
  2. To train on a custom dataset, we could create a torch Dataset using i) https://github.com/irhum/R2Plus1D-PyTorch/blob/master/dataset.py or ii) https://github.com/MohsenFayyaz89/PyTorch_Video_Dataset/blob/master/videoDataset.py and fine-tune the loaded model with it.

Is my understanding correct?

Thanks for your help. On a related note, your work on the Salient Clip Sampler is super neat.
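Point 2 of the summary above could be sketched roughly like this (not the code from either linked repo: the `(path, label)` sample list and the .npy clip format are illustrative assumptions - the linked loaders read raw video files instead):

```python
# Sketch: a minimal torch Dataset for fine-tuning on a custom clip dataset.
# Each sample is a (clip_path, label) pair; load_clip is injectable so a
# real video reader (e.g. the linked dataset.py loaders) can be swapped in.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ClipDataset(Dataset):
    def __init__(self, samples, load_clip=np.load):
        self.samples = samples      # list of (clip_path, label)
        self.load_clip = load_clip  # swap in a real video decoder here

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # Clips are assumed stored channels-first: (C, T, H, W)
        clip = torch.from_numpy(self.load_clip(path)).float()
        return clip, label
```

A DataLoader over this dataset then yields `(N, C, T, H, W)` batches, which is the input layout the R(2+1)D models expect.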

daniel-j-h commented 4 years ago

The .pkl files are the (key, value) blobs for the trained Caffe2 model weights.

In the convert.py tool in ig65m-pytorch we manually create the matching architecture in PyTorch (see the convert source for the slight model differences and how we fix them), and then copy the named blobs into the appropriate PyTorch modules. We double-check tensor sizes and dtypes, and that there is a 1:1 mapping between blobs and PyTorch parameters - with enough attention to detail this works!
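The blob-copying idea described above can be sketched as follows (a toy illustration, not the real convert.py: the tiny Conv3d model and the `conv1_w`/`conv1_b` blob names are made up, and the actual script handles the full R(2+1)D layout):

```python
# Sketch of the conversion idea: copy named Caffe2 blobs (NumPy arrays)
# into a matching PyTorch model, checking shapes, dtypes, and that the
# blob-to-parameter mapping is 1:1.
import numpy as np
import torch
import torch.nn as nn

def copy_blobs(model, blobs, name_map):
    """name_map: PyTorch state_dict key -> Caffe2 blob name."""
    state = model.state_dict()
    for pt_name, c2_name in name_map.items():
        tensor = torch.from_numpy(blobs[c2_name])
        assert tensor.shape == state[pt_name].shape, (pt_name, tensor.shape)
        assert tensor.dtype == state[pt_name].dtype, (pt_name, tensor.dtype)
        state[pt_name] = tensor
    # Every blob must be consumed exactly once (1:1 mapping).
    assert set(name_map.values()) == set(blobs.keys())
    model.load_state_dict(state)

# Toy stand-ins for a real model and a real .pkl blob dictionary.
model = nn.Conv3d(3, 8, kernel_size=1, bias=True)
blobs = {
    "conv1_w": np.random.rand(8, 3, 1, 1, 1).astype(np.float32),
    "conv1_b": np.random.rand(8).astype(np.float32),
}
copy_blobs(model, blobs, {"weight": "conv1_w", "bias": "conv1_b"})
```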

We also provide an extract.py script showing how to use the model and weights on your custom dataset, or e.g. on your webcam. We'll work on more convenience tools in the coming weeks.

rsomani95 commented 4 years ago

@daniel-j-h thanks for the clarification, I missed the extract.py script. Excellent repo!

bjuncek commented 4 years ago

@rsomani95 yeah, that's pretty much the idea. I'd also look into the torchvision datasets: I've tested with those and can't really guarantee performance with other data loaders (unfortunately, video training can be quite data-dependent). I also found them the simplest to set up.

ref: https://github.com/pytorch/vision/blob/master/torchvision/datasets/kinetics.py

and thanks :)