TengdaHan / CoCLR

[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
Apache License 2.0

Information about requirements and dataset preparation #40

Closed fmthoker closed 3 years ago

fmthoker commented 3 years ago

Hi, thanks for releasing the code. Could you detail the package requirements (PyTorch and torchvision versions, etc.) used to obtain the results? Also, could you give more information about how to set up the datasets, specifically Kinetics-400, for pretraining? Do we need to compute optical flow separately and then create lmdb files for both RGB and flow?

TengdaHan commented 3 years ago

Hi, the important environment packages are:

- pytorch=1.4.0=py3.7_cuda10.0.130_cudnn7.6.3_0
- torchvision==0.5.0a0+681c6c1
- msgpack==0.6.2
- py-opencv=3.4.2=py37hb342d67_1
- opencv=3.4.2=py37h6fd60c2_1
- libopencv=3.4.2=hb342d67_1
- pillow==6.1.0

Full list is here: https://github.com/TengdaHan/CoCLR/blob/main/environment_pt14.yml

For Kinetics,

"Do we need to compute optical flow separately" -- Yes. I used the code here: https://github.com/TengdaHan/MemDPC/blob/master/process_data/src/extract_ff.py to extract TVL1 optical flow.

"then create lmdb files for both RGB and flow" -- This is not necessary. As long as you can load the image data and feed it to the model (but you need to write your own dataloader). The lmdb package is nothing special, it just zips small files together.