TheShadow29 / VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
https://vidsitu.org/
MIT License

hi~ about gpu and others #10

Closed jun0wanan closed 3 years ago

jun0wanan commented 3 years ago

Hi,

Thank you for your excellent work! Approximately how many GPUs are needed to run the experiments on the whole dataset? And how big is the dataset, including the features? (●'◡'●)

Best, jun

TheShadow29 commented 3 years ago

The verb prediction task involves training the video backbone, which takes 20 hours on 4 GPUs.

Semantic role prediction and event role prediction can each be run on 1 GPU in around 6 hours.

jun0wanan commented 3 years ago

Wow! What about the size of the features?