cshizhe / hgr_v2t

Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".
MIT License

About training time #3

Closed LetsGoFir closed 4 years ago

LetsGoFir commented 4 years ago

Thanks for your great work!

How long does it take to train your model on each of the three datasets?

Also, the BaiduNetdisk folder is empty.

mengliu1991 commented 4 years ago

@cshizhe Thank you for your great work. However, the BaiduNetdisk folder is empty. Could you upload the data?

cshizhe commented 4 years ago

  1. I am currently copying the datasets from a remote server to my local machine so that I can upload them to BaiduNetdisk. This may take quite some time; sorry about that.

  2. Training time for each dataset:
     - MSRVTT: 120,595 video-caption pairs, 20 frames per video, 7 min per epoch
     - TGIF: 80,295 video-caption pairs, 6 frames per video, 3 min per epoch
     - VATEX: 259,910 video-caption pairs, 10 frames per video, 10 min per epoch
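For a rough sense of throughput, the per-epoch figures quoted here can be converted into video-caption pairs per second. This is only a back-of-the-envelope sketch: the names `EPOCH_STATS`, `pairs_per_second`, and `total_hours` are illustrative and not part of the repository, and the number of epochs is not stated in this thread, so `num_epochs` is a value you would have to supply yourself.

```python
# Per-epoch figures as quoted in this thread (pairs and minutes per epoch).
EPOCH_STATS = {
    "MSRVTT": {"pairs": 120_595, "frames_per_video": 20, "minutes_per_epoch": 7},
    "TGIF": {"pairs": 80_295, "frames_per_video": 6, "minutes_per_epoch": 3},
    "VATEX": {"pairs": 259_910, "frames_per_video": 10, "minutes_per_epoch": 10},
}


def pairs_per_second(dataset: str) -> float:
    """Average number of video-caption pairs processed per second in one epoch."""
    stats = EPOCH_STATS[dataset]
    return stats["pairs"] / (stats["minutes_per_epoch"] * 60)


def total_hours(dataset: str, num_epochs: int) -> float:
    """Estimated wall-clock hours for num_epochs epochs.

    num_epochs is an assumption supplied by the caller; the thread does not
    say how many epochs were used.
    """
    return EPOCH_STATS[dataset]["minutes_per_epoch"] * num_epochs / 60


if __name__ == "__main__":
    for name in EPOCH_STATS:
        print(f"{name}: {pairs_per_second(name):.1f} pairs/s")
```

These estimates ignore evaluation and data-loading overhead outside the training loop, so treat them as a lower bound on wall-clock time.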

LetsGoFir commented 4 years ago

Thanks for your prompt reply! How many GPUs did you use?

cshizhe commented 4 years ago

A single GPU was used.