Sejong-VLI / V2T-Action-Graph-JKSUCIS-2023

The implementation of a paper entitled "Action Knowledge for Video Captioning with Graph Neural Networks" (JKSUCIS 2023).
MIT License

IOStream.flush timed out #1

Closed zycleo closed 1 year ago

zycleo commented 1 year ago

Hello,

When I create the object-based action graph by running action_spatio_temporal_graph_feature_extractor.ipynb, the system reports an error: IOStream.flush timed out.

Dataset = 'msrvtt'.

How large is the generated grid_based_spatial_action_graph.hdf5 file?

Can you tell me more about your device configuration?
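(A quick way to answer the size question yourself, once the notebook has produced the file, is to inspect it with h5py. The sketch below only reads metadata, so it works even on very large files; the dataset layout it prints depends on how the repo writes the file, which is not specified here.)

```python
import os
import h5py  # third-party: pip install h5py

def inspect_hdf5(path):
    """Report on-disk size and per-dataset shapes without loading data into RAM."""
    size_gb = os.path.getsize(path) / 1024 ** 3
    print(f"{path}: {size_gb:.2f} GB on disk")
    with h5py.File(path, "r") as f:
        def report(name, obj):
            # visititems walks groups and datasets; only datasets carry arrays
            if isinstance(obj, h5py.Dataset):
                print(f"  {name}: shape={obj.shape}, dtype={obj.dtype}")
        f.visititems(report)

# hypothetical usage:
# inspect_hdf5("grid_based_spatial_action_graph.hdf5")
```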

Jack2Lu commented 1 year ago

Hi, I plan to reproduce this work, but it needs the pretrained CLIP4Clip model. I visited the CLIP4Clip repo, but it seems nobody has shared the pretrained weights. Do you have any ideas?

fadzaka12 commented 1 year ago

Hi, sorry for the late reply. For @zycleo, the memory required to extract the features is quite large. As mentioned in the paper, the PC we used to extract the features and train the model has 4 x A6000 GPUs with 377 GB of RAM. For MSRVTT, the final STGraph feature file is around 270 GB.
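(Since the final file is in the hundreds of gigabytes, one common way to keep RAM usage bounded during extraction is to flush each video's feature to the HDF5 file as soon as it is computed, rather than accumulating everything in memory. A minimal sketch with h5py; the function name and per-video dataset layout are hypothetical, not the repo's actual API.)

```python
import h5py  # third-party: pip install h5py
import numpy as np

def append_feature(path, video_id, feature):
    """Write one video's graph feature to the HDF5 file and close it,
    so the full multi-hundred-GB result never has to fit in memory."""
    with h5py.File(path, "a") as f:  # "a": create the file if missing, else append
        f.create_dataset(video_id, data=feature, compression="gzip")

# hypothetical usage: one (num_nodes x feature_dim) array per video
# append_feature("grid_based_spatial_action_graph.hdf5",
#                "video0", np.zeros((36, 512), dtype="float32"))
```

Opening the file in append mode per video also means a crash or kernel timeout only loses the video currently being processed, not the whole run.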

For @Jack2Lu, the pretrained CLIP4Clip model is not provided by the authors. However, you can follow their GitHub code to fine-tune the CLIP model and extract the features yourself.