Closed by zycleo 1 year ago
Hi, I plan to reproduce this work, but it needs the pretrained CLIP4Clip model. I visited the CLIP4Clip repo but found that nobody has shared the pretrained model. Do you have any ideas?
Hi, sorry for the late reply. @zycleo, the memory required to extract the features is quite large. As mentioned in the paper, the PC we use to extract the features and train the model has 4 x A6000 GPUs and 377 GB of RAM. For MSRVTT, the final STGraph feature is around 270 GB.
@Jack2Lu, the pretrained CLIP4Clip model is not provided by its authors. However, you can follow their GitHub code to fine-tune the CLIP model and extract the features yourself.
Hello,
When I create the object-based action graph by running action_spatio_temporal_graph_feature_extractor.ipynb, the system reports an error: IOStream.flush timed out.
Dataset = 'msrvtt'.
How large is the generated grid_based_spatial_action_graph.hdf5 file?
Could you also tell me more about your device configuration?