IntelLabs / GraVi-T

Graph learning framework for long-term video understanding
Apache License 2.0

Pretrained weights #4

Open BStephen99 opened 3 months ago

BStephen99 commented 3 months ago

Hi there,

Thank you for this excellent work!!!

Would it be possible for you to share the pre-trained weights for the AVA active speaker model?

It would be much appreciated. :-)

kylemin commented 3 months ago

Due to Intel's policy, we do not have an immediate plan to share the pre-trained weights. We ask you to train the model on your end, since the whole training process takes only a few hours.

Thank you, Kyle

BStephen99 commented 3 months ago

Thanks for the response. Seems a bit silly, since they can be recreated. In any case, thanks again for making this repository available. 😁

BStephen99 commented 3 months ago

Sorry to bother you again, but I'm having trouble recreating the training features in RESNET18-TSM-AUG. Using the active-speakers-context repository, I swapped the existing model for your models_stage1_tsm.py model, loaded the pretrained weights, and ran STE_forward.py with the number of frames set to 11. However, the resulting features are not the same as yours. Did you use all default parameters?

The only other change I made was to reshape the video_data so it would be compatible with your model. (The input video data shape is (1, 11, 3, 144, 144) and the audio is (1, 1, 13, 40). Does that seem correct?)