bbrattoli / ZeroShotVideoClassification

Zero-shot video classification by end-to-end training of 3D convolutional neural networks
Apache License 2.0

Did you ever try the same training using BERT or a similar model instead of simple Word2Vec? #7

Closed by pritamqu 1 year ago

pritamqu commented 1 year ago

Hi, I follow your work, and this is great work, very simple and effective :) I am wondering whether you have tried, or know of, similar training with BERT or a similar transformer model. I am trying something like that, but the loss stays fairly steady and the model is not learning anything. The same framework works fine with word2vec. Do you know why this may happen? Any intuitive thoughts? @bbrattoli

bbrattoli commented 1 year ago

Hi! Sorry for the late reply, and thanks for your interest in my work 🙂 The beauty of word2vec is that it is straightforward: one word, one encoding, no long sentences. Maybe BERT is too complex for this model. I personally used only word2vec to stay compatible with previous work, and because improving the language model was outside the scope of my work; the goal was to improve the computer vision model while keeping everything else as in the previous protocol. However, I saw that follow-up work is actually using more complex language models. Unfortunately, I cannot find the paper right now.
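To make the "one word, one encoding" point concrete, here is a minimal sketch of how a static word-vector table can turn class names into fixed targets for nearest-neighbor classification. Everything in it is an assumption for illustration: the 3-dimensional `TOY_VECTORS` table stands in for real word2vec vectors, averaging the words of a multi-word class name is one common convention rather than something confirmed in this thread, and the helper names `embed_class_name`, `cosine`, and `nearest_class` are hypothetical, not functions from this repository.

```python
import math

# Hypothetical 3-d "word vectors"; real word2vec vectors are much larger.
TOY_VECTORS = {
    "playing": [0.9, 0.1, 0.0],
    "guitar": [0.1, 0.9, 0.0],
    "piano": [0.0, 0.8, 0.3],
    "swimming": [0.0, 0.0, 1.0],
}


def embed_class_name(name, vectors=TOY_VECTORS):
    """Average the static vectors of the known words in a class name, then L2-normalize."""
    words = [w for w in name.lower().split() if w in vectors]
    if not words:
        raise KeyError(f"no known words in class name: {name!r}")
    dim = len(vectors[words[0]])
    mean = [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_class(video_embedding, class_names):
    """Return the class whose name embedding is most similar to the video embedding."""
    return max(class_names, key=lambda c: cosine(video_embedding, embed_class_name(c)))


if __name__ == "__main__":
    classes = ["playing guitar", "playing piano", "swimming"]
    # A made-up visual embedding; in practice this would come from the video network.
    print(nearest_class([0.0, 0.1, 0.95], classes))
```

Because each word maps to exactly one vector regardless of context, the class targets are fixed and cheap to compute. A contextual model such as BERT instead produces one vector per token that depends on the whole input, so an extra pooling choice is needed before its output can play the same role.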

pritamqu commented 1 year ago

Thanks for your comment @bbrattoli. I tried both word2vec and more complex models like BERT, and word2vec seems to work better.