lizhiustc opened this issue 2 years ago
Copy-pasting the email reply here:
Yes, these token losses perform similarly, so we chose the simplest one; to me, that's classification.
Token labels are also a strong supervision signal. To me, contrastive and L2 regression are mostly used for distillation; they behave more like distillation objectives, but token labels can do the same (e.g., in language model distillation). Some other works to look at are wav2vec 2.0 and DINO.
I have two questions.
(1) I notice that in your code https://github.com/airsplay/vokenization/blob/5601b799184ed54414872565f233e22c76f5f6f0/vlm/model.py#L238 you define three loss functions: voken classification, voken regression, and voken contrastive. But you only report voken classification in the paper. Did you find after trying them that voken regression and voken contrastive don't work, or even hurt model performance? Is my guess correct? (Perhaps because image features are quite different from language embeddings.)
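For reference, here is my rough understanding of the three losses as a minimal PyTorch sketch. This is not your actual code; the head names, dimensions (e.g., a 2048-d image feature), and the InfoNCE form I use for the contrastive variant are my own assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed shapes: hidden      — (B, T, hidden_dim) token states from the LM,
#                 vokens      — (B, T) integer voken ids,
#                 voken_feats — (B, T, feat_dim) image features of the vokens.
hidden_dim, feat_dim, num_vokens = 768, 2048, 50000

cls_head = nn.Linear(hidden_dim, num_vokens)  # voken classification head
reg_head = nn.Linear(hidden_dim, feat_dim)    # projection to image-feature space

def voken_classification_loss(hidden, vokens):
    # Cross-entropy over the voken vocabulary, one label per token.
    logits = cls_head(hidden)                      # (B, T, num_vokens)
    return F.cross_entropy(logits.flatten(0, 1), vokens.flatten())

def voken_regression_loss(hidden, voken_feats):
    # L2 regression of the projected token state onto its image feature.
    pred = reg_head(hidden)                        # (B, T, feat_dim)
    return F.mse_loss(pred, voken_feats)

def voken_contrastive_loss(hidden, voken_feats, temperature=0.07):
    # InfoNCE-style: each token state should match its own voken feature
    # against all other voken features in the batch.
    q = F.normalize(reg_head(hidden).flatten(0, 1), dim=-1)  # (B*T, feat_dim)
    k = F.normalize(voken_feats.flatten(0, 1), dim=-1)       # (B*T, feat_dim)
    logits = q @ k.t() / temperature                          # (B*T, B*T)
    targets = torch.arange(logits.size(0))
    return F.cross_entropy(logits, targets)
```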
(2) What's the intuition behind why the voken classification loss improves model performance? My guess is that different words with similar semantics will receive the same voken labels, so the voken classification loss implicitly optimizes their similarity. What is your opinion? Could you share some intuition from your perspective?
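To make my guess concrete, here is a toy check (my own sketch, not from your code): I treat the hidden states of two different words as free parameters standing in for LM outputs, assign them the same voken label, and train them together with the classification head; their cosine similarity tends to increase, since both states are pulled toward the same classifier weight row:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
hidden_dim, num_vokens = 32, 100
cls_head = nn.Linear(hidden_dim, num_vokens)

# Two different "words" (e.g., "cat" and "kitten") assigned the same voken id.
h_cat    = torch.randn(1, hidden_dim, requires_grad=True)
h_kitten = torch.randn(1, hidden_dim, requires_grad=True)
shared_voken = torch.tensor([7])

opt = torch.optim.SGD([h_cat, h_kitten, *cls_head.parameters()], lr=0.1)

print("cosine before:", F.cosine_similarity(h_cat, h_kitten).item())
for _ in range(200):
    loss = (F.cross_entropy(cls_head(h_cat), shared_voken) +
            F.cross_entropy(cls_head(h_kitten), shared_voken))
    opt.zero_grad()
    loss.backward()
    opt.step()
print("cosine after: ", F.cosine_similarity(h_cat, h_kitten).item())
```

Of course, in the real model the hidden states come from the shared LM encoder rather than being free parameters, so this only illustrates the direction of the effect I have in mind.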