PhoebusSi / SAR

Code for our ACL2021 paper: "Check It Again: Progressive Visual Question Answering via Visual Entailment"

Question about word embedding #1

Closed harukaza closed 3 years ago

harukaza commented 3 years ago

I'm confused about the embedding in your paper. LXMERT separately encodes the image and the caption text in two streams (Section 3.2.3).

  1. Are the processed captions words or word embeddings?

  2. In L = L_VE + L_SSL, when SSL is replaced with LMH, does the SSL loss become the LMH loss?
PhoebusSi commented 3 years ago

A1: A sequence of word ids. The caption is processed by LXMERT's tokenizer (which is identical to the BERT tokenizer). The tokenizer converts the caption into a sequence of word ids that forms (a part of) the input to LXMERT. See https://huggingface.co/transformers/model_doc/lxmert.html?highlight=lxmert for more implementation details.

A2: Yes.
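To make A1 concrete, here is a minimal, self-contained sketch of the BERT-style WordPiece tokenization that LXMERT's tokenizer applies to a caption. The toy vocabulary and the `encode`/`wordpiece` helpers are hypothetical illustrations; the real `LxmertTokenizer` in HuggingFace transformers ships with the full BERT vocabulary and handles many more details (punctuation, casing, special characters).

```python
# Toy vocabulary mapping tokens to ids. Hypothetical; the real
# LxmertTokenizer uses the full BERT WordPiece vocab (~30k entries).
VOCAB = {
    "[CLS]": 0, "[SEP]": 1, "[UNK]": 2,
    "a": 3, "dog": 4, "play": 5, "##ing": 6, "outside": 7,
}

def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into subword tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            # Continuation pieces are prefixed with "##", as in BERT.
            candidate = ("##" if start > 0 else "") + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no sub-piece matched; fall back to UNK
        tokens.append(piece)
        start = end
    return tokens

def encode(caption, vocab):
    """Map a caption to word ids: [CLS] subwords... [SEP], as BERT does."""
    tokens = ["[CLS]"]
    for word in caption.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return [vocab[t] for t in tokens]

print(encode("a dog playing outside", VOCAB))
# → [0, 3, 4, 5, 6, 7, 1]  ("playing" splits into "play" + "##ing")
```

These ids (not embeddings) are what the model consumes; LXMERT's own embedding layer turns them into vectors internally.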