iki-taichi opened this issue 3 years ago
Hi Iki-san,
Thanks for pointing this out, you are right.
I don't think this impacts performance much, but I'll try to fix it. However, I cannot just fix the code, as it would affect the controlled VL-BERT model that we released.
So, I'll need to find some time and resources to pre-train the controlled VL-BERT again.
I'll keep this issue open until I do so.
If you are pre-training VL-BERT, go ahead and fix the indexing problem :)
Thank you for your kind answer. I agree with you: although I'm curious about its impact, given the cost of pre-training, I don't think fixing it is urgent.
As for me, I'm not able to do the pre-training due to a lack of resources :_(
Hello.
I have a question about the VLBertEmbeddings class.
In its forward function, a global image feature is added to the linguistic tokens. The last token in the vision sequence is used as the global image feature, as below:
https://github.com/e-bug/volta/blob/9e5202141920600d58a9c5c17519ca453795d65d/volta/embeddings.py#L271
Using the last token seems reasonable for the original VL-BERT (vl-bert_base.json), whose add_global_imgfeat is "last", but I think it should be the first token for the controlled VL-BERT (ctrl_vl-bert_base.json), whose add_global_imgfeat is "first". For concreteness, a minimal sketch of the config-dependent indexing I have in mind is below (the function and variable names are my own, made up for illustration; only add_global_imgfeat and the two config files come from volta):
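```python
import torch


def select_global_imgfeat(v_embeddings: torch.Tensor, add_global_imgfeat: str) -> torch.Tensor:
    """Pick the global image token from the vision sequence.

    v_embeddings: (batch, num_boxes, hidden) vision embeddings, where the
    global image feature has been prepended ("first") or appended ("last").
    Names here are hypothetical, not volta's actual code.
    """
    if add_global_imgfeat == "first":
        global_feat = v_embeddings[:, 0]   # ctrl_vl-bert_base.json: prepended token
    elif add_global_imgfeat == "last":
        global_feat = v_embeddings[:, -1]  # vl-bert_base.json: appended token
    else:
        raise ValueError(f"Unexpected add_global_imgfeat: {add_global_imgfeat}")
    return global_feat.unsqueeze(1)  # (batch, 1, hidden), ready to broadcast-add
```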
Is there any reason why the last token is always used in this class?
I'm sorry if I've misunderstood how the embedding classes work.
Thanks.