-
In the GQA data loader, the spatial-feature dimension is set to 4 for LXMERT, so when I trained BUTD on GQA the dimension was insufficient (BUTD requires 6 dims).
```
spatials = torch.from_n…
```
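For reference, a minimal sketch of how the missing 6-dim spatial features could be built from (x1, y1, x2, y2) boxes; the exact ordering BUTD expects is an assumption here (normalized x1, y1, x2, y2, plus box width and height), and `butd_spatials` is a hypothetical helper:

```python
import numpy as np
import torch

def butd_spatials(boxes, image_w, image_h):
    # boxes: (num_boxes, 4) array of absolute (x1, y1, x2, y2) coordinates.
    # Returns (num_boxes, 6): normalized x1, y1, x2, y2, width, height --
    # the ordering is an assumption about what BUTD expects.
    boxes = np.asarray(boxes, dtype=np.float32)
    x1 = boxes[:, 0] / image_w
    y1 = boxes[:, 1] / image_h
    x2 = boxes[:, 2] / image_w
    y2 = boxes[:, 3] / image_h
    return torch.from_numpy(np.stack([x1, y1, x2, y2, x2 - x1, y2 - y1], axis=1))
```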
-
Hi,
I have a few more queries. Could you please help me understand them?
1) In the code, model_ensemble_ravi_E_3 is mentioned [config\GamePlay\ensemble]; on the drive there is a name: model_ensemble_3ta…
-
# 🚀 Feature request
Thanks a lot for releasing the LXMERT model. In the LXMERT code samples, the visual feature extraction code (using a generalized Faster R-CNN: [modeling_frcnn](https://github.com…
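For context, this is roughly how I call the extraction code from that research example; the checkpoint name and call signatures below are taken from my reading of the demo and may have changed:

```python
from modeling_frcnn import GeneralizedRCNN
from processing_image import Preprocess
from utils import Config

# Load the Faster R-CNN config and weights from the demo checkpoint
# ("unc-nlp/frcnn-vg-finetuned" per the example; may have changed).
frcnn_cfg = Config.from_pretrained("unc-nlp/frcnn-vg-finetuned")
frcnn = GeneralizedRCNN.from_pretrained("unc-nlp/frcnn-vg-finetuned", config=frcnn_cfg)
image_preprocess = Preprocess(frcnn_cfg)

# Preprocess an image (local path or URL) and run detection.
images, sizes, scales_yx = image_preprocess("my_image.jpg")
output_dict = frcnn(
    images,
    sizes,
    scales_yx=scales_yx,
    padding="max_detections",
    max_detections=frcnn_cfg.max_detections,
    return_tensors="pt",
)
roi_features = output_dict.get("roi_features")          # visual features for LXMERT
normalized_boxes = output_dict.get("normalized_boxes")  # box positions for LXMERT
```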
-
UpDn is supposed to be used for the first 12 epochs of SSL, with the self-supervised objective only afterwards. Why does train.py use the self-supervised objective from the beginning?
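To make the question concrete, here is a minimal sketch of the schedule I expected, with hypothetical names (`ssl_term`, `warmup_epochs`) standing in for whatever train.py actually uses:

```python
import torch.nn.functional as F

def combined_loss(logits, targets, ssl_term, epoch, warmup_epochs=12, ssl_weight=1.0):
    # Plain UpDn (BCE) objective alone for the first `warmup_epochs` epochs...
    loss = F.binary_cross_entropy_with_logits(logits, targets)
    # ...and only afterwards add the self-supervised term (assumed form).
    if epoch >= warmup_epochs:
        loss = loss + ssl_weight * ssl_term
    return loss
```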
-
Hello, airsplay!
Thanks for generously sharing this outstanding work!
However, the URLs provided in this repository no longer seem to be accessible. Could you please update them?
Thank you…
-
https://arxiv.org/pdf/1810.04805.pdf
Reviewing the old to learn the new...
-
Hi, I just noticed that in the case of [https://paperswithcode.com/sota/visual-question-answering-on-gqa-test2019](https://paperswithcode.com/sota/visual-question-answering-on-gqa-test2019)
The age…
-
I am confused about the embedding in your paper. Section 3.2.3 says _LXMERT separately encodes image and caption text in two streams_.
1. Is the processed caption words or word embeddings?
2. _L = L_VE + L_ss…
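My current understanding of question 1, as a sketch (the vocabulary and sequence sizes below are placeholders): the caption tokens are mapped to learned word embeddings plus positional embeddings before entering the language stream, i.e. the encoder consumes embeddings rather than raw words:

```python
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30522, 20, 768  # placeholder sizes
word_emb = nn.Embedding(vocab_size, hidden)
pos_emb = nn.Embedding(max_len, hidden)

token_ids = torch.tensor([[101, 2023, 2003, 1037, 4937, 102]])  # e.g. "[CLS] this is a cat [SEP]"
positions = torch.arange(token_ids.size(1)).unsqueeze(0)
caption_input = word_emb(token_ids) + pos_emb(positions)  # (1, seq_len, hidden) fed to the text stream
```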
-
Hello.
I have pretrained two LXMERT models using the official LXMERT GitHub repository.
I want to evaluate my models on RefCOCO.
I was wondering if it is possible to use your implementation to …
-
# ❓ Questions & Help
Hello, congrats to all contributors for the awesome work with LXMERT! It is exciting to see multimodal transformers coming to huggingface/transformers. Of course, I immediately …