airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Hyperparameters for the VizWiz dataset #58

Open runzeer opened 4 years ago

runzeer commented 4 years ago

Dear Pro: I read the VizWiz leaderboard for ECCV 2018. The result shown there is 55.40 without model ensembling, but when I train on the VizWiz dataset I only get 51.96, so I would like to know why the results differ. My answer vocabulary for VizWiz is chosen from the 3,000 most common answers. The initial learning rate is 5e-5, the number of epochs is 4, and the batch size is 32. The pre-trained model I load is Epoch20_LXRT.pth. If convenient, could you share your hyperparameters for the VizWiz dataset?

airsplay commented 4 years ago

Could you try this configuration, which I used for the leaderboard submission?

BatchSize 64
LR 1e-4
Epochs 20 (VizWiz is very small: one epoch takes around 10 minutes while a VQA epoch takes about 1.5 hours, so we increase the number of epochs.)
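A minimal, self-contained sketch of how these settings might look in plain PyTorch; the linear head and empty loop body are placeholders, not the actual LXMERT VizWiz fine-tuning code:

```python
# Suggested fine-tuning hyperparameters (batch size 64, lr 1e-4, 20 epochs).
# The model below is a stand-in classifier head, not the real LXMERT architecture.
import torch
import torch.nn as nn

BATCH_SIZE = 64   # from the comment above
LR = 1e-4
EPOCHS = 20       # VizWiz is small (~10 min/epoch vs. ~1.5 h for a VQA epoch)

model = nn.Linear(768, 3000)  # stand-in for a 3000-answer classifier head
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)

for epoch in range(EPOCHS):
    # iterate over a VizWiz DataLoader with batch_size=BATCH_SIZE here
    pass
```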
runzeer commented 4 years ago

OK! I will try it soon! Thanks a lot! But I still have two questions about the training. Looking forward to your reply.

  1. How do you deal with the answer labels? Every question has 10 answers, but there are no per-answer scores as in VQA, so how do you construct the answer labels?
  2. The loss function. I use the soft loss function from https://github.com/DenisDsh/VizWiz-VQA-PyTorch/blob/master/train.py, but I do not know which loss you chose. Still cross-entropy?
airsplay commented 4 years ago

Thanks. I have uploaded the materials here: http://nlp.cs.unc.edu/data/lxmert_data/vizwiz/vizwiz.zip. Please take a look.

For the loss function, I just used cross-entropy, as in VQA/GQA.
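A rough sketch of one way this could look, assuming each question's 10 answers are reduced to a single most-frequent in-vocabulary label for a plain cross-entropy loss; the helper below is illustrative, not code from this repo:

```python
from collections import Counter
import torch.nn as nn

def pick_label(answers, ans2label):
    """Pick the most frequent of the 10 human answers that is in the vocabulary."""
    for ans, _ in Counter(answers).most_common():
        if ans in ans2label:
            return ans2label[ans]
    return None  # no in-vocabulary answer; such questions could be skipped

criterion = nn.CrossEntropyLoss()
# logits: (batch, num_answers) float tensor, labels: (batch,) long tensor
# loss = criterion(logits, labels)
```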

runzeer commented 4 years ago

Sorry to trouble you again. When I use the materials above, I get a KeyError at `target[self.raw_dataset.ans2label[ans]] = score`: KeyError: '1 package stouffer signature classics fettuccini alfredo'. I cannot find a solution, because I thought the key was in the dict. Could you help me with this?

airsplay commented 4 years ago

I think I just removed the answer if it was not in the dict.
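A hedged sketch of that workaround when building the target vector (the argument and field names are illustrative, not from this repo):

```python
import torch

def make_target(answers_with_scores, ans2label, num_answers):
    """Build a target vector, skipping answers missing from the label dict."""
    target = torch.zeros(num_answers)
    for ans, score in answers_with_scores:
        if ans not in ans2label:
            continue  # drop out-of-vocabulary answers instead of raising KeyError
        target[ans2label[ans]] = score
    return target
```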

runzeer commented 4 years ago

OK! I found it! Thanks a lot!!

runzeer commented 4 years ago

I checked the test files and found they have been changed. I wanted to use your Docker image, but the pretrained model link below is out of date. https://www.dropbox.com/s/nu6jwhc88ujbw1v/resnet101_faster_rcnn_final_iter_320000.caffemodel?dl=1

So could you use your model to generate features for the new test data? Thanks a lot!

airsplay commented 4 years ago

The new Dropbox link for the model has been updated in the bottom-up-attention repo; the alternative pretrained model is available there.

runzeer commented 4 years ago

OK! Thanks a lot!! I wonder how you convert the answers to labels, especially how you add the label confidences.

airsplay commented 4 years ago

This part is almost the same as the previous VQA pre-processing. You could read this repo for details.
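For reference, here is a sketch of the standard VQA-style soft-score rule, which is presumably what "almost the same as the previous VQA pre-processing" refers to: each answer gets confidence min(1, count / 3) over the human answers (the function and argument names are illustrative):

```python
from collections import Counter

def answers_to_scored_labels(answers, ans2label):
    """Map a question's human answers to {label: confidence} with VQA-style scores."""
    scored = {}
    for ans, count in Counter(answers).items():
        if ans in ans2label:
            scored[ans2label[ans]] = min(1.0, count / 3.0)
    return scored
```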