airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Bad performance on NLVR2 #1

Closed: sanjayss34 closed this issue 5 years ago

sanjayss34 commented 5 years ago

Hi, thanks for releasing your code! I'm not able to reproduce your fine-tuning result on NLVR2. I followed your instructions by downloading the pre-trained model, downloading the image features, pre-processing the nlvr2 JSON files, and running the nlvr2_finetune.bash script as is. However, I get the following results, which are much lower than the result you reported. Do you know why this might be happening?

Epoch 0: Train 52.32
Epoch 0: Valid 50.86
Epoch 0: Best 50.86

Epoch 1: Train 50.50
Epoch 1: Valid 49.14
Epoch 1: Best 50.86

Epoch 2: Train 50.56
Epoch 2: Valid 49.31
Epoch 2: Best 50.86

Epoch 3: Train 54.83
Epoch 3: Valid 51.65
Epoch 3: Best 51.65

airsplay commented 5 years ago

Many thanks for running the experiment and pointing this issue out!

I am now running a verification experiment and will let you know the result tomorrow morning.

PyTorch Version

My initial guess is that the PyTorch version is the cause. Could you try torch==1.0.1? The installation command is:

pip install --force torch==1.0.1

I found that I had been using an old virtualenv with an old PyTorch version. I had assumed that PyTorch would be backward-compatible in how it computes gradients, but that does not seem to be the case.

By the way, here is the full package list of my virtualenv. I believe that the only difference might be the torch version.

Package         Version  
--------------- ---------
boto3           1.9.205  
botocore        1.12.205 
certifi         2019.6.16
chardet         3.0.4    
docutils        0.14     
idna            2.8      
jmespath        0.9.4    
numpy           1.17.0   
pip             19.2.1   
python-dateutil 2.8.0    
requests        2.22.0   
s3transfer      0.2.1    
setuptools      41.0.1   
six             1.12.0   
torch           1.0.1    
tqdm            4.33.0   
urllib3         1.25.3   
wheel           0.33.4 

If the torch version does turn out to be the cause, that is really strange, but I will update requirements.txt first.
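For anyone comparing environments, here is a minimal sketch (not part of the repo) of a check that could be run before fine-tuning to see whether the installed torch matches the 1.0.1 release mentioned above. The helper names `version_tuple`, `matches_expected`, and `check_torch` are made up for illustration, and the comparison deliberately ignores suffixes such as `.post2` or `+cu...`.

```python
EXPECTED_TORCH = "1.0.1"  # the version suggested in this thread


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.0.1.post2' -> (1, 0, 1)."""
    parts = []
    # Drop any local build tag ('+cu118') and stop at the first non-numeric piece.
    for piece in str(version).split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_expected(installed, expected=EXPECTED_TORCH):
    """True when the numeric parts of both version strings are identical."""
    return version_tuple(installed) == version_tuple(expected)


def check_torch():
    """Describe how the installed torch compares with EXPECTED_TORCH."""
    try:
        import torch  # imported lazily so the script also runs without torch
    except ImportError:
        return "torch is not installed"
    installed = str(torch.__version__)
    if matches_expected(installed):
        return "torch {} matches {}".format(installed, EXPECTED_TORCH)
    return "torch {} differs from {}; results may not reproduce".format(
        installed, EXPECTED_TORCH
    )


if __name__ == "__main__":
    print(check_torch())
```

Running the script inside the virtualenv used for fine-tuning prints a single line, which makes a version mismatch easy to spot before starting a long run.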

Raw Feature

Could you also try using the raw features from our server in place of the features from the zip files? The commands are:

wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/train_obj36.tsv -P data/nlvr2_imgfeat
wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/valid_obj36.tsv -P data/nlvr2_imgfeat

This is in case some of the zip files are broken.
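If re-downloading is inconvenient, a broken archive can also be detected locally. The following is a rough sketch using only Python's standard `zipfile` module. The function name `zip_problem` is hypothetical, and the paths to check have to be supplied by the user, since the exact zip file names are not given in this thread.

```python
import sys
import zipfile


def zip_problem(path):
    """Return None if the archive at `path` reads cleanly, else a short description."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the name of the first
            # one with a bad CRC or header, or None when all members are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as error:
        return "not a valid zip file: {}".format(error)
    except OSError as error:
        return "cannot read file: {}".format(error)
    if bad_member is not None:
        return "corrupt member: {}".format(bad_member)
    return None


if __name__ == "__main__":
    # Usage: python check_zips.py <archive1.zip> <archive2.zip> ...
    for path in sys.argv[1:]:
        problem = zip_problem(path)
        print("{}: {}".format(path, problem if problem else "OK"))
```

An archive that reports a problem here is worth replacing with the raw .tsv download above. An archive that reports OK only rules out file corruption, not other causes.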

airsplay commented 5 years ago

Hi, last night I ran the same command as in the README and got an accuracy of 74.39%, which is within the 74.0% to 74.5% range given in the README.

Here is a snapshot of the results: [image]

I would still recommend trying torch==1.0.1 if possible.

sanjayss34 commented 5 years ago

Thanks for the update! Yes, it does look like the torch version was the issue. (So far I have re-trained for 1 epoch using PyTorch 1.0.1 and got a validation accuracy of 67.86.) Previously, I was using version 1.1.0.