Closed: sanjayss34 closed this issue 5 years ago.
Many thanks for running the experiment and pointing this issue out!
I am now running a verification experiment and will let you know the result tomorrow morning.
My initial guess is that it is the PyTorch version. Could you try torch==1.0.1? Installation command:
pip install --force-reinstall torch==1.0.1
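(A quick way to confirm the downgrade took effect; the snippet below is just an illustrative sanity check, not something from the repo.)
# Sanity check that the active virtualenv is now on torch 1.0.1
# (illustrative only; not part of the repo's scripts).
import torch

print(torch.__version__)
assert torch.__version__.startswith("1.0."), (
    "Still on a newer PyTorch; re-run the pip install inside the right virtualenv."
)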
I found that I had been using an old virtualenv with an old PyTorch version. I assumed PyTorch would be backward-compatible in computing gradients, but that seems not to be the case.
By the way, here is the full package list of my virtualenv. I believe the only difference might be the torch version.
Package Version
--------------- ---------
boto3 1.9.205
botocore 1.12.205
certifi 2019.6.16
chardet 3.0.4
docutils 0.14
idna 2.8
jmespath 0.9.4
numpy 1.17.0
pip 19.2.1
python-dateutil 2.8.0
requests 2.22.0
s3transfer 0.2.1
setuptools 41.0.1
six 1.12.0
torch 1.0.1
tqdm 4.33.0
urllib3 1.25.3
wheel 0.33.4
If so, it's really strange, but I will update requirements.txt first.
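(If it helps to diff environments, here is a minimal sketch to print the versions installed in your own virtualenv; the package names listed are just the ones most likely to matter and are my assumption.)
# Minimal sketch: print installed versions of the packages most likely to differ.
import pkg_resources

for pkg in ("torch", "numpy", "tqdm", "boto3"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")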
And could you also try using the raw features from our server in place of the features from the zip files, with these commands:
wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/train_obj36.tsv -P data/nlvr2_imgfeat
wget nlp.cs.unc.edu/data/lxmert_data/nlvr2_imgfeat/valid_obj36.tsv -P data/nlvr2_imgfeat
This is just in case some of the zip files are broken.
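(If you want to check the zip files you already have before re-downloading, here is a minimal sketch using Python's zipfile module; the archive paths are assumptions, so point them at wherever your zips actually live.)
# Minimal integrity check for the feature archives already on disk.
# The paths below are assumptions; adjust them to your setup.
import zipfile

for path in ["data/nlvr2_imgfeat/train_obj36.zip",
             "data/nlvr2_imgfeat/valid_obj36.zip"]:
    try:
        bad = zipfile.ZipFile(path).testzip()  # returns None if all CRCs pass
        print(path, "OK" if bad is None else "corrupted member: " + bad)
    except (FileNotFoundError, zipfile.BadZipFile) as err:
        print(path, "unreadable:", err)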
Hi, last night I got an accuracy of 74.39% (within the 74.0% to 74.5% range in the README) with the same command from the README.
Here is a snapshot of the results:
I would recommend still trying torch==1.0.1 if possible.
Thanks for the update! Yes, it does look like the torch version was the issue. (So far I have re-trained for 1 epoch using PyTorch 1.0.1 and got a validation accuracy of 67.86.) Previously, I was using version 1.1.0.
Hi, thanks for releasing your code! I'm not able to reproduce your fine-tuning result on NLVR2. I followed your instructions: I downloaded the pre-trained model, downloaded the image features, pre-processed the NLVR2 JSON files, and ran the nlvr2_finetune.bash script as is. However, I get the following results, which are much lower than what you reported. Do you know why this might be happening?
Epoch 0: Train 52.32 Epoch 0: Valid 50.86 Epoch 0: Best 50.86
Epoch 1: Train 50.50 Epoch 1: Valid 49.14 Epoch 1: Best 50.86
Epoch 2: Train 50.56 Epoch 2: Valid 49.31 Epoch 2: Best 50.86
Epoch 3: Train 54.83 Epoch 3: Valid 51.65 Epoch 3: Best 51.65