sgdgp opened this issue 4 years ago
Hi @sgdgp, some users have reported that they had to train the model themselves instead of using the pretrained models. I still haven't figured out the source of this issue, but it seems like only certain users are affected by this. I'll mention it in the FAQ.
You might have better luck with the Docker setup.
Thanks @MohitShridhar. Also, is the training done with decoder teacher forcing enabled?
Ah no, leave it at the default False. You can use the settings specified in the training example.
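For context, here is a minimal generic sketch (illustrative only, not the ALFRED implementation) of what a decoder teacher-forcing flag typically toggles: whether the decoder is fed the gold token or its own previous prediction at each step.

```python
import torch
import torch.nn as nn

# Generic sketch of what a "decoder teacher forcing" flag usually controls
# (illustrative only; not the ALFRED seq2seq code).
class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=10, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.cell = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, targets, h, teacher_forcing=False):
        # targets: (batch, seq_len) gold token/action ids
        tok = targets[:, 0]
        logits_seq = []
        for t in range(1, targets.size(1)):
            h = self.cell(self.embed(tok), h)
            logits = self.out(h)
            logits_seq.append(logits)
            if teacher_forcing:
                tok = targets[:, t]          # feed the gold token back in
            else:
                tok = logits.argmax(dim=-1)  # feed the model's own prediction
        return torch.stack(logits_seq, dim=1), h

decoder = TinyDecoder()
targets = torch.randint(0, 10, (4, 6))
h0 = torch.zeros(4, 32)
logits, _ = decoder(targets, h0, teacher_forcing=False)  # default discussed above
print(logits.shape)  # (4, 5, 10)
```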
Oh I see. Thanks!
Not sure if this is causing the issue, but check that your versions of torch and torchvision are consistent with requirements.txt
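For a quick check, a minimal sketch for printing the installed versions so they can be compared against the pins in requirements.txt:

```python
import torch
import torchvision

# Compare the printed versions with the ones pinned in requirements.txt.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)
```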
I am trying with the Dockerfile. I will update on the status soon.
@sgdgp Have you managed to reproduce the paper's results after all?
@MohitShridhar How did you choose the model that produces the results in your paper? I tried both best_seen and best_unseen, and both perform worse.
@IgorDroz we picked the best_seen model.
You can try training the model yourself (from scratch) if the problem persists.
@MohitShridhar
I trained from scratch and got (valid_seen): SR: 12/820 = 0.015, GC: 172/2109 = 0.082, PLW SR: 0.011, PLW GC: 0.072
while in the paper you achieved: SR: 0.032, GC: 0.1, PLW SR: 0.021, PLW GC: 0.07
The only difference between my setup and yours is the initialization, but you still got 2x better results in SR.
Additionally, I wanted to ask about testing. Is it only done via submission? Or, since the challenge has finished, will you be able to release code with the actual GT of the test set?
Thanks, Igor
> The only difference between my setup and yours is the initialization, but you still got 2x better results in SR.
Sorry, what's the initialization difference? And also, is this inside a Docker container?
> Or, since the challenge has finished, will you be able to release code with the actual GT of the test set?
No. The leaderboard is a perpetual benchmark for ALFRED. As with any benchmark in the community, the test set will remain a secret to prevent cheating/overfitting. To evaluate on the test set, use the leaderboard submission.
@MohitShridhar The initialization of the neural net, i.e. the initial weights. And no, it is not inside a Docker container.
@IgorDroz can you report your torch and torchvision versions along with your CUDA and GPU specs? Also, which resnet checkpoint are you using from torchvision?
@MohitShridhar torch==1.1.0, torchvision==0.3.0, CUDA Version: 11.1, GPU: Tesla K80, NVIDIA Driver Version: 455.23.05
How can I check the resnet checkpoint?
@IgorDroz, it's usually inside $HOME/.cache/torch/checkpoints/. I am using resnet34-333f7ec4.pth.
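For reference, a minimal sketch for listing the cached resnet checkpoints (assuming the default model-zoo cache location above; newer torch versions may use ~/.cache/torch/hub/checkpoints instead):

```python
import glob
import os

# List resnet checkpoints cached by torchvision's model zoo.
cache_dir = os.path.expanduser("~/.cache/torch/checkpoints")
for ckpt in sorted(glob.glob(os.path.join(cache_dir, "resnet*"))):
    print(os.path.basename(ckpt))
```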
@MohitShridhar Sorry for the late answer. So that is probably the difference: I use resnet18-5c106cde.pth. Now it makes sense, thanks!
Oops, sorry. I just checked again. I am also using resnet18-5c106cde.pth, so it's probably not the issue.
The next thing to try would be running this inside Docker to make sure the setup is exactly the same.
@MohitShridhar Hi again,
Just saw your answer. I am still not able to reproduce your results. Docker shouldn't really matter, since the environment is the same and I should be able to get results similar to yours...
A recap of what I tried and what I got:
I used your pre-trained model (https://github.com/askforalfred/alfred/tree/master/models#pre-trained-model) and ran evaluation. The results are: SR: 8/820 = 0.01, GC: 143/2109 = 0.068, PLW SR: 0.003, PLW GC: 0.038
Which results did you achieve with this model? They are pretty far from what you reported in the paper: SR: 0.032, GC: 0.1, PLW SR: 0.021, PLW GC: 0.07
I also trained from scratch and got: SR: 8/820 = 0.01, GC: 143/2109 = 0.068, PLW SR: 0.007, PLW GC: 0.049 (which is quite similar to the results I got using your pretrained model)
This time I used a P100 GPU like you, yet the results are different. How can that be? I will attach my packages:
ai2thor==2.1.0 cached-property==1.5.2 certifi==2020.12.5 chardet==4.0.0 click==7.1.2 cycler==0.10.0 decorator==4.4.2 Flask==1.1.2 h5py==3.1.0 idna==2.10 itsdangerous==1.1.0 Jinja2==2.11.2 kiwisolver==1.3.1 MarkupSafe==1.1.1 matplotlib==3.3.3 networkx==2.5 numpy==1.19.5 opencv-python==4.5.1.48 pandas==1.2.0 Pillow==8.1.0 progressbar2==3.53.1 protobuf==3.14.0 pyparsing==2.4.7 python-dateutil==2.8.1 python-utils==2.4.0 pytz==2020.5 PyYAML==5.3.1 requests==2.25.1 revtok==0.0.3 six==1.15.0 tensorboardX==1.8 torch==1.1.0 torchvision==0.3.0 tqdm==4.56.0 urllib3==1.26.2 vocab==0.0.5 Werkzeug==1.0.1
@IgorDroz Docker is a way to ensure that the setup is completely identical (CUDA, torch, torchvision, etc.).
Check out this work, and their reproduced results. Their models are also substantially better than the baselines reported in the ALFRED paper.
I am not sure what else could be causing this issue. Sorry.
@MohitShridhar I will definitely check their work out, thanks! I noticed that there is another work with even better results on the leaderboard; do you have their paper by any chance?
@IgorDroz I don't think the leaderboard topper has made their paper/code publicly available. It's probably a recent submission (or to be submitted), so you'd have to wait for the anonymity period to end.
@MohitShridhar okay, thanks a lot!
Cannot reproduce the results either using the pre-trained best_seen model (and resnet18-5c106cde.pth). I'm on torch==1.9.0 (py3.7_cuda10.2_cudnn7.6.5_0), and the results look similar to the ones posted above by other users.
SR: 8/820 = 0.010 GC: 142/2109 = 0.067 PLW SR: 0.003 PLW GC: 0.038
Was anyone able to reproduce the results at all? Just asking.
Hi, thanks for the amazing dataset and for sharing your code. I am unable to reproduce the results for the seen validation set. I downloaded the checkpoints you provided and I am using best_seen.pth. I am getting SR 0.0097 and GC 0.0659, whereas the result on val seen in the paper is SR 0.037 and GC 0.1.
Could you point out anything I might have missed?
For starting the X server I used:
sudo nvidia-xconfig -a --use-display-device=None --virtual=1024x786
sudo /usr/bin/X :0 &
I get two warnings:
UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. warnings.warn("Default upsampling behavior when mode={} is changed "
UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead. warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
The second warning won't affect the results, but I wanted to confirm whether upsampling with align_corners was intended, or whether the warning appeared earlier too and I should just ignore it.
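For reference, a minimal sketch (independent of the ALFRED code) showing the two interpolation behaviors the first warning refers to; passing align_corners explicitly silences the warning:

```python
import torch
import torch.nn.functional as F

# Since torch 0.4.0 the default for bilinear upsampling is align_corners=False.
x = torch.arange(4.0).view(1, 1, 2, 2)

up_new = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)  # current default
up_old = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)   # pre-0.4.0 behavior

print(up_new)
print(up_old)
```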