askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

From which weight can I reproduce experimental results in the paper among three weights? #15

bhkim94 closed this issue 4 years ago

bhkim94 commented 4 years ago

First of all, thanks for sharing the great code.

  1. Which weights should I use for the validation and test datasets? I'm trying to reproduce the numbers in the paper. The model was trained following alfred/models/README.md. After training, I ended up with three weight files:

    • best_seen.pth
    • best_unseen.pth
    • latest.pth
  2. Which max_step value should I use? The paper says max_step is set to 400, but the README says it is set to 1000. I'm not sure which one was used for the paper.
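
For readers unsure how the three files relate, here is a minimal sketch of a keep-best checkpointing loop. It assumes the training script overwrites latest.pth every epoch and writes best_seen.pth / best_unseen.pth whenever the validation loss on that split improves. That criterion, the function name `checkpoints_to_write`, and the loss values are illustrative assumptions, not taken from the repository.

```python
# Hypothetical sketch: which checkpoint files a keep-best training loop writes.
# The criterion (lowest validation loss per split) is an assumption.

def checkpoints_to_write(val_loss, best_loss):
    """Return the filenames to save this epoch; updates best_loss in place.

    val_loss and best_loss are dicts keyed by split name ("seen", "unseen").
    """
    files = ["latest.pth"]  # always overwritten with the newest weights
    for split in ("seen", "unseen"):
        if val_loss[split] < best_loss[split]:
            best_loss[split] = val_loss[split]
            files.append(f"best_{split}.pth")
    return files


# Made-up validation losses for three epochs, for illustration only.
best = {"seen": float("inf"), "unseen": float("inf")}
history = [
    {"seen": 2.0, "unseen": 3.0},
    {"seen": 1.5, "unseen": 3.2},
    {"seen": 1.7, "unseen": 2.9},
]
written = [checkpoints_to_write(losses, best) for losses in history]
for epoch, files in enumerate(written):
    print(epoch, files)
```

Under this assumption, the three files can come from three different epochs, which is why the choice of file matters when reproducing reported numbers.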

MohitShridhar commented 4 years ago

  1. best_seen.pth gives the best performance. We will include a note for this. Thanks!
  2. We will update the paper in the next revision. Thanks!

bhkim94 commented 4 years ago

Huge thanks for replying!

I ran your code with both max_step values (400 and 1000) using the default settings on the validation set, and I got higher numbers than those reported in the paper.

Is there any difference between the experimental settings in this code and those in the paper?

The numbers below are for max_step == 400; I got similar results with max_step == 1000.

(numbers reported in the paper / numbers from the code)

(Seen) SR (PLW S): 3.8 (2.2) / 4.0 (2.1)
(Seen) PC (PLW PC): 10.9 (6.9) / 10.5 (7.2)
(Seen) Sub-Goals

(Unseen) SR (PLW S): 0.1 (0.0) / 0.2 (0.1)
(Unseen) PC (PLW PC): 6.9 (4.7) / 7.5 (5.1)
(Unseen) Sub-Goals
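
To make the gap easier to read, the comparison above can be tabulated as per-metric differences (code minus paper). This is only a sketch: the values are copied from the lines above, the dictionary layout and variable names are my own, and the Sub-Goals entries are left out because no values were posted for them.

```python
# (paper, code) pairs copied from the comparison above, in percent.
# Sub-Goals rows are omitted: the thread gives no values for them.
results = {
    ("seen", "SR"): (3.8, 4.0),
    ("seen", "PLW S"): (2.2, 2.1),
    ("seen", "PC"): (10.9, 10.5),
    ("seen", "PLW PC"): (6.9, 7.2),
    ("unseen", "SR"): (0.1, 0.2),
    ("unseen", "PLW S"): (0.0, 0.1),
    ("unseen", "PC"): (6.9, 7.5),
    ("unseen", "PLW PC"): (4.7, 5.1),
}

# Difference in percentage points, rounded to the precision of the inputs.
deltas = {key: round(code - paper, 1) for key, (paper, code) in results.items()}

for (split, metric), delta in deltas.items():
    print(f"{split:>6} {metric:<7} {delta:+.1f}")
```

Most metrics come out higher with the released code, although two of the seen-split entries (PLW S and PC) are slightly lower.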

MohitShridhar commented 4 years ago

Yes, this is expected. We are currently updating the numbers in the paper for the CVPR camera-ready version.