Which of the three saved weights reproduces the experimental results in the paper?

bhkim94 commented 4 years ago

First of all, thanks for sharing the great code.

Which weights should I use for the validation and test datasets? I'm trying to reproduce the numbers in the paper. I trained the model following alfred/models/README.md, and training produced three weight files:
- best_seen.pth
- best_unseen.pth
- latest.pth

Also, which max_step value should I use? The paper says max_step is set to 400, but the README sets it to 1000. I'm not sure which one was used for the paper.

MohitShridhar commented 4 years ago

best_seen.pth gives the best performance. We will include a note about this.
Regarding max_step, we will update the paper in the next revision. Thanks!

bhkim94 commented 4 years ago

Huge thanks for replying!

I ran your code with both max_step values (400 and 1000) and default settings on the validation set, and got numbers that are mostly higher than those reported in the paper.

Is there any difference between the experimental settings in this code and those in the paper?

The numbers below are from max_step == 400; I got similar results with max_step == 1000.

(numbers reported in the paper / numbers from the code)

(Seen) SR (PLW S): 3.8 (2.2) / 4.0 (2.1)
(Seen) PC (PLW PC): 10.9 (6.9) / 10.5 (7.2)
(Seen) Sub-Goals:

Goto: 47.8 / 61.3
Pickup: 35 / 44.9
Put: 79.8 / 81.2
Cool: 87.1 / 90.2
Heat: 84.9 / 88.8
Clean: 83.8 / 83.0
Slice: 31.6 / 45.0
Toggle: 100 / 96.8

(Unseen) SR (PLW S): 0.1 (0.0) / 0.2 (0.1)
(Unseen) PC (PLW PC): 6.9 (4.7) / 7.5 (5.1)
(Unseen) Sub-Goals:

Goto: 20.3 / 37.0
Pickup: 23.6 / 32.4
Put: 43.4 / 55.7
Cool: 95.2 / 95.4
Heat: 89.6 / 91.1
Clean: 56 / 33.6
Slice: 25.6 / 44.8
Toggle: 57.9 / 53.5
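To see at a glance where the reproduced sub-goal numbers diverge from the paper, the figures quoted in this thread can be tabulated and differenced. Below is a minimal Python sketch: the dictionaries just transcribe the max_step == 400 sub-goal numbers listed above, and the names SEEN, UNSEEN and deltas are illustrative only, not part of the ALFRED code base.

```python
# Sub-goal success rates (%) transcribed from this thread (max_step == 400).
# Each entry is (reported in the paper, reproduced with the released code).
# SEEN, UNSEEN and deltas() are hypothetical names used only for illustration.
SEEN = {
    "Goto": (47.8, 61.3),
    "Pickup": (35.0, 44.9),
    "Put": (79.8, 81.2),
    "Cool": (87.1, 90.2),
    "Heat": (84.9, 88.8),
    "Clean": (83.8, 83.0),
    "Slice": (31.6, 45.0),
    "Toggle": (100.0, 96.8),
}
UNSEEN = {
    "Goto": (20.3, 37.0),
    "Pickup": (23.6, 32.4),
    "Put": (43.4, 55.7),
    "Cool": (95.2, 95.4),
    "Heat": (89.6, 91.1),
    "Clean": (56.0, 33.6),
    "Slice": (25.6, 44.8),
    "Toggle": (57.9, 53.5),
}


def deltas(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return (reproduced - reported) per sub-goal, rounded to one decimal."""
    return {name: round(code - paper, 1) for name, (paper, code) in table.items()}


if __name__ == "__main__":
    for split, table in (("Seen", SEEN), ("Unseen", UNSEEN)):
        print(f"({split}) code minus paper:")
        # Sort ascending so regressions (negative deltas) are listed first.
        for name, d in sorted(deltas(table).items(), key=lambda kv: kv[1]):
            print(f"  {name:>7}: {d:+.1f}")
```

Sorting by delta puts the regressions first: Clean and Toggle are the only sub-goals where the code scores below the paper in both splits, while Goto and Slice show the largest gains in both.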

MohitShridhar commented 4 years ago

Yes, this is expected. We are currently updating the numbers in the paper for the CVPR camera-ready version.

askforalfred / alfred, issue #15