Retrained model does not get the same SPL on val unseen as reported in paper

airsplay / R2R-EnvDrop

PyTorch Code of NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"

MIT License


Retrained model does not get the same SPL on val unseen as reported in paper #14

Open HubHop opened 4 years ago

HubHop commented 4 years ago

Hi,

We are trying to retrain the EnvDrop model from this repo, but our results do not match those reported in the paper. We have tried several PyTorch versions; our best result, with PyTorch 0.4.1, is an SPL of 0.46 on val unseen, which is below the reported 48%. Detailed results are in the attachment below.

Have we missed something important? Or could you specify your working environment?

Our retrained model: retrained_envdrop_results.xlsx

Results in paper:

airsplay commented 4 years ago

Sorry for the late reply.

The original code with reproducible code/results is provided in another issue: https://github.com/airsplay/R2R-EnvDrop/issues/11.

Given the results in the xlsx, the 2% drop in SPL (to 46%) is possibly caused by the drop in SR; the result is still much higher than the previous SotA (38%). The cause I have found so far is some implementation differences inside the speaker, introduced when I cleaned the code (the original, reproducible code is provided in the other issue). I suspect the speaker because the beam-search results, which rely only on speaker inference, also changed. I haven't located which differences cause this issue: none of them seems to affect the training/inference process, yet the predictions do change. Please check the original code until I find the cause.

Best, Hao
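For readers comparing the two metrics, here is a minimal sketch of why a drop in SR pulls SPL down with it. It assumes the standard definition of SPL (success weighted by normalized inverse path length): each successful episode contributes the ratio of shortest-path length to the longer of the agent's path and the shortest path, and failed episodes contribute zero. The function names and the toy episodes are illustrative only; they are not taken from this repository's evaluation code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute 0
        denom = max(taken, shortest)
        # Guard against a degenerate zero-length episode.
        total += shortest / denom if denom > 0 else 1.0
    return total / len(successes)


# Hypothetical toy data: four episodes, one failure, one detour.
successes = [True, True, False, True]
shortest = [10.0, 8.0, 12.0, 5.0]
taken = [10.0, 16.0, 12.0, 5.0]

sr_value = success_rate(successes)
spl_value = spl(successes, shortest, taken)
print(f"SR={sr_value:.3f} SPL={spl_value:.3f}")
```

Because every per-episode term is at most 1 and is zero on failure, SPL can never exceed SR, so any loss in SR caps the attainable SPL.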

HubHop commented 4 years ago

Thanks for your reply!

xiran2018 commented 4 years ago

Please help me! After training the model, I evaluated it on the test environment, and the success rate is shown in the image below. I don't understand why the result is so low. Is there something wrong with how I test?

(image)

The test script is:

name=agent
flag="--train validlistener --featdropout 0.3 --angleFeatSize 128 --feedback argmax --mlWeight 0.2 --subout max --dropout 0.5 --optim rms --lr 1e-4 --iters 80000 --submit"
CUDA_VISIBLE_DEVICES=$1 python r2r_src/train.py $flag --name $name

HubHop commented 4 years ago

Hi @jingquanliang, I didn't see your result. Have you fixed it? If not, you can try this script:

name=agent_bt
flag="--attn soft --train validlistener --load snap/agent_bt/state_dict/best_val_unseen --angleFeatSize 128 --submit --featdropout 0.4 --subout max --maxAction 35"

CUDA_VISIBLE_DEVICES=$1 python r2r_src/train.py $flag --name $name