ziyuwwang opened this issue 4 years ago
Thanks @ziyuwwang for letting me know. It's possible there's some discrepancy somewhere. I'll take a look in the next few days. (I have my PhD qualifier talk in the coming week.)
If you want something quick, there's another replication of our work that does get similar results: https://github.com/dido1998/Recurrent-Independent-Mechanisms
EDIT 1: I ran the code with 510 hidden units, 5 RIMs, and 3 active, with a learning rate of 0.001. It gets around "0.88, 0.72, 0.39" in about 20 epochs.
Thank you very much for your help. I will try it according to your instructions. Would you mind providing specific settings for other experiments at your convenience?
Hello @ziyuwwang,
This is the configuration (copied from my code, as used in the paper) which gave approximately the same result:
Learning rate 0.001, 600 hidden units, 6 RIMs, top_k = 4, dropout = 0.2.
Here, https://github.com/anirudh9119/RIMs/blob/master/event_based/blocks_core.py#L47, change d_k = 32, d_v = 32. Here, https://github.com/anirudh9119/RIMs/blob/master/event_based/blocks_core.py#L49, change self.att_out = 400.
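For concreteness, here is a minimal sketch of what those two edits amount to; apart from d_k, d_v and att_out, every name and shape below is an illustrative assumption, not the repo's actual code.

```python
import torch
import torch.nn as nn

# Values from the two edits above (blocks_core.py ~L47 and ~L49):
d_k, d_v = 32, 32      # key/value dimensions of the input attention
att_out = 400          # width of the attention output

# Everything below is an illustrative assumption about how those sizes get used.
n_heads, hidden = 1, 600
query_proj = nn.Linear(hidden, n_heads * d_k)
key_proj   = nn.Linear(hidden, n_heads * d_k)
value_proj = nn.Linear(hidden, n_heads * d_v)
out_proj   = nn.Linear(n_heads * d_v, att_out)

x = torch.randn(8, hidden)                 # a dummy batch of hidden states
print(out_proj(value_proj(x)).shape)       # torch.Size([8, 400])
```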
I ran it, and it did give the same results:
0.90234375, 0.7203525641025641, 0.3892227564102564.
I've not systematically studied what made the difference.
Also, note we did use the above configuration for most of our experiments. The only thing to be careful of: I admit that for the paper we did not use any dropout for bouncing balls, and only recently did we figure out that using dropout in the encoder hurts the results.
Hope that helps!
You may also be interested in: https://arxiv.org/abs/2006.16225. This work fixes a big issue in the current work (the exchangeability of different modules).
It's very nice of you to give so many useful suggestions, @anirudh9119! But I still cannot get a result close to that of the original paper after running the released code with exactly the settings you provided above. I am wondering:
1. Whether there's some discrepancy between your code and the released code?
2. Whether the expected validation accuracies at different resolutions are achieved with the same checkpoint or different checkpoints?
Thanks for your prompt reply!
It's very nice of you to give so many useful suggestions, @anirudh9119! But I still cannot get a result close to that of the original paper after running the released code with exactly the settings you provided above.
I apologize for wasting your time. It should have been something that works right on the first attempt.
Whether there's some discrepancy between your code and the released code?
I can check again, but it seems there's no discrepancy as of now.
Whether the expected validation accuracies at different resolutions are achieved with the same checkpoint or different checkpoints?
You should see: "Test Optim".
@ziyuwwang It's exactly the same as in the training script. The expected validation accuracies at different resolutions come from different checkpoints (the checkpoint is selected on the validation data for that resolution). We do this both for the proposed method and for all the baselines (LSTM/RMC etc.). The reason was: it's not obvious that the best IID accuracy on the training distribution would give the best OOD accuracy, and hence we do model selection. Indeed, we observed that this affected LSTMs the most adversely. Let me know if this resolves your question. If not, I can see what the discrepancy is.
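Concretely, the selection rule amounts to something like the following sketch (toy numbers, not the training script itself): for each resolution, pick the epoch with the best validation accuracy and report the test accuracy from that same checkpoint.

```python
# Illustrative sketch of per-resolution model selection (not the repo's code).
def select_per_resolution(val_acc, test_acc):
    """val_acc/test_acc: dict mapping resolution -> list of per-epoch accuracies."""
    chosen = {}
    for res in val_acc:
        best_epoch = max(range(len(val_acc[res])), key=lambda e: val_acc[res][e])
        chosen[res] = (best_epoch, test_acc[res][best_epoch])
    return chosen

# Toy numbers, purely illustrative:
val  = {16: [0.80, 0.85, 0.83], 19: [0.60, 0.66, 0.70], 24: [0.30, 0.35, 0.33]}
test = {16: [0.81, 0.84, 0.82], 19: [0.61, 0.65, 0.71], 24: [0.31, 0.36, 0.34]}
print(select_per_resolution(val, test))
# {16: (1, 0.84), 19: (2, 0.71), 24: (1, 0.36)}
```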
An independent replication also obtains results similar to those provided in the paper: https://github.com/dido1998/Recurrent-Independent-Mechanisms
@anirudh9119 I reached a result of about "0.84, 0.70, 0.38" for "Test Optim" with the setting of 510 hidden units, 5 RIMs, 3 active, and a learning rate of 0.001. I set d_k = 32, d_v = 32, dropout = 0.2, and self.att_out = 400 in blocks_core.py. These changes do improve the performance, but I suggest that you check the code and attach the right settings. For example, this line https://github.com/anirudh9119/RIMs/blob/610aa6c80bf72e1bd6228ccfea05026f337b02ed/event_based/blocks_core.py#L78 seems to be useless and redundant. I guess there must be some discrepancy.
@ziyuwwang I think I suggested above to use:
600 hidden units, 6 RIMs, top_k = 4.
null_score = iatt.mean((0,1))[1]
Yes, this line was used for logging (to see the activation scores). It's not used in making the activation mask.
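As a rough illustration of that distinction (the shapes and the score layout are assumptions, this is not the actual blocks_core.py code): the mean null-input score is only printed, while the activation mask comes from a top-k over the per-RIM scores.

```python
import torch

num_blocks, topk, batch = 6, 4, 8
# Assumed layout: iatt[..., 0] is attention to the real input, iatt[..., 1] to the null input.
iatt = torch.softmax(torch.randn(batch, num_blocks, 2), dim=-1)

null_score = iatt.mean((0, 1))[1]                    # logging only, as in blocks_core.py#L78
print(f"mean null-input attention: {null_score.item():.3f}")

# The activation mask itself comes from a top-k over the real-input scores.
real_scores = iatt[:, :, 0]                          # (batch, num_blocks)
topk_idx = real_scores.topk(topk, dim=1).indices     # pick the top-k RIMs per example
mask = torch.zeros_like(real_scores).scatter_(1, topk_idx, 1.0)
print(mask[0])                                       # e.g. 4 of the 6 RIMs active
```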
Hi everyone! I tried 100 epochs using both the original configuration from the repo for sequential MNIST and @anirudh9119's suggestion in this thread.
Just to recap, the parameters of original_config are: --cuda --cudnn --algo blocks --name Blocks_MNIST/original_conf --lr .0007 --drop 0.5 --nhid 600 --num_blocks 6 --topk 4 --nlayers 1 --emsize 600 --log-interval 100
The parameters of anirudh_config are: --cuda --cudnn --algo blocks --name Blocks_MNIST/anirudh --lr .001 --drop 0.2 --nhid 600 --num_blocks 6 --topk 4 --nlayers 1 --emsize 600 --log-interval 100
In anirudh_config I'm applying both code changes suggested above:
Here, https://github.com/anirudh9119/RIMs/blob/master/event_based/blocks_core.py#L47, change d_k = 32, d_v = 32. Here, https://github.com/anirudh9119/RIMs/blob/master/event_based/blocks_core.py#L49, change self.att_out = 400.
[I also ran anirudh_config, but with the original dropout = 0.5, by accident.]
The parameters of anirudh_drop05 are: --cuda --cudnn --algo blocks --name Blocks_MNIST/anirudh_drop05 --lr .001 --drop 0.5 --nhid 600 --num_blocks 6 --topk 4 --nlayers 1 --emsize 600 --log-interval 100
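To summarize the three runs above in one place (shared flags factored out; whether the blocks_core.py edits were applied is my reading of the comments above):

```python
# Shared flags: --cuda --cudnn --algo blocks --nhid 600 --emsize 600
#               --num_blocks 6 --topk 4 --nlayers 1 --log-interval 100
runs = {
    "original_config": {"lr": 0.0007, "drop": 0.5, "blocks_core_edits": False},
    "anirudh_config":  {"lr": 0.001,  "drop": 0.2, "blocks_core_edits": True},
    "anirudh_drop05":  {"lr": 0.001,  "drop": 0.5, "blocks_core_edits": True},
}
for name, cfg in runs.items():
    print(name, cfg)
```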
These were the results I got:
original_config: 83.7, 69.3, 45.3
(test values at epochs 16, 12 and 12 respectively [when the best validation values occurred])
anirudh_config: 83.1, 55.6, 35.2
(test values at epochs 16, 4, and 12 respectively [when the best validation values occurred])
anirudh_drop05: 82.2, 54.9, 29.5
(test values at epochs 24, 4, and 4 respectively [when the best validation values occurred])
For the 3rd setting (24x24 resolution) I got an even better result than in the paper, but for the first two (16x16 and 19x19) I couldn't reach the values in the paper, like @ziyuwwang.
One final note: it seems like the training is deterministic on my machine (kudos for that!), but maybe something changes from one machine to another?
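In case it's relevant to the machine-to-machine differences, this is the generic PyTorch determinism boilerplate I'd double-check (not taken from this repo; results can still differ across GPUs and driver versions even with it):

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Seed all the usual RNGs and make cuDNN pick deterministic kernels.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)
```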
Thanks @antoniogois. I'll see what's making the difference.
Maybe a moot point, but still: what PyTorch version are you using, @antoniogois?
using 1.6.0
@antoniogois @ziyuwwang I'm investigating. I'm not sure what's going wrong as of now.
I ran again with the code at https://drive.google.com/file/d/1KKz3YEyZJ4-2XY40d7akrrrBynG9elVT/view?usp=sharing and got the same results. (It's the official code I used, and it's the same as what is here in the repo.) I've uploaded the details of my conda env. I'm investigating what happens when changing the PyTorch version. Will keep you updated.
I think you need to provide access to that Google Drive file :) Let me know if I can help with anything regarding this issue; maybe I can try re-running with the same PyTorch version as you [after I have access to the Google Drive file].
Sorry, changed it. The code in the zip and the code here are the same. One of my colleagues was also not able to reproduce the results, so I'm looking into it. Thanks for your patience.
@antoniogois @ziyuwwang With everything installed from scratch, I was also able to "reproduce" the issue (i.e., the results don't match).
Training logs for the setting where I was able to reproduce the results in the paper are here. https://gist.github.com/anirudh9119/f7d36c9eac054c3d712ed961382750c1
I'm investigating it. Sorry for the problem.
@anirudh9119 I'm sorry, but I'm very confused. Do I understand it right that for the performance in the paper you report results obtained at different training epochs? So you are basically testing with differently trained networks? Isn't the whole point that the (same) model should work irrespective of the input length?
Hi everyone! I'm also having problems reproducing the results reported in the paper. Are there any updates on this issue?
Thank you!
Hello, thank you for releasing the implementation of RIMs. I am trying to reproduce your work with the released code, and I just cannot reach the performance reported in the RIMs paper on the sequential MNIST experiment using the following training command: "bash experiment_mnist_1layered.sh 600 6 4". I get a much lower accuracy of "0.78, 0.55, 0.33", compared to "0.90, 0.73, 0.38" in the RIMs paper. Could you provide the exact training command or experimental settings? Thanks.