
[Re] On the Relationship between Self-Attention and Convolutional Layers #53

Closed: NishantPrabhu closed this 2 years ago

NishantPrabhu commented 3 years ago

Original article: On the Relationship between Self-Attention and Convolutional Layers by Jean-Baptiste Cordonnier, Andreas Loukas & Martin Jaggi, published at ICLR 2020.

PDF URL: Reproducibility report
Metadata URL: Metadata
Code URL: Code

Scientific domain: Deep Learning
Programming language: Python
Suggested editor: Nicolas P. Rougier (@rougier)

Dear all, This is a reproduction of the paper mentioned above, written with Mukund Varma (@mukundvarmat). The paper discusses the use of transformers and self-attention for vision tasks. We suggest Nicolas P. Rougier as the editor.

With warm regards, Nishant Prabhu

koustuvsinha commented 3 years ago

FYI, this paper was submitted to the ML Reproducibility Challenge, and the reviews can be viewed here: https://openreview.net/forum?id=4hm5ufX69jo . While the report was not accepted for the challenge, the authors probably wish their report to be considered for an independent evaluation by this journal. I hope the reviewers can take the existing reviews and the updated manuscript into consideration for the review. @rougier, since I was the Program Chair, reviewing this paper is a conflict of interest for me, and I pre-emptively recuse myself from it!

rougier commented 3 years ago

@koustuvsinha Thanks for the explanation. I think this is the first time we have had such a case and I'm not sure how we should handle it. @khinsen @oliviaguest @benoit-girard Thoughts?

@NishantPrabhu I think we can review your submission as a regular ReScience submission, but you would need to adapt your manuscript. Since you already have some reviews, it might also be good to take that advice into account. But before that, please wait until we make a decision and have an editor.

NishantPrabhu commented 3 years ago

@rougier

Greetings!

  1. We intend our submission to be evaluated as a regular ReScience submission. The current submission has already been redesigned according to the template prescribed on the repository. If any other form of adaptation/modification is desired, please let us know and we will get it done as soon as possible.

  2. We have gone through and responded to the reviews given during MLRC 2020, and our submission has been revised to address those suggestions as thoroughly as possible. Our submission here contains the revised content.

Please let us know if there's anything else we should attend to. Thanks!

khinsen commented 3 years ago

@rougier I'd say we should treat this as a regular ReScience submission, and let the editor who will handle the review decide if and how the prior reviews (which correspond to an earlier version) can be integrated into the reviewing process.

oliviaguest commented 3 years ago

I am happy to take this over as editor.

oliviaguest commented 3 years ago

I also recommend @\nicksexton and @\don-tpanic as reviewers for this (when the time comes). ☺️

rougier commented 3 years ago

@oliviaguest Ok. Next question is whether we need one or two reviewers given that the article has been already reviewed.

oliviaguest commented 3 years ago

I vote for 2: not only because I can get 2 easily, but also because it seems fair.

oliviaguest commented 3 years ago

So @rougier, is 2 OK? Shall we all discuss this?

rougier commented 3 years ago

Sorry for the delay and 2 is fine for me.

oliviaguest commented 3 years ago

👋 @nicksexton and @don-tpanic, would either or (even better) both of you be able to review this for me? ☺️

nicksexton commented 3 years ago

hello, we are on the case with this. Will keep you all posted. Is there a deadline?

oliviaguest commented 3 years ago

@nicksexton perfect, thank you both! I'll give @don-tpanic a chance to also reply, so I can "assign" you both (GitHub thing that allows you to get notifications, etc). Deadline can be flexible, but if you can do this over the next few weeks, that would be amazing. Let me know your own schedules and we can work around that, but a month or so is totally fine. 😊

don-tpanic commented 3 years ago

> 👋 @nicksexton and @don-tpanic, would either or (even better) both of you be able to review this for me? ☺️

Yep, I am on board too. Oh hi Olivia :)

don-tpanic commented 3 years ago

Hi @NishantPrabhu

I ran into this error when running python3 main.py --config 'configs/learned_2d.yaml' --task 'train' inside main/

Traceback (most recent call last):
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 414, in <module>
    trainer = Trainer(args)
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 28, in __init__
    self.encoder = networks.Encoder(self.config['encoder']).to(self.device)
AttributeError: module 'models.networks' has no attribute 'Encoder'

I suspect this is something minor like the attribute was renamed? Could you check this for me and update the code?

Thanks! Ken

NishantPrabhu commented 3 years ago

Hi @don-tpanic

Thanks for pointing out this issue. As you rightly diagnosed, a class name had been modified. I've fixed it now and it's working fine at my end. Let me know if you face any other issues.

Thanks and with warm regards, Nishant.

NishantPrabhu commented 3 years ago

Hi again @don-tpanic

If you're planning to train the model for a while, you might want to modify the batch size in the config files to suit your accelerator's capacity. I've set it to 1 for testing the changes I recently made.

Thanks, Nishant.

don-tpanic commented 3 years ago

Thanks @NishantPrabhu for your prompt response!

Could you look into this error for me?

When I ran python3 main.py --config 'configs/resnet.yaml' --task 'train'

Traceback (most recent call last):
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 609, in <module>
    trainer = Trainer(args)
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 38, in __init__
    self.encoder = networks.Encoder(self.config['encoder']).to(self.device)
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/models/networks.py", line 109, in __init__
    self.num_blocks = config['num_encoder_blocks']
KeyError: 'num_encoder_blocks'

Thanks, Ken

NishantPrabhu commented 3 years ago

Hi @don-tpanic

The CLI command to train a ResNet model is slightly different. I understand the confusion, since we hadn't mentioned this detail in the README; I have added it now. Please see the Task options section in the README for more details.

In particular, to train a ResNet model, the correct command would be:

python3 main.py --config 'configs/resnet.yaml' --task 'resnet_train'

Note that --task is given resnet_train and not train. This is because our trainer definitions for the two are quite different.

Let me know in case of other issues. Thanks!

Nishant.

don-tpanic commented 3 years ago

Hi @NishantPrabhu,

Thanks for updating the README. I was able to run training with resnet perfectly!

Here is another error I came across when training the quadratic embedding with isotropic Gaussian, using python3 main.py --config 'configs/gaussian.yaml' --task 'train'

The error was thrown at the end of validation:

[TRAIN] Epoch [  1/300]
Progress: [===================>] 100.00% [Loss] 1.8334 [Accuracy] 0.3218 
[VALID] Epoch [  1/300]
Progress: [===================>] 100.00% [Loss] 1.5961 [Accuracy] 0.4208
Traceback (most recent call last):
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 610, in <module>
    trainer.train()
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 438, in train
    self.visualize_attention(epoch)
  File "/home/ken/projects/Self-Attention-and-Convolutions/main/main.py", line 197, in visualize_attention
    ax.imshow(img_normal.permute(1, 2, 0).cpu().numpy())
RuntimeError: number of dims don't match in permute

I believe the error originated in visualize_attention. One related question, if you could clarify it for me: it seems attention will always be visualized whether the task is train or viz? I wonder if this is intended behaviour or whether it could be slightly improved.

Thanks, Ken

NishantPrabhu commented 3 years ago

Hi @don-tpanic

First of all, I'm really sorry about all these roadblocks you're facing. It looks like we didn't handle some parts of the code correctly when we modified it after our RC2020 reviews.

Anyway, I have updated the code to improve consistency across experiments. When you set --task to viz, the code will now look for imgs/ in the working directory and use one image from it for visualization. A folder with that name is already present in the experiment directories, with some handpicked images in it. Please note that for SAN and ViT, loading a pretrained model using --load is necessary for visualization to work. For main, it will run, but it will generate meaningless results.

Also, I've disabled visualization during training. We had done that to observe the evolution of attention patterns with training, which should not be necessary now. If needed in the future, just uncomment line 433 in main/main.py. This part of the code for all three experiments is now working fine on my end.
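
For example, a visualization run would now look something like this (the checkpoint directory is just a placeholder; point --load at the directory containing your trained model):

python3 main.py --config 'configs/learned_2d.yaml' --task 'viz' --load 'path/to/experiment_dir'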

Thanks and with warm regards, Nishant.

NishantPrabhu commented 3 years ago

Greetings!

I hope you're doing fine. I just wanted to know if you require any further clarification or assistance regarding our submission. It would be great if we could have a brief update about how far the review process has come.

Thanks and with warm regards, Nishant.

oliviaguest commented 3 years ago

Hi @nicksexton and @don-tpanic, have either of you (or both) had a chance to keep going on this? Any updates welcome, and a rough (and flexible) ETA would be useful! ☺️

don-tpanic commented 3 years ago

Hi @NishantPrabhu @oliviaguest, sorry for being a bit slow getting back to this as we got held up with trying to get a paper out. We have reproduced the results of 4 models so far and are expecting more to come. @nicksexton and I will touch base on this tomorrow and we will update you all with a timeline.

Thanks, Ken

oliviaguest commented 3 years ago

@don-tpanic amazing! 🌸

nicksexton commented 3 years ago

Thanks for your patience; we have had a chance to confer on this. We have worked through running the models in Table 1 and, generally, everything checks out.

  1. For Table 1, we have an issue with the final two models (learned embeddings w/ and w/o content): we find they exceed the memory on our GPUs (RTX 2080). We reduced the batch size accordingly (from 100 to 32); however, with this setting we cannot get the models to train. We are unsure of the best way to proceed on this. @oliviaguest, this seems to be an interesting reproducibility question; could you advise? It would be helpful to have more detail on the computational setup used. We're unsure whether we actually have any GPUs in the lab big enough to run this model with the specified batch size without falling back on e.g. AWS.
  2. Regarding (e.g.) Figure 1, is there a script which automates parameterising, training, and warehousing data from all the models in order to produce the figure? Currently this seems to require individual runs of the model with specific parameter settings, and the various models do not 100% match the naming of the configs used to run the experiments (what are w/ and w/o content in the experiment config? is it use_attention_data in learned_2d.yaml?). We have been able to make educated guesses about these, but for reproducibility this needs to be stated explicitly (ideally in a script, or in the README).
  3. The part of the data processing pipeline that retrieves stored simulation results and generates figures relies on an undocumented Jupyter notebook (./temp/log_extract.ipynb). Not that there is anything wrong with notebooks per se, but we are specifically concerned about the need to manually configure model runs and name datasets (point 2 above), with specific filenames hard-coded in the plotting notebook. At a minimum, we think there is a need for documented code that automates this part of the process.
  4. As far as Table 1 goes, all our reproductions check out and we are reasonably confident that any given model in the paper will generate the results stated. However, although sufficient information is provided for us to run any given model, the lack of high-level automated scripts makes reproducing a given table or figure quite a manual process that is a) potentially subject to user error, and b) consuming significant amounts of time on our end (see the sketch after this list for the kind of driver we have in mind). We wonder if there can be a discussion of what is reasonable here, @oliviaguest. Could you please advise whether we should be checking all models reported in the paper or whether spot checks are sufficient? If we are aiming to reproduce all results in the paper, could we ideally ask for simulation scripts so that running each figure or table is a one-liner?
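
To make the ask concrete: even a minimal driver along the lines below would remove most of the manual steps for a table (the config file names and the working directory are assumptions on our part; they would need to be adapted to the actual repo layout).

# Hypothetical sketch of a table-level driver, run from main/; config names are illustrative only.
import subprocess

CONFIGS = ["configs/gaussian.yaml", "configs/learned_2d.yaml", "configs/resnet.yaml"]

for cfg in CONFIGS:
    # the ResNet baseline uses its own trainer, hence the different --task value
    task = "resnet_train" if "resnet" in cfg else "train"
    subprocess.run(["python3", "main.py", "--config", cfg, "--task", task], check=True)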

Thanks! Nick

oliviaguest commented 3 years ago

Thanks for all this, @nicksexton and @don-tpanic! ☺️

Two main points/takeaways from me:

a. @NishantPrabhu, you now probably have a lot of small and large changes to make to improve this replication. Feel free to deal with these while everybody else discusses b (below), if you want.

b. I am not sure about the questions above (in points 1 and 4). @khinsen @benoit-girard @rougier can you chime in too, please? I would like the author(s; @NishantPrabhu) to also give their perspective on this, when possible.

MukundVarmaT commented 3 years ago

Hi @nicksexton Thank you for reviewing our reproducibility report.

  1. We would like to direct you to section 3.2 in the submitted reproducibility report which mentions the required computational resources to replicate all the results. To summarize, the learned embedding-based experiments were run on 16GB NVIDIA V100 GPUs rented from AWS, and the others were run on an 8GB NVIDIA RTX 2060 owned by us.
  2. We apologize for our oversight in not including individual config files for each training experiment and we shall include them as soon as possible for convenience. This should also improve consistency in naming schemes between the paper and the code.
  3. The provided code does include methods/functions to visualize the attention plots. With regard to Fig. 1 in the reproducibility report (performance vs. epochs across various models), we used a notebook to extract the required information from the corresponding log files. We would also like to point out that wandb (which we use for logging our experiments) allows comparison across training runs, which should produce the same figure (see the sketch below); therefore we have not included a separate script for this.
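
As a rough illustration of what we mean, the logged curves can be pulled back out of wandb programmatically in a few lines (the project path and metric key below are placeholders, not our actual names):

# Sketch using the public wandb API; the project path and metric key are placeholders.
import wandb

api = wandb.Api()
for run in api.runs("our-entity/self-attention-convolutions"):
    history = run.history(keys=["val_accuracy"])
    print(run.name, history["val_accuracy"].max())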

Regarding points 1 and 4: thank you for validating and confirming our results in Table 1. As mentioned in point 2, we will include separate configuration files for each run, which should help automate reproducing all the results. Indeed, some of these experiments are quite computationally heavy, which makes it difficult to validate all our results. If it helps, we could provide our training logs from wandb as a check on our claimed results (as an alternative to having to rerun everything on your end).

PS: As mentioned in our report, we go beyond reproducibility and propose a new variant of the attention operation (to possibly overcome some of the problems observed during reproduction). Since the discussion so far hasn't mentioned it, we wanted to highlight it to the reviewers and would love to hear their thoughts on it.

Thanks and Regards, Mukund and Nishant

NishantPrabhu commented 3 years ago

Greetings!

I have added experiment-specific configuration files to our repo and updated the README with more information about them. Please note that I have only included configurations for the experiments we performed, even though more combinations are possible. Kindly let me know if anything else needs to be done.

Thanks and with warm regards, Nishant.

rougier commented 3 years ago

@oliviaguest Jumping in. Can you give us a bit of context?

oliviaguest commented 3 years ago

@rougier from above "To summarize, the learned embedding-based experiments were run on 16GB NVIDIA V100 GPUs rented from AWS, and the others were run on an 8GB NVIDIA RTX 2060 owned by us." (As said by @MukundVarmaT.)

I assume this means the results can only be recreated using specific (types of) hardware. What do we think: can this be overlooked, or do the authors have to do something to make it "more" replicable? I assume not, but I am also not sure what this means for the rules/standards of the journal.

rougier commented 3 years ago

cc: @khinsen @benoit-girard

Oh I see. I think we talked about such a case a long time ago without finding a definitive solution. For the present case, I think we need to trust the author of the replication (@NishantPrabhu) that his results are correct, and the reviewers can only review the article and read the code to check that it is consistent. But in the long term, we need to address the problem in a generic way. Not sure how, though.

oliviaguest commented 3 years ago

If so, then all that needs to be done here is for @nicksexton and @don-tpanic to sign off on the changes once the authors confirm they are done making them. OK, good. ☺️

don-tpanic commented 3 years ago

Hi @NishantPrabhu,

I have managed to reproduce more results from the paper and things are looking really good so far. Here are some minor suggestions which I think could improve the ease of reproducibility for others in the future.

  1. For the moment, in order to visualise attention over images, one has to trace back into the source code to find where to supply the image file name. It would be nice if this were less hard-coded.

  2. Again a minor suggestion, but I think the command provided for visualising attention could be improved. The current command can cause two problems when no --output is provided by the user: i) since it defaults to a directory named after the current timestamp, model checkpoints and associated results end up in different directories (it took me a while to find the nice attention plots); ii) when using this command to visualise the Gaussians in Figure 2 (with --task 'gauss_viz'), leaving --output at its default is actually a problem, because the code looks into this new timestamp-named directory, which does not contain any checkpoints.

Basically, I suggest either making it more explicit that both --output and --load need to be provided and kept consistent, or, better, making --load alone sufficient.

  3. In our previous feedback, Nick mentioned that the code for plotting Figure 1 is currently a Jupyter notebook. As he suggested, it would perhaps be better to have a standard script for this part, like everything else. There doesn't seem to be enough information in the README on using this notebook either, and the notebook can be a bit hard to find, since its directory is named temp.

Best, Ken

NishantPrabhu commented 3 years ago

Hi @don-tpanic,

Thanks for the suggestions!

  1. I have added another parameter --image to the list of CLI arguments where you can specify the path to an image on which visualization operations will be performed. I've added information about this option to the README as well.
  2. I've modified the code so that whenever a directory is specified using --load, the output directory will be overwritten with this argument. Thus, all visualization outputs will be stored in the same directory containing the trained model. I hope this is what was expected.
  3. I'll need a little more time to develop a general script for this. I will add it to the repo as soon as I can. Thanks for your patience!

I have tested the code after modifications (1) and (2), and it's working fine on my end. If you face any issues running the code on your end, please let me know.
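
For reference, a full visualization call now looks roughly like this (the checkpoint directory and image file name are placeholders):

python3 main.py --config 'configs/learned_2d.yaml' --task 'viz' --load 'path/to/experiment_dir' --image 'imgs/sample.png'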

Please also note the following changes:

  1. Not specifying --load for visualization tasks in any of the experiments will now throw a NotImplementedError. Earlier, it would generate outputs with a randomly initialized model. I find this more sensible, as it notifies the user of the requirement instead of generating wrong results.

Thanks and with warm regards, Nishant Prabhu

NishantPrabhu commented 3 years ago

Hi everyone,

I've added a script, plot_performance.py, to plot the train/validation metrics of the desired experiments, and I've added instructions for using it to the README. Please let me know whether it fulfills the required functionality, or if you face any issues.

Thanks and with warm regards, Nishant Prabhu

oliviaguest commented 3 years ago

@NishantPrabhu thank you! I will wait for the reviewers to confirm this is all OK now! Very exciting, as I think we're very close. 😊

don-tpanic commented 3 years ago

Hi @NishantPrabhu,

Thanks for all the prompt updates. Here is another issue when I ran python3 main.py --config 'configs/vit_hier.yaml' --task 'train'

  File "/home/ken/projects/Self-Attention-and-Convolutions/ViT/main.py", line 346, in <module>
    trainer = Trainer(args)
  File "/home/ken/projects/Self-Attention-and-Convolutions/ViT/main.py", line 82, in __init__
    if os.path.exists(os.path.join(args['load'], 'best_model.ckpt')):
  File "/home/ken/.conda/envs/rescience/lib/python3.9/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Can you look into it and make sure SAN/ doesn't have the same issue?

Thanks, Ken

NishantPrabhu commented 3 years ago

Greetings!

Sorry about this bug; it has been fixed now. Tested okay for SAN/ and ViT/ on my end.
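
For the record, the guard now has roughly this shape (a paraphrased sketch, not the exact code in ViT/main.py):

# Paraphrased sketch: only probe for best_model.ckpt when --load was actually supplied.
import os

def find_checkpoint(load_dir):
    if load_dir is None:
        return None
    ckpt = os.path.join(load_dir, 'best_model.ckpt')
    return ckpt if os.path.exists(ckpt) else None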

Thanks, Nishant

NishantPrabhu commented 3 years ago

Hi all,

Accidentally closed the issue; have reopened it. Sorry if this causes any issues!

Thanks, Nishant.

khinsen commented 3 years ago

@NishantPrabhu Jumping in to ask for your opinion on a proposal for improving the ReScience C reviewing workflow for future HPC-style submissions. Unrelated to this review, I am not asking you to adopt this scheme retroactively! From your point of view, would such a process be a reasonable burden on authors of future submissions?

NishantPrabhu commented 3 years ago

@khinsen First of all, sorry for the delay.

Most of the suggestions and issues put forward by the reviewers were genuine and necessary for our reproducibility effort to be complete. There is one suggestion I would like to make anyway.

More often than not, authors maintain detailed logs of the experiments they run (on platforms like TensorBoard or WandB). These can be made available to the reviewers to verify the results, removing the need for the reviewers to rerun everything on their end. On a case-by-case basis, some experiments could be looked at in more detail if the logs are not convincing enough. This requires no extra effort from either the authors or the reviewers. Apart from this, the current reviewing workflow seems very reasonable.

Thanks, Nishant.

NishantPrabhu commented 3 years ago

Also, if I may ask, is there anything else we can help with to complete the review? It would be great if you could give us an estimate of when the process will be complete.

Thanks for your time. Nishant.

oliviaguest commented 3 years ago

Given that the reviewers are donating their time in a pandemic, I do not want to pressure them, especially as I think they have done a fantastic job so far. When @don-tpanic and @nicksexton are happy to take this to acceptance, we can continue. What is left, anyway?

nicksexton commented 3 years ago

Hi everyone, thanks for your patience.

@don-tpanic and I have finished reviewing all the simulations reported in the paper and we are happy to recommend it for publication. We really like the paper and feel it reports a replication clearly as well as providing novel insights. In general, everything we did in replicating it checked out: the code was clearly documented, mostly ran straight out of the box, and the authors have been very responsive in addressing the exceptions. All the simulations we have run match the results in the paper, except those originally run on AWS, for which we didn't have the hardware in the lab; as discussed above, all the other results match the paper, so we have high confidence that these would also replicate.

Nice work and congratulations!

MukundVarmaT commented 3 years ago

We would like to thank the reviewers (@don-tpanic, @nicksexton) and the editor (@oliviaguest) for taking the time to review our submission. Your feedback has definitely helped improve our reproducibility report.

@oliviaguest, what would the next steps be from here, and is anything required from our end to complete the process?

oliviaguest commented 2 years ago

Sure, @MukundVarmaT: you can make sure your metadata.yaml file is fully filled in with what is required of you. I need it for the publication script: https://github.com/rescience/articles#15-steps-to-publishing-a-rescience-article. 😊

MukundVarmaT commented 2 years ago

@oliviaguest Please find the updated metadata.tex, article.pdf, and metadata.yaml.

I believe some of the information has to be filled in by you, which will then also be reflected in article.pdf.

oliviaguest commented 2 years ago

@MukundVarmaT can you check whether it's easy for you to fix the text and Figure 1 going into the margins on page 4, please?

oliviaguest commented 2 years ago

@MukundVarmaT I might be wrong but maybe all the figures are going into the margins... is this on purpose and I just never noticed it before, @rougier?