Closed IainDaviesMaths closed 4 years ago
Oh, this looks interesting. @(whoever becomes editor for this), I'd be happy to review.
Thanks for your submission, we'll assign an editor soon (hopefully, because of summer vacation it may take some time). and@cJarvers volunteered to review.
@ThomasA Can you edit this submission ? @cJavers already proposed to review.
@rougier : in case @ThomasA does not react, I can edit this. I am not familiar with Julia, but I am interested in the reproduction of this classic paper :-)
That would be great, thanks.
Sorry, I lost a bit track of ReScience over the summer. I see that an editor has been assigned now.
@cJarvers : thanks for the offer of reviewing this submission. Are you still up for it? @pberkes : would you like to review this?
Guidelines for reviewers are here: https://rescience.github.io/edit/
@otizonaizit : Gladly!
@otizonaizit : Yes, I'd still be happy to review.
The authors successfully reproduce the findings of Hancock’s “PCA of natural images” paper, with minor deviations from the original results that are easily accounted for. The paper is well written and well organized. It’s good work, I don’t have any further comment.
The code is provided in two languages, Matlab and Julia. Unfortunately, I do not have a Matlab license so I could not verify that that part of the code works as intended. As far as I’m concerned, the Matlab code could be removed from the repository as it is redundant and Matlab is not an open language.
The Julia code is readable and decently organized. It runs as intended and produces some version of the figure in the paper. The random seed has not been fixed, so it produces slightly different images for each run.
For instance, this is the image shown in the paper and shipped with the repository:
Running the code I obtain this version 
I don’t know where the editors stand on this issue, but personally I think that the random seed should be a global parameter, and running the code with the same setting would always return exactly the same images.
Regarding the code repository: while I do appreciate the effort of the authors in making their research reproducible, the GitHub repository could be organized better.
setup.jl
first and then make run-julia
Thank you for reproducing this classic!
@otizonaizit : Yes, I'd still be happy to review.
@cJarvers: do yo uhave a timeline for your review?
@IainDaviesMaths :
@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest. I have a Matlab license, so I should be able to run & review that part of the code as well.
@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest. I have a Matlab license, so I should be able to run & review that part of the code as well.
Perfect, thanks! I just wanted to have a rough timeline, please take your time :-)
Here is my review, @otizonaizit & @IainDaviesMaths
This review based on commit https://github.com/IainDaviesMaths/Reproduction-Hancock/commit/519b1695eb05a1d328b033728025c5e90f1d40c5
The authors successfully replicated the results of the original article. There are some minor deviations, but these are within the range of what should be expected, given that the additional natural images and the text images were not available anymore. The authors discuss these differences sufficiently.
The only place where I feel that more detailed commentary would be helpful is the reproduction of the text experiment. The authors state that "[f]or this to be reproduced, further experimental modifications had to be made to the text image", but what exactly these modifications were remains unclear. A suitable font had to be chosen (which one?) - anything else? Since the authors state that these results seem to be brittle, it might be nice to also include the result for a different font.
The authors include a makefile, which makes rerunning their code easy. Since the code is provided in both Matlab and Julia, I'll briefly comment on each below.
Julia:
I ran the code with Julia 1.5.1.
As @pberkes already noted, the Julia code runs without problems, but does not produce perfectly reproducible results. I agree that the most crucial fix would be to set the random seeds.
Another potential problem for reproducibility is package management. The authors provide a setup.jl
script that installs the required dependencies. However, the versions that get installed are not necessarily the same ones as the authors used, so the behavior of the code might change in the future. However, the repo already contains the solution to this problem: the Manifest.toml
and Project.toml
files document the exact package versions that were used. Instead of letting the user run the setup.jl
script, the authors could add the following lines to the beginning of their runall.jl
script:
using Pkg
Pkg.activate(".")
Pkg.instantiate()
In that case it might also be good to remove the setup.jl
script to avoid confusion.
Matlab:
I ran the code with Matlab version R2019b.
Similar to the Julia code, the Matlab version runs without problems and generates Figures that resemble those in the paper, but have subtle differences (for example, the sign of he principal components is flipped) since the random seeds are not fixed. As with the Julia code, this should be rectified by setting random seeds.
Running make all
generates figures 2 to 8 with Matlab, leaving out figure 1. Since figure 1 only concatenates the input images, this is not a big problem, but for completeness sake it might be better to generate this figure as well (just like the Julia code).
Since Matlab is not an open language, I also tried running the code with Octave 5.2.0. This did not throw any errors and the results for figure 1 were correct, but the other figures ran for a long time. I gave up after a few hours. I know it should be possible to make it run faster in Octave using the JIT compiler or something, but I don't have enough experience with that to go down that road.
As far as I understand it, the policy of ReScience is that the code has to be in an open source language, so not Matlab. This replication is an interesting edge case, since there is also an open source implementation (in Julia). The question is whether both should be published. I guess this is for the editorial team to decide.
I agree with @pberkes that the structure of the repository should be cleaned up. There should be separate subfolders for the images, the generated figures, the Julia code and the Matlab code.
In general, the code is clear and easy to read. There are only a few improvements I would suggest. All of these are suggestions, not requirements. If the authors disagree with me, I would nevertheless recommend publication.
Matlab:
In the Matlab code, there are some code duplications. For example, the functions LearningProcess
and LearningProcessRotated
are nearly identical, except for the rotation in the latter. One could just make rotation a parameter of the function and pass 0
if the images should not be rotated. Another example occurs in PlotFigure7.m
and PlotFigure8.m
. Here, the calculation of the mean gray level and the learning process are included in the script, even though the code is basically identical to the code in the functions MeanGrayLevel
and LearningProcess
. It would be better to just replace the code blocks by calls to these functions.
In general, I think the Matlab code would be a bit more readable if spaces were added around operators.
Julia:
Since the Julia code is very close to the Matlab code, the same improvements (reduction of code duplications, more spaces) could be made here. One additional recommendation concerns the documentation. Currently, each function is documented with an initial comment, Matlab-style. The convention in Julia is to use docstrings (see here). This has the advantage that Julias inbuilt help features (e.g., at the REPL) can be used. So simply cutting the text of each documentation comment and pasting it as a string atop the function would be an improvement. Again, this is only a recommendation.
The article is excellent. It is easy to understand what was done, which parts of the replication were difficult etc.
While reading the paper and code, I found a few small things (typos etc.) that could be improved:
sagerupdate.m
should probably be called sangerupdate.m
Sager
should be Sanger
PlotFigure6.m
the comment in line 36 says "Double scale", but should probably be "Half scale"hancock.jl
line 469hancock.jl
has been commented out and could be removed to increase clarityThe replication was successful. The article is excellent. The code is also well structured and easily readable, however there are two problems with reproducibility (lack of random seeds; suboptimal package management) that the authors should fix before publication. In addition, the repo should be cleaned up.
Since one of the two implementation is in Matlab, the editorial team needs to decide whether that is appropriate for publication in ReScience, or whether only the Julia code should be published.
Finally, there are some improvements to the code that I would recommend (reducing duplications, adding spaces around operators, adding docstrings), but these are only suggestions.
Thank you @cJarvers for your review!
@IainDaviesMaths : both reviewers agree that the paper is fit for publication. You just need to address the reviewers comments, especially regarding the random seed and the re-organization of the code. Do you have an estimate when this can happen?
@rougier , @khinsen , @benoit-girard , @oliviaguest : in this submission we have a reproduction with code both in Julia and Matlab. The code in both languages has been reviewed, even if the Matlab code had some difficulties running under Octave. We have had this discussion already, but I think an official statement by the Editors in Chief would be good: do we accept code submissions in Matlab? Given that in this particular case we have corresponding code in Julia, it is maybe worth to relax the submission policy? Should the authors remove the Matlab code from the submission?
@otizonaizit seems like a discussion that deserves it's own thread IMHO!
In this specific case, I think we can relax the recommendation and include the Matlab code as well, expecially if it works under Octave. But I agree with @oliviaguest that we should open a thread to decide on a proper guideline.
Thank you everyone for the reviews and feedback on the paper! Myself and Stephen are in the process of making all the suggested edits and should be done in the next few days.
Most edits have been done now - just a few minor Julia edits to be made which Stephen is on top of. Will alert when the article is in final draft.
Hi all - I think that's all your suggestions added to the code and article now! Let me know if there's anything else to be done.
@otizonaizit 🛎️
@pberkes @cJarvers : I find the reorganized repo and the other changes quite good, so if there are no objections from you two I'd accept the paper and contact @IainDaviesMaths fo the details of the publication process! Please write a comment here in any case.
@otizonaizit no objections from me!
@otizonaizit no objections from me either. The changes address all the issues I raised. Great work by the authors!
Great. Congratulations @IainDaviesMaths : the paper is accepted! I will follow up shortly with a PR to the paper repo for the update of metadata.
@otizonaizit thanks for the info about proceeding, I think it is ready for publication now!
Great! Published:
The article will appear on https://rescience.github.io/read/ as soon as these two PRs are merged by one of the editors in chief (@rougier ?)
Super, thanks! Well done Iain for getting this great work published.
Should be online now.
Original article: The Principal Components of Natural Images by Hancock et al (1991), http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=044DFF560FC2E801B579C2F23D268B44?doi=10.1.1.41.192&rep=rep1&type=pdf*
PDF URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/article.pdf Metadata URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/metadata.yaml Code URL: https://github.com/IainDaviesMaths/Reproduction-Hancock
Scientific domain: Computational Neuroscience Programming language: Julia Suggested editor: -