ReScience / submissions

ReScience C submissions
28 stars 7 forks source link

[Re] The principal components of natural images #45

Closed IainDaviesMaths closed 4 years ago

IainDaviesMaths commented 4 years ago

Original article: The Principal Components of Natural Images by Hancock et al (1991), http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=044DFF560FC2E801B579C2F23D268B44?doi=10.1.1.41.192&rep=rep1&type=pdf*

PDF URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/article.pdf Metadata URL: https://github.com/IainDaviesMaths/Hancock-Rescience-Paper/blob/master/metadata.yaml Code URL: https://github.com/IainDaviesMaths/Reproduction-Hancock

Scientific domain: Computational Neuroscience Programming language: Julia Suggested editor: -

cJarvers commented 4 years ago

Oh, this looks interesting. @(whoever becomes editor for this), I'd be happy to review.

rougier commented 4 years ago

Thanks for your submission, we'll assign an editor soon (hopefully, because of summer vacation it may take some time). and@cJarvers volunteered to review.

rougier commented 4 years ago

@ThomasA Can you edit this submission ? @cJavers already proposed to review.

otizonaizit commented 4 years ago

@rougier : in case @ThomasA does not react, I can edit this. I am not familiar with Julia, but I am interested in the reproduction of this classic paper :-)

rougier commented 4 years ago

That would be great, thanks.

ThomasA commented 4 years ago

Sorry, I lost a bit track of ReScience over the summer. I see that an editor has been assigned now.

otizonaizit commented 4 years ago

@cJarvers : thanks for the offer of reviewing this submission. Are you still up for it? @pberkes : would you like to review this?

otizonaizit commented 4 years ago

Guidelines for reviewers are here: https://rescience.github.io/edit/

pberkes commented 4 years ago

@otizonaizit : Gladly!

cJarvers commented 4 years ago

@otizonaizit : Yes, I'd still be happy to review.

pberkes commented 4 years ago

The authors successfully reproduce the findings of Hancock’s “PCA of natural images” paper, with minor deviations from the original results that are easily accounted for. The paper is well written and well organized. It’s good work, I don’t have any further comment.

The code is provided in two languages, Matlab and Julia. Unfortunately, I do not have a Matlab license so I could not verify that that part of the code works as intended. As far as I’m concerned, the Matlab code could be removed from the repository as it is redundant and Matlab is not an open language.

The Julia code is readable and decently organized. It runs as intended and produces some version of the figure in the paper. The random seed has not been fixed, so it produces slightly different images for each run.

For instance, this is the image shown in the paper and shipped with the repository: julia-fig6 copy

Running the code I obtain this version julia-fig6

I don’t know where the editors stand on this issue, but personally I think that the random seed should be a global parameter, and running the code with the same setting would always return exactly the same images.

Regarding the code repository: while I do appreciate the effort of the authors in making their research reproducible, the GitHub repository could be organized better.

Thank you for reproducing this classic!

otizonaizit commented 4 years ago

@otizonaizit : Yes, I'd still be happy to review.

@cJarvers: do yo uhave a timeline for your review?

otizonaizit commented 4 years ago

@IainDaviesMaths :

cJarvers commented 4 years ago

@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest. I have a Matlab license, so I should be able to run & review that part of the code as well.

otizonaizit commented 4 years ago

@otizonaizit I'll try to do it within the next few days. Since I have a 6-month-old at home, it's a bit hard to predict how much time I can actually spend on work each day, but I'll do it by end of next week the latest. I have a Matlab license, so I should be able to run & review that part of the code as well.

Perfect, thanks! I just wanted to have a rough timeline, please take your time :-)

cJarvers commented 4 years ago

Review of Davies & Eglen: [Re] The principal components of natural images

Here is my review, @otizonaizit & @IainDaviesMaths

This review based on commit https://github.com/IainDaviesMaths/Reproduction-Hancock/commit/519b1695eb05a1d328b033728025c5e90f1d40c5

Success of the replication

The authors successfully replicated the results of the original article. There are some minor deviations, but these are within the range of what should be expected, given that the additional natural images and the text images were not available anymore. The authors discuss these differences sufficiently.

The only place where I feel that more detailed commentary would be helpful is the reproduction of the text experiment. The authors state that "[f]or this to be reproduced, further experimental modifications had to be made to the text image", but what exactly these modifications were remains unclear. A suitable font had to be chosen (which one?) - anything else? Since the authors state that these results seem to be brittle, it might be nice to also include the result for a different font.

Reproducibility

The authors include a makefile, which makes rerunning their code easy. Since the code is provided in both Matlab and Julia, I'll briefly comment on each below.

Julia:

I ran the code with Julia 1.5.1.

As @pberkes already noted, the Julia code runs without problems, but does not produce perfectly reproducible results. I agree that the most crucial fix would be to set the random seeds.

Another potential problem for reproducibility is package management. The authors provide a setup.jl script that installs the required dependencies. However, the versions that get installed are not necessarily the same ones as the authors used, so the behavior of the code might change in the future. However, the repo already contains the solution to this problem: the Manifest.toml and Project.toml files document the exact package versions that were used. Instead of letting the user run the setup.jl script, the authors could add the following lines to the beginning of their runall.jl script:

using Pkg
Pkg.activate(".")
Pkg.instantiate()

In that case it might also be good to remove the setup.jl script to avoid confusion.

Matlab:

I ran the code with Matlab version R2019b.

Similar to the Julia code, the Matlab version runs without problems and generates Figures that resemble those in the paper, but have subtle differences (for example, the sign of he principal components is flipped) since the random seeds are not fixed. As with the Julia code, this should be rectified by setting random seeds.

Running make all generates figures 2 to 8 with Matlab, leaving out figure 1. Since figure 1 only concatenates the input images, this is not a big problem, but for completeness sake it might be better to generate this figure as well (just like the Julia code).

Since Matlab is not an open language, I also tried running the code with Octave 5.2.0. This did not throw any errors and the results for figure 1 were correct, but the other figures ran for a long time. I gave up after a few hours. I know it should be possible to make it run faster in Octave using the JIT compiler or something, but I don't have enough experience with that to go down that road.

As far as I understand it, the policy of ReScience is that the code has to be in an open source language, so not Matlab. This replication is an interesting edge case, since there is also an open source implementation (in Julia). The question is whether both should be published. I guess this is for the editorial team to decide.

Clarity of the code

I agree with @pberkes that the structure of the repository should be cleaned up. There should be separate subfolders for the images, the generated figures, the Julia code and the Matlab code.

In general, the code is clear and easy to read. There are only a few improvements I would suggest. All of these are suggestions, not requirements. If the authors disagree with me, I would nevertheless recommend publication.

Matlab:

In the Matlab code, there are some code duplications. For example, the functions LearningProcess and LearningProcessRotated are nearly identical, except for the rotation in the latter. One could just make rotation a parameter of the function and pass 0 if the images should not be rotated. Another example occurs in PlotFigure7.m and PlotFigure8.m. Here, the calculation of the mean gray level and the learning process are included in the script, even though the code is basically identical to the code in the functions MeanGrayLevel and LearningProcess. It would be better to just replace the code blocks by calls to these functions.

In general, I think the Matlab code would be a bit more readable if spaces were added around operators.

Julia:

Since the Julia code is very close to the Matlab code, the same improvements (reduction of code duplications, more spaces) could be made here. One additional recommendation concerns the documentation. Currently, each function is documented with an initial comment, Matlab-style. The convention in Julia is to use docstrings (see here). This has the advantage that Julias inbuilt help features (e.g., at the REPL) can be used. So simply cutting the text of each documentation comment and pasting it as a string atop the function would be an improvement. Again, this is only a recommendation.

Clarity of the article

The article is excellent. It is easy to understand what was done, which parts of the replication were difficult etc.

Some small issues

While reading the paper and code, I found a few small things (typos etc.) that could be improved:

TL;DR

The replication was successful. The article is excellent. The code is also well structured and easily readable, however there are two problems with reproducibility (lack of random seeds; suboptimal package management) that the authors should fix before publication. In addition, the repo should be cleaned up.

Since one of the two implementation is in Matlab, the editorial team needs to decide whether that is appropriate for publication in ReScience, or whether only the Julia code should be published.

Finally, there are some improvements to the code that I would recommend (reducing duplications, adding spaces around operators, adding docstrings), but these are only suggestions.

otizonaizit commented 4 years ago

Thank you @cJarvers for your review!

@IainDaviesMaths : both reviewers agree that the paper is fit for publication. You just need to address the reviewers comments, especially regarding the random seed and the re-organization of the code. Do you have an estimate when this can happen?

@rougier , @khinsen , @benoit-girard , @oliviaguest : in this submission we have a reproduction with code both in Julia and Matlab. The code in both languages has been reviewed, even if the Matlab code had some difficulties running under Octave. We have had this discussion already, but I think an official statement by the Editors in Chief would be good: do we accept code submissions in Matlab? Given that in this particular case we have corresponding code in Julia, it is maybe worth to relax the submission policy? Should the authors remove the Matlab code from the submission?

oliviaguest commented 4 years ago

@otizonaizit seems like a discussion that deserves it's own thread IMHO!

rougier commented 4 years ago

In this specific case, I think we can relax the recommendation and include the Matlab code as well, expecially if it works under Octave. But I agree with @oliviaguest that we should open a thread to decide on a proper guideline.

IainDaviesMaths commented 4 years ago

Thank you everyone for the reviews and feedback on the paper! Myself and Stephen are in the process of making all the suggested edits and should be done in the next few days.

IainDaviesMaths commented 4 years ago

Most edits have been done now - just a few minor Julia edits to be made which Stephen is on top of. Will alert when the article is in final draft.

IainDaviesMaths commented 4 years ago

Hi all - I think that's all your suggestions added to the code and article now! Let me know if there's anything else to be done.

rougier commented 4 years ago

@otizonaizit 🛎️

otizonaizit commented 4 years ago

@pberkes @cJarvers : I find the reorganized repo and the other changes quite good, so if there are no objections from you two I'd accept the paper and contact @IainDaviesMaths fo the details of the publication process! Please write a comment here in any case.

pberkes commented 4 years ago

@otizonaizit no objections from me!

cJarvers commented 4 years ago

@otizonaizit no objections from me either. The changes address all the issues I raised. Great work by the authors!

otizonaizit commented 4 years ago

Great. Congratulations @IainDaviesMaths : the paper is accepted! I will follow up shortly with a PR to the paper repo for the update of metadata.

IainDaviesMaths commented 4 years ago

@otizonaizit thanks for the info about proceeding, I think it is ready for publication now!

otizonaizit commented 4 years ago

Great! Published:

The article will appear on https://rescience.github.io/read/ as soon as these two PRs are merged by one of the editors in chief (@rougier ?)

sje30 commented 4 years ago

Super, thanks! Well done Iain for getting this great work published.

rougier commented 4 years ago

Should be online now.