ReScience / submissions

ReScience C submissions
28 stars 7 forks source link

[Re] Reproductive pair correlations and the clustering of organisms #58

Closed CoraliePicoche closed 2 years ago

CoraliePicoche commented 2 years ago

AUTHOR

Dear @ReScience/editors,

I request a review for the following replication:

Original article

Title: Reproductive pair correlations and the clustering of organisms \ Authors: W.R. Young, A.J. Roberts & G. Stuhne\ Journal: Nature\ Year: 2001\ DOI: 10.1038/35085561\ PDF: https://www.nature.com/articles/35085561.pdf

Replication

Authors: C. Picoche, W.R. Young, F. Barraquand \ Repository: https://github.com/CoraliePicoche/brownian_bug_fluid \ PDF: https://github.com/CoraliePicoche/brownian_bug_fluid/blob/main/article/article.pdf \ Keywords: Agent-based model ; Turbulence ; Pair correlation function ; Microscale \ Language: C++ / R \ Domain: Computational Ecology

Results

We have fully replicated the article.

Potential reviewers

@pboesu \ Cesar B. Rocha

Suggested editor

Timothée Poisot \ Pierre de Buyl

rougier commented 2 years ago

Thanks for your submission and sorry for the late answer. We'll assign an editor soon.

rougier commented 2 years ago

@pdebuyl @tpoisot Could any of you edit this submission?

CoraliePicoche commented 2 years ago

Hello,

We were wondering if there were any news regarding our submission. Has an editor been assigned yet ?

Thank you for your help.

rougier commented 2 years ago

I'm very sorry, this is entirely my fault. Don't hesitate to post message here if you do not see any progress (this will send me a notification).

@pdebuyl @tpoisot Could any of you edit this submission?

pdebuyl commented 2 years ago

Hi @rougier the submission repo seems to have disappeared. @CoraliePicoche the links give an error, have there been changes to the submission?

rougier commented 2 years ago

@pdebuyl Same for me. @CoraliePicoche Is your repository public ?

CoraliePicoche commented 2 years ago

Hello,

My bad, it seems that I had forgotten this (crucial...) part. This should be ok, now.

My apologies for this mistake.

rougier commented 2 years ago

Works now ! @pdebuyl Can you edit it ?

pdebuyl commented 2 years ago

Yes

rougier commented 2 years ago

@pdebuyl Great and many thanks.

pdebuyl commented 2 years ago

@CoraliePicoche we have one reviewer already, @rajeshrinet . The review will start when I secure a second reviewer.

CoraliePicoche commented 2 years ago

Many thanks for the news, @pdebuyl

FTurci commented 2 years ago

General comments

The article reproduces the results of Young et al. Nature 2001 and provides an additional precious step-by-step derivation of the pair density function.

The authors also provide an interesting discussion of the characteristic scales for the validity of the approach, which reveals some limitations of the theoretical model and suggests possible avenues for future development.

Code

I managed to run the code on my own machine, reproducing the simulations and visualisations of the article.

Αs a minor note, having to edit the Makefile for every figure is a bit inconvenient, so I would suggest having separate targets for different figures.

Paper

Overall the article is very clear and is a good complement to the original paper, with a detailed derivation in the Supplementary Materials.

Minor notes

pdebuyl commented 2 years ago

Hi @FTurci thank you very much for getting to this!

CoraliePicoche commented 2 years ago

Hello @FTurci , Thank you very much for your positive feed back, we will do the corresponding corrections very soon! Best regards Coralie Picoche

rajeshrinet commented 2 years ago

@pdebuyl I have now gone through the paper and the code. It does an excellent job at reproducing the results of Young et al Nature 2001. I particularly appreciate the detailed derivations in the paper and the complimentary well-commented code. Both of these will be very useful for a wider audience. I have included below some minor comments on the code and the paper which the authors may consider.

Code

I could execute the code on my system. Here are some suggestions:

Paper

The paper is very well-written and steps very clearly described. Minor comments on the paper follow:

CoraliePicoche commented 2 years ago

Hello @rajeshrinet, Thank you very much for your comments, we will do the modifications you ask for soon. Best regards Coralie Picoche

pdebuyl commented 2 years ago

Hi @CoraliePicoche any update?

CoraliePicoche commented 2 years ago

Hello @pdebuyl, We have been pretty busy but are currently working on the revision. Sorry for the delay. Best regards

CoraliePicoche commented 2 years ago

Hello @pdebuyl,

We have now corrected the manuscript and improved the clarity of the code based on both the reviews and an issue raised by @rajeshrinet directly in the replication repository. We apologize for the delay.

Following reviewers' advice, we now give more context in the manuscript with several, beginner-oriented references. We have also corrected the formulation of some of the equations which, while still true, could be improved, and corrected a spelling mistake in equation (26).

Regarding the code, we have created a new code/tests directory which allows any user to run similar programs and produce corresponding figures, but with parameters adapted for faster simulations. These simulations should help the user familiarize with the functioning of the code before launching longer simulations to replicate our results. Differences between parameters are given in code/README.md.

We hope that our replication is now a good fit for ReScience.

Best regards,

On behalf of all co-authors,

Coralie Picoche.

++++++++++++++++++

Reviewer 1: @FTurci

General comments

The article reproduces the results of Young et al. Nature 2001 and provides an additional precious step-by-step derivation of the pair density function.

The authors also provide an interesting discussion of the characteristic scales for the validity of the approach, which reveals some limitations of the theoretical model and suggests possible avenues for future development.

We thank @FTurci for his positive feedback.

Code

I managed to run the code on my own machine, reproducing the simulations and visualisations of the article.

Αs a minor note, having to edit the Makefile for every figure is a bit inconvenient, so I would suggest having separate targets for different figures.

We have now added separate makefiles for each script.

Paper

Overall the article is very clear and is a good complement to the original paper, with a detailed derivation in the Supplementary Materials.

Minor notes

* I did not find the explicit definition of lambda and mu. Clearly, they are the death and birth rates, related to the probabilities p and q mentioned in the Methods section. For completeness, lambda, mu and gamma should be defined before or immediately after equation 3.

We have added the definition of lambda and mu immediately after equation 3.

* "brownian" should be "Brownian"

We have corrected the spelling of Brownian throughout the manuscript and Supplementary.

++++++++++++++++++

Reviewer 2: @rajeshrinet

@pdebuyl I have now gone through the paper and the code. It does an excellent job at reproducing the results of Young et al Nature 2001. I particularly appreciate the detailed derivations in the paper and the complimentary well-commented code. Both of these will be very useful for a wider audience. I have included below some minor comments on the code and the paper which the authors may consider.

We thank @rajeshrinet for his comments and suggestions.

Code

I could execute the code on my system. Here are some suggestions:

* Please add a [LICENSE to the repo](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository)

We have chosen a GPLv3 license and added the corresponding file to our repository.

* Please add a [CONTRIBUTING file](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors)

We did not find many examples of such contributing file adapted to the context of Rescience (only here and here). Due to the brevity of the previous examples and of our own contribution recommendation, we have chosen to keep the contribution section inside the main README.

* In addition, it would be great to populate the README.md on the landing page with details on what do the folders contain, etc. The authors may also consider adding a .gif file from their simulation to the README.md.

We have extended the main README: we now describe the structure of all directories as well as the main requirements to execute the code, which were previously in code/README.md. As mentioned above, we have also included a contribution section.

Regarding .gif files, while we think that this is a good idea to visualize the movement of particles, we fear that this could disturb the reader. We have therefore added said files in the directory and referred to them in the main README, but do not directly show them.

Paper

The paper is very well-written and steps very clearly described. Minor comments on the paper follow:

* Please provide labels to the x-axis in all the panels in Fig.1. The tick sizes and axis labels in Fig.3 are small.

Figures have been improved and are now in the manuscript.

* I suggest adding some more citations since one of the aims of the paper is to _make this replication article more accessible to most readers_. For example, authors should cite standard texts for stochastic simulations, turbulence, Reynolds number, etc. For example, the names of Kolmogorov and Batchelor appear without any fixed citation.

We have now added general references regarding turbulence (A first course in turbulence, Tennekes et al. 1972), especially applied to small organisms such as phytoplankton (Living at microscale, Dusenbery 2009) in the introduction. Both books target naive readers and are sufficient to understand the phenomena described in our model. Such context is also given in Peters and Marrasé (2000), already cited in the first version of the manuscript, where the reader can also find a definition of the Kolmogorov scale. We would however like to point out that Kolmogorov's name is used here to qualify a known physical quantity (scale relevant to characterize a flow), not to refer to a singular work we could cite.

pdebuyl commented 2 years ago

Hi @CoraliePicoche sorry I did not notice the update. I am off for a week and will complete the procedure for your article afterwards.

CoraliePicoche commented 2 years ago

Hello @pdebuyl . No worries, thanks for the update.

pdebuyl commented 2 years ago

Hello @CoraliePicoche main_Fig3.out is running since more than 48h. Can you tell me how long it should take? I am running on a Xeon 2.8GHz (so, reasonable I guess) using the provided makefile.

CoraliePicoche commented 2 years ago

Hello @pdebuyl , Are you running the "tests" simulations (faster ones, in code/tests) or the original version of the code? Which value did you use for Utot?

pdebuyl commented 2 years ago

I run ./main_Fig3.out from the code/simulation directory with the values 0.0, 0.1, 0.5, and 2.5

CoraliePicoche commented 2 years ago

If you are running them sequentially, 48h does not seem too bad (I must admit that this code is clearly not optimized), I would advise to wait until tomorrow. However, if you are running the different values on different processors in parallel, this seems a bit long from what I remember on my own configuration (Xeon 3.5Hz). If you still have computing resources available, could you tell me if the test simulations for Figure 3 work properly with your configuration?

pdebuyl commented 2 years ago

It is running on a headless server, so it does not bother me to let it run. It runs as four independent processes though...

CoraliePicoche commented 2 years ago

I have checked on my own server: computation time was around 20h for a single value of Utot (again, on a slightly "better" processor). How long does it take for the smaller simulations?

CoraliePicoche commented 2 years ago

Hello @pdebuyl , Do you have any update regarding simulation duration? Have they ended on your server?

pdebuyl commented 2 years ago

Hello @CoraliePicoche

I have an update but am a bit worried about the failure to run the code.

First, from code/simulation the code ran for 5 days without completing.

I am trying now the files in code/tests/simulation and the code returns a segmentation fault except if I compile with the optimization flag -O0.

As you and the reviewers could run the code, it likely depends on the platform, but such errors have an impact on the readers of the reproduction, who could have the same bug. The code is too slow to run without optimizations.

From the terminal output, it seems that the code spends most of its time in PairDens and indeed, the pair loop is slow.

Do you have any idea on how to resolve it? Ideally the code should run without the segmentation fault...

I am using g++ 10.2.1 on linux x86_64.

@FTurci @rajeshrinet could you let us know your test platform?

CoraliePicoche commented 2 years ago

Hello @pdebuyl, I am sorry you encountered theses issues, which I am going to investigate. I will try to get back to you shortly. Thanks for the feedback!

CoraliePicoche commented 2 years ago

Hello @pdebuyl,

I have tried running the code on several machines, and I think I have found the issue. A previous version of the function PairDens returned a double, which is not the case anymore as variables are directly modified in the function. However, the function was still coded as returning a variable. With previous versions of g++, this did not cause any issue but it seems that newer versions are more sensitive to this, leading to simulations taking much more time to run.

Here are the different configurations I used to test the code. I was able to launch full simulations (with code/simulation/main_Fig3.cpp) on all of these without further issue, and test simulations (with code/tests/simulation/main_Fig3.cpp) always ended in less than 2 minutes.

Local computers

For both local computers, simulations were run with PairDens returning no value, i.e., with the modification previously mentioned.

Laptop

Processor: 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz \ Linux version: 5.13.0-39 [Ubuntu 20.04]\ GSL version: 2.5\ G++ version: 9.4.0\ Time to run longer simulations: 15h

Desktop computer

This computer was originally the one used for all simulations presented in the paper.

Processor: Intel® Xeon(R) CPU E3-1240 v5 @ 3.50GHz \ Linux version: 4.15.0-133 [Ubuntu 16.04]\ GSL version: 2.6\ G++ version: 5.4.0\ Time to run longer simulations: 21h

Regional cluster

I work with the 'Mésocentre de Calcul Intensif' (MCIA), the reference cluster for scientific computing in Nouvelle Aquitaine (France). The time mentioned below corresponds to the earlier version of PairDens, when the function was still supposed to return a double. Simulations might be shorter with PairDens returning nothing.

Processor: Intel® Xeon® Gold SKL-6130 @ 2,1 GHz\ Linux version: 3.10.0 [Red Hat]\ GSL version: 1.15\ G++ version: 4.8.5\ Time to run longer simulations: 44h

I was not able to reproduce the segmentation fault you obtained running file code/tests/simulation/main_Fig3.cpp in any of these configurations, even before correcting the PairDens function. Could you try and recompile/run file code/tests/simulation/main_Fig3.cpp, using the command line make -f makefile_Fig3 for compilation, to see if you still obtain a seg fault or if the problem has now been solved through the correction of the PairDens function?

Best regards\ Coralie Picoche

pdebuyl commented 2 years ago

Hi thanks for the update! I could run the tests successfully now on my laptop. I'll check the longer runs as well. Is 21h for the runs the one after the other or all at once?

CoraliePicoche commented 2 years ago

I am glad it worked! As for the longer runs, 21h corresponds to a single simulation, but you can of course run them all at once on different processors.

pdebuyl commented 2 years ago

Hi @CoraliePicoche I ran all with success thanks for the fix. The figures are as expected :-) Before publishing the article, can you add the DOIs to the bib file and provide the platform information (see https://rescience.github.io/platform/) in the repo? Else, I'd like to apologize for the delays in the editorial procedure and say that the paper is very clearly written.

pdebuyl commented 2 years ago

PS: adding the DOIs will provide clickable links to the references and (I guess) allow for better bibliographical referencing.

CoraliePicoche commented 2 years ago

Hello @pdebuyl,

I have added the DOIs to the bib file and recompiled the .pdf, the new versions are now online.

Regarding the platform information, should I mention only the main one (the "Desktop computer" mentioned above, which was used to run all simulations), or all platforms which have been used to (successfully) test the code (which would be my laptop and regional cluster, here, maybe even your own configuration)?

Thanks for you help Coralie

pdebuyl commented 2 years ago

Hello, thanks for the update. Regarding the platform, only the desktop then. It is supposed to be the computer that was used to produce the figures in your pdf. The instructions are only for Python and C/C++. In your case, there should be the g++ version, with the flags, and the version of R and of the package expint.

CoraliePicoche commented 2 years ago

Hello @pdebuyl , I have added the information on the platform in code/README.md , could you confirm that this corresponds to your requirements? If we are close to the end of the review process, should I change the "Edited by", "Received", etc. sections of the article presentation now, or will you do it? Will I be able to modify the citation recommendation at the end of the main README once a DOI has been attributed? Best regards Coralie

CoraliePicoche commented 2 years ago

Hello @pdebuyl, Is there any further information we should provide to help in the review process? Thank you for your help. Coralie

rougier commented 2 years ago

@pdebuyl Is there anything I can do to help?

pdebuyl commented 2 years ago

Hello, I will edit the metadata (editor, dates, etc) for publication. Sorry for the delay, very busy week. It's been some time since I edited for ReScience so I have to check all the procedure again.

pdebuyl commented 2 years ago

Hi @rougier thanks for passing by. I should be able to proceed but will ping you if necessary :-)

pdebuyl commented 2 years ago

@CoraliePicoche can you archive the code? There should be a zenodo or software heritage archive of the repository. I have a pull request open for the metadata.

pdebuyl commented 2 years ago

@FTurci and @rajeshrinet can you check your ORCID in the PDF here: https://github.com/pdebuyl/brownian_bug_fluid/blob/main/article/article.pdf

pdebuyl commented 2 years ago

@CoraliePicoche I still need the zenodo or software heritage archive of the repository.

Then, I'll publish the paper. The DOI will be 10.5281/zenodo.6546488 (it does not resolve yet, see procedure at https://github.com/ReScience/articles if you want more details).

CoraliePicoche commented 2 years ago

@CoraliePicoche can you archive the code? There should be a zenodo or software heritage archive of the repository. I have a pull request open for the metadata.

The code is now archived and I have added the zenodo DOI to the main README. Should I also add it to the metadata.yaml?

pdebuyl commented 2 years ago

No, I will do it.

pdebuyl commented 2 years ago

Well, it does not matter actually sorry :-) Any of us can do it. It is all that is missing

pdebuyl commented 2 years ago

if you do it, there is one less "pull request" to do so it is easier I guess