Happy to take this over! 😊
@marieke-woensdregt has accepted to be one of the reviewers — awesome and thank you! ☺️
Hi! Happy to help!
@LukasWallrich do you have any ideas/suggestions for a 2nd reviewer?
Unfortunately, I don't - this is my first foray into computational work, and I don't yet know many people in this space. If necessary, I can, of course, do some research ... but any suggestions would be based on Googling rather than actual judgements of whether someone would be suitable.
Hi both, I'm planning to have a look at this tomorrow. I can probably suggest a couple of potential reviewers then. Please do let me know if either of you has considerations you would like me to take into account when thinking about potential reviewers!
@marieke-woensdregt sounds great, if you can!
@marieke-woensdregt no undue pressure, but any idea for an ETA for the review? 😊
@LukasWallrich I apologise for this taking so long. Have you had a chance to maybe think of any potential reviewers to help me out? I'm struggling to find anybody able to do this for us. 😌
Dear @LukasWallrich and @oliviaguest, I am so sorry for the radio silence. I was a bit overwhelmed with other work these past months, but will definitely get to this by the end of this week. My sincere apologies for the delay.
I also thought of some other potential reviewers:
These are all people who mostly work on language and cultural evolution, because that happens to be my field, but they are also definitely interested in population-level processes more generally, and all do agent-based modelling and simulations using Python. Hope that helps!
Thank you to both of you! I have also been very busy and thus not found the time to follow up ... so I am glad that this is moving again. @oliviaguest would you be able to invite these possible reviewers? I am meeting someone later this week who might have ideas, so I will try again.
I'm sorry that I can't be of more use in finding reviewers. However, what I can do is volunteer to review #64 if helpful? I have some background in educational policy, so I at least understand the substance well.
I'm finally sitting down for this now!
@marieke-woensdregt fantastic!
@LukasWallrich yes, I can invite them — of course. Please send me names if you have any handy. ☺️
@marieke-woensdregt I hope you are well. Any update? I hope you did not run into a roadblock when you sat down for this back in April?
Hi @LukasWallrich, My sincere apologies! I have now agreed with @oliviaguest that I will finish my review by the 1st of June.
@oliviaguest I finally got a few names for potential reviewers. I don't know any of them - they were suggested by Francesco Rigoli, who convenes a network on computational political psychology. Would you want to invite them or should I reach out? For peer review, I thought it would make sense if the invitation comes from you, but I'm happy to follow your guidance (incidentally, there might also be suitable reviewers here for #64?)
- Dimitri Ognibene: dimitri.ognibene@unimib.it
- Sven Banisch: sven.banisch@UniVerseCity.de
- Michael Moutoussis: @mmoutou here
- Geert-Jan Will: g.j.will@fsw.leidenuniv.nl
- David Young: @davidjyoung here
- Lion Schulz: lion.schulz@tue.mpg.de
Hi @LukasWallrich and @oliviaguest,
I am close to finishing my review of this replication. I am just waiting to see if the simulation results come out the same if I re-run the simulations myself. I should be able to send you my review soon!
By the way, how shall I share my review @oliviaguest? Should it be open, e.g. by attaching it as a file to a message in this GitHub thread?
Plain text in this thread would be great. See examples here when you click on "review" http://rescience.github.io/read/ — hope that helps. 😊
Sorry you were tagged in an unclear way, @mmoutou. I had hoped for a list of names and not GitHub tags (because it causes this confusion).
In the interests of transparency/clarity, these comments are part of an open review and visible to others. If you would like to decline/stop receiving notifications, please use the unsubscribe option. Apologies for the confusion.
Yeah, that was quite unclear - I didn't realise that even if I didn't log in to GitHub, the contents of my email would be public. There really should be prominent information about this visibility in the email copy that's sent. However, I'm still confused - I presume all this means that @oliviaguest will let me know if I am invited as a reviewer. Presumably the review would include Comments to the Editor and Comments to the Authors, etc., so it wouldn't just be posted here???
@mmoutou if you want to you may delete/edit the email/comment above. It would be wonderful if you wanted to review. Do you have the time and feel up for it?
To be clear, this is not a traditional journal. The review completely takes place here, on GitHub, in public. You may look at examples and instructions here: http://rescience.github.io/ Let me know if you have further questions, @mmoutou! 😌
@mmoutou - good to e-meet you, and apologies to both you and @oliviaguest for the confusion that I caused! I wanted to be helpful by pulling out the GH names - but clearly didn't think far enough. Sorry.
I'd still appreciate it if you could contribute to the review process.
@marieke-woensdregt let me know if you need help attaching what you have so far..? 😊
@marieke-woensdregt @mmoutou 😊 hey, can you both check this thread and reply to the above few replies, please? ☝️
Apologies, I would very much like to contribute to this but it really doesn't look as if I'll manage to make the time for it. Things not great at my end. Sorry Michael
With my sincere apologies for the endless delay (I kept running into minor issues with my reproduction and then not finding the time to resolve them), here, finally, is my complete review. I've structured it according to the 4 questions to reviewers that are specified on the Rescience C website (points 1-4 below). I'm attaching a zip folder containing all the relevant files I used to reproduce the results.
1. The actual replication of the research. Overall, the results presented in this article show a clear replication of the original results by Hong & Page (2004) and Grim et al. (2019). Where there are differences in the results between this replication and the original articles, the author addresses these differences adequately with detailed further analysis, and provides convincing explanations.
a. This is very minor, and I don’t think it would make a difference, but Grim et al. (2019) actually seem to use populations of 9 agents, rather than 10?
b. In Section 3.1, where it says: “For instance, the standard deviation of the performance of groups of 10 highest‐ability agents with l = 12 observed here was 0.56 while Hong and Page report 0.020.”, I think this should be 1.26 instead of 0.56.
c. Regarding the difference in diversity values for the high-ability agents in this replication compared to the original Hong & Page article: I wonder whether the author has attempted to contact Hong & Page to ask about this? I’m asking because diversity is the notion that all this work revolves around, and because I read in the guidelines of Rescience C that contacting the original authors when differences are found is encouraged. That said, I do realise that the original article was published nearly 20 years ago, so I’m not sure how likely it is that Hong & Page will be able to dig up the original code (assuming they are willing). Furthermore, the author does provide some convincing evidence for this difference in diversity values not being indicative of an implementation difference.
d. In Figure 2, the range of the scale is quite different from Figures 6 and 9 in Grim et al.: ca. -0.5 – 3.5 in the replication, vs. -0.04 – 0.01 in Grim et al. Does the author have an explanation for this? Does it have something to do with the larger range of l values in the replication?
2. Reproducibility of the replication. I reproduced this replication by running the run_simulation.py script from the Hong_and_Page folder, and the run_simulation_sweep.py script from the Grim_etal folder, on the computer cluster of my institute. In both cases, the results of my reproduction are very similar to those reported in the replication paper (and accompanying Jupyter notebooks). I’m attaching the pickle files containing my reproduced simulation results, as well as the corresponding Jupyter notebooks showing the analysis results for my reproduction. (The files containing the results of my reproduction can be recognized by the MW or _MW prefix or suffix.)
a. I also tried running a simulation using Google Cloud Engine, using the pyscript2gce helper that the author refers to in the README.md, but I got stuck at the step of building a Docker image, which led to the following error when I tried to run the simulation: ERROR: (gcloud.compute.instances.create-with-container) Missing required argument [--container-image]: You must provide container image
b. For running the code using the pyscript2gce helper, it was also a bit unclear to me what code would need to be added on line 26 of main.py, because at the bottom of that same script it seems to already call up the simulation.py script.
c. It would be useful to add examples of how to pickle (save) the data at the bottom of each run_simulation.py script. I had to add lines like the following myself, at the bottom of each simulation script: out.to_pickle("MW_HPmodelresults"+ datetime.now().strftime("%Y-%m-%d-%H-%M-%S")+".pkl")
d. In the run_simulation_sweep.py script, I changed the line “l”: range(4, 10) to “l”: range(4, 31), because if I understood it correctly, this was the setting actually used by the author to produce the simulation results for the paper.
e. In run_simulation_sweep.py, I also changed iterations=1 to iterations=100, again because I believe that’s the setting the author actually used to produce the results for the paper.
f. With my initial simulation results, I ran into a mismatch with the pickle file protocol that was expected by the Jupyter notebooks for analysis. (If I remember correctly, it was: unsupported pickle protocol: 5). If I remember correctly, I solved this issue by changing the version of pandas in the virtual machine I used on my institute’s computer cluster to run these simulations. After changing pandas to version 1.3.5, everything worked smoothly. Maybe the author can add this type of version information somewhere (e.g. in requirements.txt ?)
g. In the analysis.ipynb file for the Grim_et_al model, I ran into the following error in the code cell that calculates the win_comps table, as well as the final code cell of the document: AttributeError: 'float' object has no attribute 'round' I replaced all calls to .round() as a method to round() as a function instead. This solved the issue.
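For concreteness, a minimal snippet illustrating the error and the change I made (not code from the repository):

```python
value = 1.23456

# value.round(2)  # raises: AttributeError: 'float' object has no attribute 'round'
rounded = round(value, 2)  # the built-in round() works for plain Python floats
```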
3. Clarity of the code and the instructions for running it. Overall, the code is clear and well-structured, and the instructions for running it in the README.md file are also clear (although see my comments on the pyscript2gce helper above). I had a first example simulation of the Hong & Page model running on my own laptop within a matter of minutes. Below are some comments that I think will help make the code and README.md file even clearer:
a. In the README.md file, where it says “The analyses to replicate Hong & Page can feasibly be run on a laptop in a matter of hours” etc., I think the author means simulations rather than analyses (this phrasing was confusing to me, because it looked like it was referring to the jupyter notebooks that contain the analysis scripts, which in fact run in a matter of minutes).
b. It would be easier to interact with the folders through the terminal if they didn’t have spaces in the folder names (use e.g. underscores instead).
c. Docstrings are present for all methods, but relatively minimal. Consider adding a more structured specification of the input arguments and what each method returns (or what changes it makes to an object’s attributes). For widely-used standardised docstring formats, see e.g.: https://realpython.com/documenting-python-code/#docstring-formats ).
d. In general, I would appreciate some more comments in the class and method definitions that walk the reader through the steps that are being taken (e.g. in long method definitions like draw_agents() and max_search() in HPmodel.py).
e. Given that not everyone will be familiar with the mesa package, perhaps the author could add a few words on this in the README.md. Just to explain what mesa is and how it is (broadly) used in these scripts. E.g., a brief explanation of what the Agent, Model, BaseScheduler, and BatchRunnerMP classes are useful for.
f. It is not immediately clear what the parameters “iterations” and “max_steps” in the simulation scripts refer to. Is “iterations” the number of landscapes? If so, what is “max_steps”? (I worry the latter might be confused with the parameter l.) It would be good to add comments to the code explaining what each of these parameters is.
g. Perhaps the author could add a comment to explain how the nr_processes parameter of the BatchRunnerMP() class can be used?
h. Further to my point above, I initially ran the run_simulation_sweep.py script for the Grim et al. model with the parameter setting: "strategy": "both" (which was the default setting given in the script). But then it turned out that Section 2 of the analysis Jupyter notebook from Grim_et_al expects a dataframe with only the “relay” strategy results. (Whereas Section 3 indeed expects a dataframe containing both strategies.) The Jupyter notebook got stuck at code cell 5 (the prep_outputs(res) function) when having to deal with a dataframe that contains results from both strategies (and therefore four columns, titled: 'relay_random_solution', 'relay_best_solution', 'tournament_random_solution', 'tournament_best_solution', and an extra column titled ‘strategy’, rather than just two columns titled 'random_solution', 'best_solution'). (I then ran a separate simulation with the setting “strategy”: “relay”, to use in Section 2 of the analysis notebook.) In other words, comments to explain what the possible settings of the parameters are, and what settings the analysis scripts can deal with, would be really helpful. (And/or a more general analysis script that can deal with all possible parameter settings.)
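To make that last suggestion a bit more concrete, a more general analysis script could, for instance, reduce a “both”-strategies dataframe to the layout that Section 2 expects along these lines (just a sketch based on the column names described above, not code from the repository):

```python
import pandas as pd


def relay_columns_only(res: pd.DataFrame) -> pd.DataFrame:
    """Reduce a 'both'-strategies results frame to the two columns that Section 2
    of the analysis notebook expects (sketch; column names as described above,
    which may not match the repository exactly)."""
    return res.rename(columns={
        "relay_random_solution": "random_solution",
        "relay_best_solution": "best_solution",
    })[["random_solution", "best_solution"]]
```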
4. Clarity and completeness of the accompanying article. Overall, the author has done an impressive job at summarizing the two original models and the replications clearly and concisely. The author has also added more detailed analysis here and there (e.g. the SDs in Figure 1, and the addition of Table 2), which helps gain more insight into the workings of the model. Below are some minor comments that I think will help make the paper clearer and more self-contained.
a. In the first paragraph of the subsection titled “The basic model”, it might help to make a bit more explicit that the ring of 2,000 values represents 2,000 different possible solutions to a given problem, and that the values represent the quality of the solutions. It might also help to give a concrete example, as Hong & Page’s original article did with the gasoline engine designs.
b. I’m assuming that the last line of that same paragraph: “When they are in a group, they move together, with each agent in turn moving the entire group to the greatest value their heuristic can identify.” describes the discussion/conversation process that Hong & Page incorporated in their model (which Grim et al. call the “round-the-table relay dynamic”)? However, this is not entirely clear to me; I think it would help to put it in these discussion/conversation terms more explicitly (to help make the link between this paper and the original papers).
c. In the subsection titled “Grim and colleagues’ extension and qualification”, I think it would help to emphasise a bit more that different problems are represented by different landscapes, and that Grim et al. used the smoothing procedure in order to make different problem landscapes more related/similar to each other (i.e. to increase the “transportability of best performing heuristics.”). I worry it might otherwise be unclear to the reader how relatedness between nearby values in one problem landscape would have anything to do with other problems, and that they might end up misunderstanding what represents a problem in this model (an entire landscape), and what represents a solution (one point in that landscape).
d. In the paragraph titled “Implementation”, it says that “most results here are based on 500 landscapes.”. Why not be more specific and say that the Hong & Page replication relied on 500 landscapes, while the Grim et al. replication relied on 100 landscapes?
e. In the caption of Table 1, please specify what the numbers between parentheses are (presumably standard deviations, as in Hong & Page).
f. The caption of Figure 1 could do with an explanation of what numbers above 0 and numbers below 0 mean (or a legend + colour difference, as in Grim et al.’s Figure 2).
g. In the caption of Figure 2, it might be helpful to add “(bigger maximum step length means larger heuristics pool)” at the end of the first sentence.
h. Where it says “The results closely replicate those presented in Figures 5 to 9 in Grim and colleagues.” on page 5, I think the author means Figures 6 and 9.
Thank you so much for this clear, comprehensive and helpful review! Definitely worth the wait :) I will implement the changes as soon as I can.
Apologies, I would very much like to contribute to this but it really doesn't look as if I'll manage to make the time for it. Things not great at my end. Sorry Michael
Sorry to hear @mmoutou... I wish you all the best and that things improve as soon as possible. 🌼
Thank you again for this very thorough review – much appreciated. I have now implemented your suggestions and updated both the article and the code. Below are my responses to each point.
Overall, the results presented in this article show a clear replication of the original results by Hong & Page (2004) and Grim et al. (2019). Where there are differences in the results between this replication and the original articles, the author addresses these differences adequately with detailed further analysis, and provides convincing explanations.
a. This is very minor, and I don’t think it would make a difference, but Grim et al. (2019) actually seem to use populations of 9 agents, rather than 10?
This is true. Grim et al. do not explain that divergence at all, and I thought it would be more helpful to use a consistent team size across the two models. I added a footnote explaining this - would you like me to also report results for teams of 9?
b. In Section 3.1, where it says: “For instance, the standard deviation of the performance of groups of 10 highest‐ability agents with l = 12 observed here was 0.56 while Hong and Page report 0.020.”, I think this should be 1.26 instead of 0.56.
Yes, thanks.
c. Regarding the difference in diversity values for the high-ability agents in this replication compared to the original Hong & Page article: I wonder whether the author has attempted to contact Hong & Page to ask about this? I’m asking because diversity is the notion that all this work revolves around, and because I read in the guidelines of Rescience C that contacting the original authors when differences are found is encouraged. That said, I do realise that the original article was published nearly 20 years ago, so I’m not sure how likely it is that Hong & Page will be able to dig up the original code (assuming they are willing). Furthermore, the author does provide some convincing evidence for this difference in diversity values not being indicative of an implementation difference.
I assumed that the paper had been published too long ago – but you are right that it is still worth trying. So I emailed Lu Hong, and got a quick response from Scott Page. He no longer has access to the original code but pointed me towards another replication of the model (Singer, 2019) that reaches the same values for the diversity of the high-ability teams as I did. In a subsequent email conversation with Scott Page and Daniel J. Singer, we agreed that it appears likely that there was a mistake in the original paper.
Regarding the standard deviations, Scott Page clarified that they presented “standard deviations of the mean” – i.e. standard errors. I now also explain that in the paper and added a footnote showing that our results are comparable when taking this into account. In that footnote, I also explain why I still show standard deviations rather than standard errors in Table 1.
d. In Figure 2, the range of the scale is quite different from Figures 6 and 9 in Grim et al.: ca. -0.5 – 3.5 in the replication, vs. -0.04 – 0.01 in Grim et al. Does the author have an explanation for this? Does it have something to do with the larger range of l values in the replication?
Good catch. Grim et al. report performance between 0 and 1 in assessing their model, while they report performance from 1 to 100 in their discussion of Hong and Page. This is not in line with their own text, where they say that they average the value of the final heights (which are between 1 and 100) – and it seems to make sense to report outcomes in consistent units throughout this paper. I now explain the divergence in a footnote.
Reproducibility of the replication. I reproduced this replication by running the run_simulation.py script from the Hong_and_Page folder, and the run_simulation_sweep.py script from the Grim_etal folder, on the computer cluster of my institute. In both cases, the results of my reproduction are very similar to those reported in the replication paper (and accompanying Jupyter notebooks). I’m attaching the pickle files containing my reproduced simulation results, as well as the corresponding Jupyter notebooks showing the analysis results for my reproduction. (The files containing the results of my reproduction can be recognized by the MW or _MW prefix or suffix.)
a. I also tried running a simulation using Google Cloud Engine, using the pyscript2gce helper that the author refers to in the README.md, but I got stuck at the step of building a Docker image, which led to the following error when I tried to run the simulation: ERROR: (gcloud.compute.instances.create-with-container) Missing required argument [--container-image]: You must provide container image
Thanks for trying this out as well. How did you get stuck when building the Docker image? Without the image, the next step indeed can’t work.
b. For running the code using the pyscript2gce helper, it was also a bit unclear to me what code would need to be added on line 26 of main.py, because at the bottom of that same script it seems to already call up the simulation.py script.
If you use the release for the ABM model, rather than the general pyscript2gce helper, you don’t need to make any edits to main.py – I have clarified that in the replication README.
c. It would be useful to add examples of how to pickle (save) the data at the bottom of each run_simulation.py script. I had to add lines like the following myself, at the bottom of each simulation script: out.to_pickle("MW_HPmodelresults"+ datetime.now().strftime("%Y-%m-%d-%H-%M-%S")+".pkl")
Good point. This is not needed when running the code with pyscript2gce, so I have wrapped it in a conditional clause and added it to the scripts.
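For reference, a minimal sketch of the kind of guard I mean (the exact condition and filename in the scripts may differ; `out` stands for the results dataframe produced earlier in the script):

```python
from datetime import datetime

# Only save a timestamped pickle when the script is executed directly (e.g. locally
# or on a cluster); this is not needed in the pyscript2gce workflow, which handles
# output differently. The filename prefix here is illustrative.
if __name__ == "__main__":
    out.to_pickle("HP_model_results_" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S") + ".pkl")
```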
d. In the run_simulation_sweep.py script, I changed the line: “l”: range(4, 10) to: “l”: range(4, 31) Because if I understood it correctly, this was the setting actually used by the author to produce the simulation results for the paper.
e. In run_simulation_sweep.py, I also changed: iterations=1 to: iterations=100 Again because I believe that’s the setting the author actually used to produce the results for the paper.
Indeed, thank you! I made these changes.
f. With my initial simulation results, I ran into a mismatch with the pickle file protocol that was expected by the Jupyter notebooks for analysis. (If I remember correctly, it was: unsupported pickle protocol: 5). If I remember correctly, I solved this issue by changing the version of pandas in the virtual machine I used on my institute’s computer cluster to run these simulations. After changing pandas to version 1.3.5, everything worked smoothly. Maybe the author can add this type of version information somewhere (e.g. in requirements.txt ?)
Did you change it from an older version to 1.3.5, or did you have to revert to 1.3.5? For now, I have changed the requirements.txt to expect 1.3.5 or greater. Requiring one specific version would seem likely to cause trouble for users who do not use virtual environments?
g. In the analysis.ipynb file for the Grim_et_al model, I ran into the following error in the code cell that calculates the win_comps table, as well as the final code cell of the document: AttributeError: 'float' object has no attribute 'round' I replaced all calls to .round() as a method to round() as a function instead. This solved the issue.
Weird. The method still works for me, but I have replaced it with the function since that seems more robust.
Clarity of the code and the instructions for running it. Uncommented or obfuscated code is as bad as no code at all. Overall, the code is clear and well-structured, and the instructions for running it in the README.md file are also clear (although see my comments on the pyscript2gce helper above). I had a first example simulation of the Hong & Page model running on my own laptop within a matter of minutes. Below are some comments that I think will help make the code and README.md file even clearer:
a. In the README.md file, where it says “The analyses to replicate Hong & Page can feasibly be run on a laptop in a matter of hours” etc., I think the author means simulations rather than analyses (this phrasing was confusing to me, because it looked like it was referring to the jupyter notebooks that contain the analysis scripts, which in fact run in a matter of minutes).
I clarified this.
b. It would be easier to interact with the folders through the terminal if they didn’t have spaces in the folder names (use e.g. underscores instead).
Done.
c. Docstrings are present for all methods, but relatively minimal. Consider adding a more structured specification of the input arguments and what each method returns (or what changes it makes to an object’s attributes). For widely-used standardised docstring formats, see e.g.: https://realpython.com/documenting-python-code/#docstring-formats ).
I have expanded the Docstrings, generally following the Google Python style guide for the main methods that are to be called externally. I have also added type hints to all arguments and return values. However, I have refrained from repeatedly documenting the same arguments (n, k, l, N_agents) that are covered in the init Docstrings. I have also avoided expanding the Docstrings for brief self-explanatory (and essentially internal) functions, where the type hints should have added some further clarity. I hope the current version balances clarity with brevity … please let me know if you would prefer more detail.
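For illustration, this is roughly the docstring style I followed (the function below is made up for this example and is not part of the replication code):

```python
from typing import List


def smooth_landscape(values: List[float], window: int = 1) -> List[float]:
    """Smooth a ring of landscape values with a moving average.

    Made-up example to illustrate the docstring style; not part of the replication code.

    Args:
        values: The ring of solution values to smooth.
        window: Number of neighbours on each side included in the average.

    Returns:
        A list of the same length, where each value is replaced by the mean of
        itself and its `window` neighbours on either side (wrapping around the ring).
    """
    n = len(values)
    return [
        sum(values[(i + j) % n] for j in range(-window, window + 1)) / (2 * window + 1)
        for i in range(n)
    ]
```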
d. In general I would appreciate some more comments in the class and method definitions that walk the reader through the steps that are being taken (e.g. in long methods definitions like draw_agents() and max_search() in HPmodel.py).
I have added some comments to all long methods, and hope that they work together with the docstrings to make the code clear.
e. Given that not everyone will be familiar with the mesa package, perhaps the author could add a few words on this in the README.md. Just to explain what mesa is and how it is (broadly) used in these scripts. E.g., a brief explanation of what the Agent, Model, BaseScheduler, and BatchRunnerMP classes are useful for.
f. It is not immediately clear what the parameters “iterations” and “max_steps” in the simulation scripts refer to. Is “iterations” the number of landscapes? If so, what is “max_steps”? (I worry the latter might be confused with the parameter l). Would be good to add comments to the code to explain what each of the parameters are.
g. Perhaps the author could add a comment to explain how the nr_processes parameter of the BatchRunnerMP() class can be used?
All done - thanks.
h. Further to my point above, I initially ran the run_simulation_sweep.py script for the Grim et al. model with the parameter setting: "strategy": "both" (which was the default setting given in the script). But then it turned out that Section 2 of the analysis Jupyter notebook from Grim_et_al expects a dataframe with only the “relay” strategy results. (Whereas Section 3 indeed expects a dataframe containing both strategies.) The Jupyter notebook got stuck at code cell 5 (the prep_outputs(res) function) when having to deal with a dataframe that contains results from both strategies (and therefore four columns, titled: 'relay_random_solution', 'relay_best_solution', 'tournament_random_solution', 'tournament_best_solution', and an extra column titled ‘strategy’, rather than just two columns titled 'random_solution', 'best_solution'). (I then ran a separate simulation with the setting “strategy”: “relay”, to use in Section 2 of the analysis notebook.) In other words, comments to explain what the possible settings of the parameters are, and what settings the analysis scripts can deal with, would be really helpful. (And/or a more general analysis script that can deal with all possible parameter settings.)
Sorry about that - it sounds like this made you waste a fair bit of time :/ I have now clarified the need to use the "relay" strategy and changed the default in the simulation script. Rather than making the code more complex here by adding conditions to check whether "both" strategies were run, I now point out that the code to prepare such data for analysis is available in the next section of the analysis script.
Clarity and completeness of the accompanying article, in which the authors should clearly state why they think they have replicated the paper (same figures, same graphics, same behavior, etc.) and explain any obstacles they have encountered during the replication work. The reviewers should also consider if the article is sufficiently self-contained. Overall, the author has done an impressive job at summarizing the two original models and the replications clearly and concisely. The author has also added more detailed analysis here and there (e.g. the SD’s in Figure 1, and the addition of Table 2), which helps gain more insight into the workings of the model. Below are some minor comments that I think will help make the paper more clear and self-contained.
Thank you for this positive feedback and the helpful notes below.
a. In the first paragraph of the subsection titled “The basic model”, it might help to make a bit more explicit that the ring of 2,000 values represents 2,000 different possible solutions to a given problem, and that the values represent the quality of the solutions. It might also help to give a concrete example, as Hong & Page’s original article did with the gasoline engine designs.
I have added an example.
b. I’m assuming that the last line of that same paragraph: “When they are in a group, they move together, with each agent in turn moving the entire group to the greatest value their heuristic can identify.” describes the discussion/conversation process that Hong & Page incorporated in their model (which Grim et al. call the “round-the-table relay dynamic”)? However, this is not entirely clear to me; I think it would help to put it in these discussion/conversation terms more explicitly (to help make the link between this paper and the original papers).
I have attempted to clarify this dynamic but would prefer to avoid referring to communication (given that communication seems to entail the possibility of misunderstandings). Could you have a look to see if you find the current section sufficiently clear?
c. In the subsection titled “Grim and colleagues’ extension and qualification”, I think it would help to emphasise a bit more that different problems are represented by different landscapes, and that Grim et al. used the smoothing procedure in order to make different problem landscapes more related/similar to each other (i.e. to increase the “transportability of best-performing heuristics.”). I worry it might otherwise be unclear to the reader how relatedness between nearby values in one problem landscape would have anything to do with other problems, and that they might end up misunderstanding what represents a problem in this model (an entire landscape), and what represents a solution (one point in that landscape).
I tried to clarify this.
d. In the paragraph titled “Implementation”, it says that “most results here are based on 500 landscapes.”. Why not be more specific and say that the Hong & Page replication relied on 500 landscapes, while the Grim et al. replication relied on 100 landscapes?
I clarified this - the main replication of Grim et al. also used 500 landscapes, but the parameter sweep would have taken too long.
e. In the caption of Table 1, please specify what the numbers between parentheses are (presumably standard deviations, as in Hong & Page).
Yes, they are standard deviations - unlike in Hong & Page, it turns out.
f. The caption of Figure 1 could do with an explanation of what numbers above 0 and numbers below 0 mean (or a legend + colour difference, as in Grim et al.’s Figure 2).
g. In the caption of Figure 2, it might be helpful to add “(bigger maximum step length means larger heuristics pool)” at the end of the first sentence.
h. Where it says “The results closely replicate those presented in Figures 5 to 9 in Grim and colleagues.” On page 5, I think the author means Figures 6 and 9.
All done, thanks!
@marieke-woensdregt no pressure as I know you are on holiday, but when you are back an ETA to check this might be useful.
@LukasWallrich I am still struggling to get any reviewers here, apologies.
@thelogicalgrammar are you still able and interested in reviewing this? 😊
OK, so @thelogicalgrammar said yes, but is on holiday. I cannot assign them until they comment on this thread, however, so we will have to wait. 😊
Hi @LukasWallrich and @oliviaguest, I will have a look at Lukas' revisions later today. And good to hear that you found a second reviewer!
Excellent news indeed - thanks, everyone!
@marieke-woensdregt would you be OK with a September ETA to take a look here? ☺️
Yes, sorry! Definitely. 29th of August at the latest.
@LukasWallrich I am sorry this is taking so long (it's not usually this slow, but it's both summer and a pandemic). I emailed with @thelogicalgrammar, but I assume they may be on holiday.
If you have any ideas for who else to invite, I can try them too. But I think for now, we have to wait out August and try people again in September.
Dear @LukasWallrich (and @oliviaguest),
Thank you for the clear and thorough response to my review. Please find below my response to the revisions, only to those points where a response is relevant (to all other points, my response is some variant of "Great! Thanks for clarifying/adding/editing.").
1.a) Great! As far as I'm concerned a footnote is sufficient. I don't have any reason to believe simulations with populations of 9 agents would look different than with 10.
1.c) Excellent! Great to hear that you were able to get clarity on this. Where you wrote "However, this is simply due to the fact that Hong and Page’s standard deviations are those of the means, i.e. what are more commonly called standard errors, while I present standard deviations of the observed variables." , perhaps add that you know this from "personal communication" with the authors? Given that the caption of Table 1 in Hong & Page just says "standard deviations"?
2.a) I think what went wrong when I tried to build the Docker image is reflected in the following error message from the build log (I'm also attaching the entire build log file to this message): build_log.txt
denied: Token exchange failed for project 'rescience-review-diversity-abm'. Caller does not have permission 'storage.buckets.create'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control ERROR: push attempt 1 detected failure, retrying: step exited with non-zero status: 1
Perhaps this had something to do with the fact that I copied the relevant scripts over into a github repo of my own (the one called "rescience-review-diversity-abm": https://github.com/marieke-woensdregt/review_diversity_abm ). I don't fully remember now, but I believe I did that because I wanted to make some changes inside the scripts (e.g. change the email address in config.conf to my own, so you wouldn't get update emails saying my simulations had finished running ;) ) But once I got that error message when trying to build the Docker image, I simply decided to switch to running the simulations on my institute's cluster, rather than figuring out the permissions stuff.
2.c) Good solution! But I happened to notice the output filenames now all start with the same "HPmodelresults" prefix, also in the Grim simulation scripts?
2.f) Yes, good point. Yes, I indeed had to upgrade the version of pandas in my virtual environment compared to the default one of my institute's cluster (which apparently is still at version 0.25.3 :'-) )
3.c) The docstrings look great now!
3.g) Great! Perhaps just consider writing `nr_processes = 16` rather than just `16` in calls to the BatchRunnerMP() method in the simulation scripts, because the comment above now explains what the `nr_processes` parameter is for, but that parameter/argument name is not actually visible below (see the sketch at the end of this message).
4.b) It makes sense that you don't want to refer to the relay strategy as "communication". I think it's described clearly now in the paper.
4.c) Great, this is clear now!
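And to make my point 3.g concrete, this is roughly the kind of call I have in mind (just a sketch: apart from `nr_processes`, the class and parameter names are placeholders, not taken from the actual scripts):

```python
from mesa.batchrunner import BatchRunnerMP

# Sketch only: HPProblem, variable_params and fixed_params stand in for whatever
# the run_simulation scripts actually pass to the batch runner.
batch_run = BatchRunnerMP(
    HPProblem,
    nr_processes=16,  # spelling out the keyword connects the call to the comment above it
    variable_parameters=variable_params,
    fixed_parameters=fixed_params,
    iterations=100,
    max_steps=50,
)
batch_run.run_all()
```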
Hi everyone,
My apologies for taking so long to write this. Since Marieke covered so much ground already and I feel the project is very close to being done, I mostly just want to add some comments about the code.
1. The actual replication of the research. The reviewer must evaluate the authors’ claims about a successful replication, applying the standards of the field.
The authors focus on two papers, Hong & Page (2004) and Grim et al (2019). The results of the two papers are successfully replicated, minus some variations from the original results that are explored by the authors of the replication.
2. Reproducibility of the replication. The reviewers must be able to run the proposed implementation on their computer, and obtain the same results as the authors with the limits of the state of the art of the field.
I could reproduce the results presented in the paper, with some minor changes to the code:
3. Clarity of the code and the instructions for running it. Uncommented or obfuscated code is as bad as no code at all.
```python
res_random = res.agent_descriptives.apply(pd.Series).random.apply(pd.Series).rename(mapper = partial(renamer, prefix = "random_"), axis = "columns")
```

can be rewritten as:

```python
res_random = (
    res
    .agent_descriptives
    .apply(pd.Series)
    .random.apply(pd.Series)
    .rename(
        mapper = partial(renamer, prefix = "random_"),
        axis = "columns"
    )
)
```
4. Clarity and completeness of the accompanying article, in which the authors should clearly state why they think they have replicated the paper (same figures, same graphics, same behavior, etc.) and explain any obstacles they have encountered during the replication work. The reviewers should also consider if the article is sufficiently self-contained.
The authors have done an excellent job at summarizing the previous work and presenting their replication, especially for such a short summary. Just a couple points:
@marieke-woensdregt, thank you so much for these further helpful comments. I have implemented all the suggested fixes (included below for completeness). The only aspect where I would appreciate your further thoughts is 2a - there, I am not sure whether I have fixed it or whether further work is needed at present. My tendency would be to leave pyscript2gce as is, given that it is tangential to this replication - but please see below.
1.c) Excellent! Great to hear that you were able to get clarity on this. Where you wrote "However, this is simply due to the fact that Hong and Page’s standard deviations are those of the means, i.e. what are more commonly called standard errors, while I present standard deviations of the observed variables." , perhaps add that you know this from "personal communication" with the authors? Given that the caption of Table 1 in Hong & Page just says "standard deviations"?
Good point. I added that into the footnote.
2.a) I think what went wrong when I tried to build the Docker image is reflected in the following error message from the build log (I'm also attaching the entire build log file to this message): build_log.txt
denied: Token exchange failed for project 'rescience-review-diversity-abm'. Caller does not have permission 'storage.buckets.create'. To configure permissions, follow instructions at: https://cloud.google.com/container-registry/docs/access-control ERROR: push attempt 1 detected failure, retrying: step exited with non-zero status: 1
Perhaps this had something to do with the fact that I copied the relevant scripts over into a github repo of my own (the one called "rescience-review-diversity-abm": https://github.com/marieke-woensdregt/review_diversity_abm ). I don't fully remember now, but I believe I did that because I wanted to make some changes inside the scripts (e.g. change the email address in config.conf to my own, so you wouldn't get update emails saying my simulations had finished running ;) ) But once I got that error message when trying to build the Docker image, I simply decided to switch to running the simulations on my institute's cluster, rather than figuring out the permissions stuff.
Thanks for sharing this. It seems that there is an issue with the specific permissions of the user - hard to troubleshoot remotely. I have added a command to activate the Cloud Storage API explicitly to the README which might help. @thelogicalgrammar, did you run this using the pyscript2gce helper? Given that this is entirely tangential to the replication, I am not sure if we need to resolve it? (Happy to troubleshoot further if someone ever opens an issue in pyscript2gce ...)
2.c) Good solution! But I happened to notice the output filenames now all start with the same "HPmodelresults" prefix, also in the Grim simulation scripts?
Fixed, good catch ... too much copy-pasting.
2.f) Yes, good point. Yes, I indeed had to upgrade the version of pandas in my virtual environment compared to the default one of my institute's cluster (which apparently is still at version 0.25.3 :'-) )
Perfect - then the minimum version in requirements.txt is appropriate.
3.g) Great! Perhaps just consider writing `nr_processes = 16` rather than just `16` in calls to the BatchRunnerMP() method in the simulation scripts, because the comment above now explains what the `nr_processes` parameter is for, but then that parameter/argument name is not actually visible below.
Done.
Dear @thelogicalgrammar,
Thank you so much for taking the time to review this and to provide such helpful feedback.
Firstly, just one point where I would appreciate clarification.
- HPmodel, line 77: PSAgent is undefined, replace with PS
I don't quite understand this point - the `PSAgent` class is defined right above.
Apart from that, I implemented all your very helpful suggestions. Specifically:
- Strangely, mesa did not work when installed through conda, but did work when installed via pip. Maybe worth mentioning this somewhere.
I added a note regarding this to the README.
- in Hong_and_Page/run_simulation.py "import datetime" should be "from datetime import datetime", or alternatively "datetime.now()" should be "datetime.datetime.now()"
Thanks - changed.
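For reference, the two equivalent forms from your comment, spelled out (a minimal snippet):

```python
# Form 1: import the class directly.
from datetime import datetime
stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

# Form 2 (equivalent): import the module and qualify the class.
# import datetime
# stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
```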
3. Clarity of the code and the instructions for running it. Uncommented or obfuscated code is as bad as no code at all.
- I found the folder structures somewhat unclear and the number of files a bit overwhelming at first. It would be great if the README gave a slightly more detailed description of the individual files.
I have expanded on the description.
- Many lines of code were very long which forced code folding in a way that gave a chaotic visual appearance to the code. Example:
- In some cases indentation was not consistent with 2 or 4 spaces (e.g. line 233 and 235 in HPmodel.py, lines 31 and 33 in Gmodel.py)
- Please split long list comprehensions into multiple lines
- Do not use keywords as argument names. E.g. PSAgent has 'id' as an argument.
Thank you! I'm very new to Python, so wasn't aware of how easy it is to add reasonable line breaks. Now I did that and also used black to format the code overall.
- "heuristic == None" --> "heuristic is None"
Changed.
- The documentation for some classes is incomplete. E.g., in HPProblem only methods 'max_search' and 'step' are reported.
Good point. I had only documented the important methods, but to follow good practice, I now changed some methods to internal and documented the others.
- Very minor: Why "import copy" rather than "from copy import copy"?
- Please move all imports at the top of the file!
Both changed.
- run_simulation.py in the top folder is completely uncommented
This file is only meant to be used to run simulations on Google Cloud Engine; I wanted to leave it there to show how that workflow can work. The actual simulation code for the article is in the subfolders. I now highlighted this in the README and added a comment at the top of run_simulation.py.
- The notebooks are generally well-structured, but line-by-line comments on the code are almost entirely missing.
I now added some code-comments and revised some headings / text so that it should be clear what is happening in each cell.
- Please move all functions in the notebooks to the top of the relevant sections and all imports to the first cell
- Please format the definition of "best_teams" in the notebook
Done.
@thelogicalgrammar @marieke-woensdregt can you give some indication to the author if you are happy with the edits, please? 😊
Dear @LukasWallrich,
Thanks for the detailed answer! I am happy with the edits, everything looks ready to me.
- HPmodel, line 77: PSAgent is undefined, replace with PS
I don't quite understand this point - the `PSAgent` class is defined right above.
PSAgent was referred to as "PS" later in the code, but this seems to have been solved already.
Best, Fausto
Dear @LukasWallrich,
Apologies for the late reply. Thank you for the final fixes. Those all look good! And regarding my point 2a: I agree with your view that the pyscript2gce helper is tangential to the replication. I think the fact that you added a command to activate the Cloud Storage API explicitly to the README should help. Unfortunately, the free trial of Google Cloud Engine that I used to first try and run this replication myself for my initial review has now run out, so I cannot check directly whether this addition would resolve the issue I had initially. But I agree that if there is still an issue here, it does not need to be resolved for the purposes of this Rescience C paper and replication. I agree that you can just leave it (and of course look into it if someone were to at some point open an issue with the pyscript2gce helper on github, as you say).
In sum, all my comments have been adequately addressed, and I consider this paper and replication ready for publication!
Best wishes, Marieke
Dear Theo, dear Marieke,
Thank you very much for your positive feedback - I very much appreciated your contributions to this paper.
@oliviaguest can you please let me know what's next?
I need to reread your article, and check it's all making sense, has no typos, and can you do the same, please... And then we're good to go! 🌈
@oliviaguest I have now re-read the article, caught some typos and added the meta-data in as far as I have it (including the code DOI). The PDF is now here ... I will also update the link at the top. Please let me know if you catch anything else - and how I can get the article DOI and other remaining metadata (issue, volume etc).
@oliviaguest Friendly nudge - can we possibly manage to get this done by the anniversary of my submission?
@oliviaguest Is there anything more to do? If you need help with actual publication, let me know.
Thanks for your encouragement in ReScience/call-for-replication#6 - as always, this took longer than expected, but I now managed to complete the replications. I'm very much looking forward to hearing your thoughts.
Original article - two articles that built on each other:
- Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, 101(46), 16385–16389. https://doi.org/10.1073/pnas.0403723101
- Grim, P., Singer, D. J., Bramson, A., Holman, B., McGeehan, S., & Berger, W. J. (2019). Diversity, Ability, and Expertise in Epistemic Communities. Philosophy of Science, 86(1), 98–123. https://doi.org/10.1086/701070
PDF URL: https://github.com/LukasWallrich/diversity_abm_replication-manuscript/raw/main/article.pdf
Metadata URL: https://raw.githubusercontent.com/LukasWallrich/diversity_abm_replication-manuscript/main/metadata.yaml
Code URL: https://github.com/LukasWallrich/diversity_abm_replication
Scientific domain: Social Psychology (could be called Cognitive Modelling?)
Programming language: Python
Suggested editor: @oliviaguest? (but this does not require specialist knowledge)