ReScience C submissions

[Re] A Detailed Data-Driven Network Model of Prefrontal Cortex Reproduces Key Features of In Vivo Activity #65

Closed marcelorempel closed 5 months ago

marcelorempel commented 2 years ago

Original article: Hass J, Hertäg L, Durstewitz D (2016) A Detailed Data-Driven Network Model of Prefrontal Cortex Reproduces Key Features of In Vivo Activity. PLoS Comput Biol 12(5): e1004930. https://doi.org/10.1371/journal.pcbi.1004930

PDF URL: https://github.com/Lab-SisNe/ReScience_Hass_Hertag_Durstewitz_2016/blob/master/article.pdf

Metadata URL: https://github.com/Lab-SisNe/ReScience_Hass_Hertag_Durstewitz_2016/blob/master/metadata.yaml

Code URL: https://github.com/Lab-SisNe/ReScience_Hass_Hertag_Durstewitz_2016/tree/master/codes

Scientific domain: Computational Neuroscience

Programming language: Python, Brian2

Suggested editor: @rougier

rougier commented 2 years ago

Hi @marcelorempel, thank you for your submission and sorry for the delay. We'll assign an editor soon.

rougier commented 2 years ago

@oliviaguest @benoit-girard @otizonaizit @gdetor Can one of you edit this submission? (Python, Neuroscience)

gdetor commented 2 years ago

@rougier I could handle that

rougier commented 2 years ago

Thanks, I'll assign you.

gdetor commented 2 years ago

Hi @vitay and @heplesser Would you be able to review this submission?

vitay commented 2 years ago

Yes, it sounds interesting.

heplesser commented 2 years ago

Yes, I'd be happy to review.

gdetor commented 2 years ago

Thank you @vitay and @heplesser

gdetor commented 2 years ago

Hi @heplesser @vitay gentle reminder

vitay commented 2 years ago

Here is my review of the [Re]implementation of the Hass et al. (2016) paper by Rempel and colleagues.

Overall, this is an excellent replication of the original study. All figures related to the computational model are reproduced (the influence of the numerical method is even studied), but not the ones related to the physiological recordings, as those data are not available. I would not call it a partial replication though, as this is not the authors' fault, and the computational results suggest that the reimplementation would also reproduce those results if the data were available.

Installation

The README is very detailed and provides instructions for an easy installation of the dependencies, as well as a description of what the different scripts do. OS requirements are not provided; I am not fully sure whether GSL is available on Windows, but everything runs well under Linux and OSX.

Installation of the dependencies on OSX was straightforward with the latest packages from conda-forge. I got the following versions (October 2022):
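For reference, the snippet below is one way to record such versions; it is my own sketch, not part of the submission, and assumes only that NumPy, Matplotlib and Brian2 are installed.

```python
# Sketch: print the versions of the main dependencies (not part of the submission).
import sys
import numpy
import matplotlib
import brian2

print("Python    :", sys.version.split()[0])
print("NumPy     :", numpy.__version__)
print("Matplotlib:", matplotlib.__version__)
print("Brian2    :", brian2.__version__)
```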

Simulations

Running all the scripts was easy using the instructions of the README. Here are the total durations on an M1 Pro laptop:

I was a bit surprised by the script that lasted almost 2 hours, as it uses 18 trials, but otherwise everything runs very fast.

I systematically got this warning in all simulations, but I guess it is not important:

WARNING: Neurons can not be distributed evenly across cell types. Number of generated neurons (1003) differs from specified number (1000).

I attach a version of the paper using the figures generated by my run of the scripts. There is no obvious difference from those in the submitted paper, apart from the inevitable variability due to random initialization. The replication is reproducible.

Code

The code is very well written and documented. Not only do the scripts reproduce the figures, but they are also parameterizable and make it easy to run custom experiments.

The main class CortexNetwork could have used a few more docstrings and comments, but the names of the methods and their arguments are most of the time enough to guess what they do.

Paper

The paper is very well written, explains in detail the reproduced model, and demonstrates that the replication was successful. I would request however two minor modifications:

  1. The results section is a bit hard to follow, as there are many figures that seem identical at first glance (RK4 vs. original Matlab vs. RK2) and are placed by LaTeX quite far away from the corresponding text. I think it would help to have more subsections to structure the results, so that the reader knows which setup a figure belongs to without having to read the captions.

  2. The discussion only sums up all aspects of the original work that were successfully reproduced. It would be nice to have a more reflective part on what did not go so well, what was hard to reproduce, what was missing from the original paper and/or code and had to be guessed, and so on. Perhaps a discussion on RK2/RK4 would also be interesting, as I am not sure what to conclude from the figures.

Minor comments

In the README, the scripts for Figures 13 to 17 have a typo in the name, e.g. Mainr_regular.py -> Main_regular.py.

article.pdf

marcelorempel commented 2 years ago

Hi @vitay... thank you very much for the review and the observations, we will work on these modifications.

gdetor commented 2 years ago

@heplesser gentle reminder

heplesser commented 2 years ago

@gdetor Sorry for the delay, I am working on the review now.

heplesser commented 2 years ago

General comments

I am impressed by the efforts of Rempel and colleagues in re-implementing and replicating the model and simulations by Hass et al. Since some details of model construction are not given in the original paper and the original source code is quite complex, a lot of intense code analysis must be behind this replication. For reasons I will provide below, I still believe that the replication needs some more work before it can be accepted for ReScience C. I sincerely hope that you will not find my criticism too harsh. From similar efforts of my own, I fully appreciate the amount of toil, sweat and tears behind your work.

A note on style: As you will notice throughout the text, I vary between the style of a classical review report addressed to the editor and mentioning the authors by name, and an issue or pull request discussion addressing the authors directly as "you". Given the nature of ReScience C, I could not quite decide on one style and wanted to spend my time on review content rather than revising the entire text to a consistent style.

Replication of the research

I find it difficult to judge how well the implementation by Rempel and colleagues actually replicates the findings by Hass et al. The starting point is difficult, since Hass et al. leave out quite a bit of information in their paper, e.g., how long the spike train sections entering the calculation of the statistics shown actually were, how data were smoothed (e.g., their Fig 3A), and how many repetitions were used ("over a number of repetitions"), and the sparse axis labels (e.g., their Fig 5D) leave the reader to guesstimate which x-axis values were chosen in the experiments. I presume much of this information can be extracted from the code provided by Hass et al.

First of all, it would have been beneficial if Rempel et al. had used the same figure styles as Hass et al., in particular density plots instead of histograms. It would also be beneficial to plot the data from the replication (Figs 2, 3, 4) and the original (Figs 6, 7, 8) together for direct comparison, and indeed to also include the data for the gsl_rk2 algorithm (Figs 10, 11, 12). For several plots, Rempel et al. use histograms, while Hass et al. use smooth curves presumably obtained through kernel density estimation. Both approaches are problematic. Additionally plotting the data as empirical distribution functions might be most informative, as it introduces no binning or smoothing errors.
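As an illustration of the last suggestion, here is a minimal sketch (my own, not from the submission) of an empirical distribution function next to a histogram; the variable isis and its values are placeholders for whichever statistic is being compared.

```python
# Sketch: histogram vs. empirical CDF for the same data (placeholder values).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
isis = rng.gamma(shape=2.0, scale=50.0, size=1000)  # placeholder data, e.g. ISIs

x = np.sort(isis)
ecdf = np.arange(1, x.size + 1) / x.size  # empirical CDF: no binning, no smoothing

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(isis, bins=30)          # the bin choice shapes the picture
ax1.set(title="Histogram", xlabel="ISI (ms)")
ax2.step(x, ecdf, where="post")  # ECDF avoids binning/smoothing errors
ax2.set(title="Empirical CDF", xlabel="ISI (ms)", ylabel="F(x)")
plt.tight_layout()
plt.show()
```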

Visually, the raster plots, e.g., in Fig 1 of Rempel et al. and Fig 3B of Hass et al., do not seem to agree. Activity in L2/3 of Fig 3B of Hass et al. appears considerably sparser than that in Fig 1 of Rempel et al. This could clearly be due to fluctuations in activity, but it should at least be discussed.

A critical problem in my view is in Fig 15. First, Rempel et al show "relative spiking activity (%)" on the y-axis, while Hass et al show "# spikes", which raises the question of whether numerical values can be compared directly. To further complicate replicability, Hass et al plot here "# spikes", but only describe this as "[n]umber of spikes in response to the input", without specifying over which time interval this number was counted or what baseline activity was.

That said, the curve for L5 in Hass et al's Fig 5D shows a monotonic rise in "# spikes", while the corresponding Fig 15 of Rempel et al shows a drop of approximately 5 SEM from the leftmost to the second-leftmost data point. This raises serious doubts about the replication, but also about the replicability of this measure, as I will discuss in the next section.

Furthermore, Figs 6–8 of Hass et al were not reproduced. I believe this would be useful to ascertain the quality of the replication as it tests the model implementation under different input or parameter settings and thus confirms that replication is achieved not just for a single setting.

Reproducibility of the replication

I could run all scripts "out of the box". The resulting figures differed in part markedly from the figures in the paper by Rempel et al. I believe that this is due to (a) a lack of control of random seeding and (b) poor choices for plotting.

I consider control over random numbers essential in simulations. The sequence of random numbers used in a simulation must be fully controllable to allow a check for reproducibility. With the present code, I can only conclude that figures differ, but cannot tell whether that is because of different random sequences or other issues. I have confirmed that two different runs of Main.py result in different spike rasters, so lack of control over random numbers is definitely an issue. Until that is fixed, it is not possible to check whether results are reproducible.
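For concreteness, below is a minimal sketch of how seeding could be made controllable in a Brian2-based script; the function name and the master seed value are illustrative and not taken from the submission's code.

```python
# Sketch: fix both NumPy's and Brian2's random number generators (illustrative).
import numpy as np
from brian2 import seed

def set_all_seeds(master_seed):
    """Make a simulation run reproducible by seeding every RNG in use."""
    np.random.seed(master_seed)  # e.g. parameter randomization done with NumPy
    seed(master_seed)            # Brian2's own RNG (connectivity, noise, ...)

set_all_seeds(12345)
# ... build and run the network; two runs with the same master_seed
# should now produce identical spike rasters.
```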

Two noticeable differences were the following. First, the histograms I obtained for Figs 5a and 5b look very different from those in the manuscript by Rempel et al, because (a) there was one data point at 30 mV and (b) because of this outlier and the choice of fixing the number of histogram bins (instead of fixing the bin width), the distribution at lower values is very poorly resolved. Second, in the Fig 15 generated by my run, the leftmost data point for L5 shows a value of 60±3%, compared to 70±3% in the corresponding graph in the manuscript, which shows a monotonic increase.

The difference in Fig 15 between the two runs is considerably larger than the SEM, with one run showing a monotonic increase as found by Hass et al, and one a decrease from the leftmost to the second data point from the left. This raises a general concern about the replicability of the results, as they seem prone to considerable fluctuation.

To address this issue, I would suggest obtaining more data for this data point from both the original code and the replicated code, e.g., 100 trials for each, and then comparing the responses. I find it plausible that the network might from trial to trial either show a response (say, a burst of 100 spikes) or continue baseline firing (say, 10 spikes during the same time). Then, with a small number of trials (18), the average number of spikes might vary considerably between different sets of trials, and the SEM of the spike count might not be the most suitable measure of statistical significance. If my hypothesis is right (response or no response), it might be more suitable to test this as a Bernoulli experiment; see the sketch below.

Furthermore, looking closer at Figs 5A and 5C of Hass et al, it appears that changing the variability of parameters also changes the baseline firing activity, which raises the question of whether looking at overall spike counts is the right measure of response. Should one not rather look at the change in firing activity? Looking closely at your Fig 14 (corresponding to their 5C), it seems as if there is a second wave of activity approximately 10 ms after the initial response in your case, but not in the Hass et al figure. Maybe such second waves could also lead to significant fluctuations in responses.
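To make the Bernoulli suggestion concrete, here is a rough sketch of what I have in mind; it is my own illustration, and the trial numbers, spike counts and threshold are all placeholders rather than measured data.

```python
# Sketch: treat each trial as a Bernoulli experiment (response vs. no response)
# and compare the original and replicated code with Fisher's exact test.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)
orig_counts = rng.choice([10, 100], size=100, p=[0.3, 0.7])  # hypothetical spike counts per trial
repl_counts = rng.choice([10, 100], size=100, p=[0.4, 0.6])

threshold = 50  # spikes; separates "response" trials from "baseline" trials
table = [
    [int(np.sum(orig_counts >= threshold)), int(np.sum(orig_counts < threshold))],
    [int(np.sum(repl_counts >= threshold)), int(np.sum(repl_counts < threshold))],
]
_, p_value = fisher_exact(table)
print("Response probability, original   :", table[0][0] / 100)
print("Response probability, replication:", table[1][0] / 100)
print("Fisher exact p-value:", round(p_value, 3))
```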

Clarity of code and instructions

I could follow the instructions and everything worked "out of the box" with the most recent versions of the respective software available in my preferred source. I would have appreciated a "warning" that Main_regular_modified.py would require many hours to run while all other scripts take only minutes.

I still see a number of issues with code and instructions.

README files

The archive contains both a README.txt and a README.md file with similar, but not identical, content. This is confusing. I'd suggest removing the README.txt file and keeping only the Markdown version. If both files are necessary, they should either have clearly different purposes and content and point to each other, or they should be identical except for formatting.

The list of files given in README.md does not match the files in the archive.

Choice of Python version

I consider it a poor choice to propose Python 3.6.8 as the default Python version for this replication, given that all support for Python 3.6 ended in December 2021 with the release of the final source-only security update, 3.6.15. One should not in any way entice people to install outdated and potentially vulnerable software. Since support for Python 3.7 will end in the summer of 2023, I consider Python 3.8 the oldest acceptable version. But I would strongly suggest using the newest software stack available today (Python 3.10-based), naming that as the default version, and regenerating all figures with it. Your code clearly works with it.

File naming and structure

Your code would be easier to work with if you structured it into a Python package containing the network and analysis code, and a separate directory of examples containing the scripts that create the figures. I would also suggest renaming these scripts so that the names indicate which figures they reproduce, and dropping the "Main" from the script filenames. Also, I'd suggest following PEP8 naming conventions for files, i.e., all-lowercase names structured with underscores. A shell or Python script running all figure-generating scripts would also be helpful; a possible sketch is given below.
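The runner could be as simple as the following; the examples/ directory and the fig*.py naming pattern are hypothetical and would need to match whatever names you settle on.

```python
# Sketch: run all figure-generating scripts in sequence (paths and names are
# hypothetical; adapt to the actual script names in the repository).
import subprocess
import sys
from pathlib import Path

scripts = sorted(Path("examples").glob("fig*.py"))
for script in scripts:
    print(f"Running {script} ...")
    subprocess.run([sys.executable, str(script)], check=True)
```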

Script structure

I think it is useful to have one script (e.g., Main.py, or maybe better script_template.py) that explains and shows all possible settings. But repeating all the comments and all the code that merely creates empty parameter lists or dictionaries in every script makes the scripts very difficult to read, and in particular makes it difficult to see how the scripts differ from each other. I would strongly suggest minimizing the scripts so that it becomes clear what is special about each one.

I have also been wondering if you have considered using Jupyter Notebooks.

Optimization

At the beginning of each simulation, the scripts print "Optimization terminated successfully". This is not explained in either the original paper or the current manuscript and seems confusing. A key claim by Hass et al is that after they fitted the simpAEIF model to experimental data, all parameters were fixed. So what is there to optimize at the beginning of each run? This needs at least to be explained somewhere.

Network and simulation code

Before commenting on the network and simulation code, I would like to raise the question of your goal with the code you have created. Do you only want to demonstrate that you can replicate the results by Hass et al and maybe use the code in your own research, or do you aim for widespread use of your model implementation by other scientists? In the latter case at least, your code would need considerable work, and my comments are written with that goal in mind.

Your CortexNetwork.py file is about 4500 lines long, making it very difficult to work with. It contains code for simulation, analysis and plotting, and I would strongly suggest separating these into different files, if not into separate Python packages. Other scientists may want to re-use your model implementation but apply their own analysis and visualisation approaches.

The docstring for the class CortexNetwork provides documentation for the data members of the class, which is not common for Python class docstrings; these should describe the purpose of the class. At the same time, the constructor docstring does not provide any information about the large number of constructor arguments. Given the seemingly complex structure of these arguments (e.g., SynTypPar seems to be a 2D array), detailed documentation of the arguments is essential if anyone is to use this class. Indeed, 2D parameter arrays seem problematic, as one can easily mistake indices; why not use dictionaries or data classes?
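As an illustration of that last point, here is a minimal sketch of grouping synaptic parameters in a data class and looking them up by name; the field names and values are made up and not taken from SynTypPar.

```python
# Sketch: named synaptic parameters instead of a positional 2D array.
from dataclasses import dataclass

@dataclass
class SynapseParams:
    tau_rise: float   # ms
    tau_decay: float  # ms
    e_rev: float      # mV, reversal potential
    weight: float     # nS

syn_types = {
    "AMPA":   SynapseParams(tau_rise=0.5, tau_decay=2.0,  e_rev=0.0,   weight=1.0),
    "GABA_A": SynapseParams(tau_rise=0.5, tau_decay=10.0, e_rev=-70.0, weight=1.5),
}
print(syn_types["AMPA"].tau_decay)  # lookup by name, not by index pair
```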

Frequent use of if type(x) is a seems unpythonic. It should at least be if isinstance(x, a), but ideally one should find a better solution for flagging different variants of the model. One simple solution could be to set parameters to None if they have no numeric value and then test against None, as sketched below.
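A small sketch of that alternative (the function and parameter names are illustrative, not from your code):

```python
# Sketch: use None for "not set" and test with `is None` instead of type checks.
def membrane_noise_current(sigma):
    """Return a noise term, or zero if noise is disabled (sigma is None)."""
    if sigma is None:          # instead of: if type(sigma) is str
        return 0.0
    return 0.1 * sigma         # placeholder for the actual noise computation

print(membrane_noise_current(None))  # 0.0
print(membrane_noise_current(2.0))   # 0.2
```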

Your NetworkSetup.py file represents, I believe, an enormous amount of work and a huge achievement in decoding and re-implementing code by Hass et al. Unfortunately, it consists of almost 1000 lines of undocumented code (no docstrings, no comments), making it essentially unusable for anyone but you. I think you could do the scientific community a great service by documenting this code properly.

Right at the beginning of NetworkSetup.py, ParamSetup() returns a 32-element tuple of parameters. This is a disaster waiting to happen and entirely unmaintainable. Such large parameter sets should be returned as a dictionary, a data class, or a similar named collection.
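For instance, a named container makes every parameter accessible by name rather than by tuple position; the fields and values below are hypothetical and only hint at what ParamSetup() actually returns.

```python
# Sketch: return a named collection instead of a 32-element positional tuple.
from typing import NamedTuple

class NetworkParams(NamedTuple):
    n_neurons: int
    p_connection: float
    g_leak_nS: float
    # ... the remaining parameters would be added as further named fields

def param_setup():
    return NetworkParams(n_neurons=1000, p_connection=0.1, g_leak_nS=10.0)

params = param_setup()
print(params.p_connection)  # access by name, not by index
```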

Finally, I like that you store simulation information in files, but why do you use your own file format instead of, e.g., YAML, which would allow easier processing of the data with third-party tools?
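For example, simulation metadata could be written and read back with PyYAML; the keys and values below are illustrative, not the actual contents of your output files.

```python
# Sketch: store simulation metadata as YAML (requires the PyYAML package).
import yaml

sim_info = {
    "simulation": {"duration_ms": 1000.0, "dt_ms": 0.05, "seed": 12345},
    "network": {"n_neurons": 1000, "layers": ["L2/3", "L5"]},
}

with open("simulation_info.yaml", "w") as f:
    yaml.safe_dump(sim_info, f, sort_keys=False)

with open("simulation_info.yaml") as f:
    print(yaml.safe_load(f))
```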

Clarity and completeness of the article

Maybe the greatest shortcoming of the original paper by Hass et al is the very terse description of network construction and connectivity: while neuron and synapse dynamics are described in great detail, network construction is not; see also Senk et al (2022) for a general discussion of this problem in computational neuroscience.

In the process of re-implementing the model in Python using Brian2, you must have analysed the network construction process in depth. It would be a tremendous service to the community if you could include the detailed understanding of the network construction process that you gained in your paper. Right now, the insights you gained appear to be hidden in your code, which, as mentioned above, is difficult to understand.

On p 2 of the manuscript, you mention that "the probability of connection increases linearly with the number of neighbors that both cells share". This statement is incomplete, given that within a single column neurons have no spatial location in this model. In the original paper, "neighborhood" is defined in terms of common input to cells. But if connection probability is based on the amount of common input, then networks either need to be built in a certain order (e.g., from retina via thalamus towards V1 and beyond) or some iterative scheme is required to obtain a self-consistent solution. Could you describe the approach used? I also find the statement that "reciprocity among these connections is set to 47%" no more informative than the corresponding sentence in the original paper (p 21).

Furthermore, you write that "[a]lthough discussed in the article and built on the original code, this neighborhood rule is not functional in the original model due to matrices mismatches." What does this mean? How were networks then actually built in the original model? How do you build them in your implementation? Do you see any difference in the resulting connectivity structure or in the dynamics when the neighborhood rule is applied or not?

On p 4, between Equations 2 and 3, you start to use D(V), but I cannot find its definition anywhere. I also find the paragraph above Eq 3 unclear: do you mean that the original code by Hass et al also resets w if it is in the interval (w-D(V), w+D(V)), and that only the description in their paper is incorrect, or have you fixed this problem only in your code?

Below Eq 3, I do not quite understand what you mean by a "transient frequency" of 200 Hz. Could you explain this better? I'd also appreciate it if you gave the equation for how "V is driven exponentially back to V_r".

On p 7, could you provide the equation for LFP?

On p 9, you mention connection probabilities p_con. Which probabilities are these and how are they related to the connection probabilities given in your Tables 2 and 3?

On p 13 in the first line of regular text, it should be "ODEs", not "EDOs", or?

Overall, the work by Hass et al seems to pose a significant challenge for replication, given the lack of detail in the original paper and the resulting need to extract much information from the original code. In addition, the actual network dynamics seem to pose challenges of interpretation; see the discussion of your Fig 15 above. I believe that your paper would benefit from a more detailed discussion of these challenges and the way in which you addressed them.

gdetor commented 2 years ago

Thank you @heplesser and @vitay for your comments.

marcelorempel commented 2 years ago

@heplesser thank you for the feedback, we will work on improvements.

vitay commented 2 years ago

@heplesser I have a genuine question about the importance of seeds for modeling, in order to improve my reviewing. I fully agree that reproducibility with a fixed seed is essential for the underlying neuro-simulator: running the same simulation twice with the same seed should always lead to exactly the same results. But for a model using a neuro-simulator, one should on the contrary show that the results are qualitatively the same regardless of the seed. A model that reproduces data only with a particular seed is a useless model to me. Hence the necessity of having multiple runs and finding metrics that report the desired effect as an average. That is not easy when the results are raster plots, so I only checked that the tendency of the results was visually the same, regardless of the exact spike timings. Or did I miss something?

heplesser commented 2 years ago

@vitay I agree with you. First, at a technical level, we need full control over random number sequences to be able to reproduce simulation runs. This is essential to detect "Heisenbugs" that may introduce variability from trial to trial due to, e.g., uninitialised variables or errors in parallel code. Then, at the scientific level, we need to perform trials with different seeds to capture the variability in the model dynamics and to find suitable measures of neuronal activity for comparisons. While some measures may be considered "must haves", such as firing rates, others will depend on the scientific question at hand. And even firing rates can be tricky: if a network can jump spontaneously between up and down states and does so at long intervals, one may need either to collect data over a very long time or to explicitly split between up and down states. The variation in the leftmost L5 data point in Fig 5D/15 may be related to a similar effect.

BTW, even perfect code and full control over random numbers do not necessarily guarantee identical results when code is run on different systems or with different underlying math libraries, as shown by Wilfredo Blanco et al (2020), who found that single-bit differences in the results of different exp() implementations can lead to qualitative differences in network dynamics.

gdetor commented 1 year ago

Hi @marcelorempel any updates?

marcelorempel commented 1 year ago

Hi @gdetor, sorry for the delay. We have put considerable work into analyzing and reformatting the code. The revisions made so far are in our repository (branch 'Revision'), which we will strive to keep up to date. We will send updates as we make further progress.

gdetor commented 1 year ago

Hi @marcelorempel any updates?

gdetor commented 1 year ago

@marcelorempel gentle reminder

gdetor commented 11 months ago

Hi @marcelorempel any updates?

gdetor commented 9 months ago

@marcelorempel Any progress ?

gdetor commented 6 months ago

@marcelorempel Any updates? Thank you

rougier commented 6 months ago

@gdetor @marcelorempel has not responded in more than a year. Maybe we can close this submission, and it can be re-opened if we hear from the authors again. What do you think?

gdetor commented 6 months ago

@rougier I agree