ReScience C submissions

[Re] Spread of alpha-synuclein pathology through the brain connectome is modulated by selective vulnerability and predicted by network analysis #54

Closed MathieuBo closed 3 years ago

MathieuBo commented 3 years ago

Original article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662627/

PDF URL: https://github.com/MathieuBo/Re-Henderson-2019/blob/master/article.pdf
Metadata URL: https://github.com/MathieuBo/Re-Henderson-2019/blob/master/metadata.yaml
Code URL: https://github.com/MathieuBo/PathoSpreading

Scientific domain: Neuroscience, Neurodegeneration
Programming language: Python
Suggested editor: @rougier (until we find someone else?)

Dear all,

This is a successful replication of the Henderson et al. 2019 paper on the spread of alpha-synuclein through the brain connectome. The code is written entirely in Python 3. I checked the editor/reviewer list, but I am not sure who is an expert in neurodegeneration. However, the model is mostly based on graph theory.
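This note is for readers outside the field. The network diffusion model (NDM) at the heart of this replication treats the brain as a directed graph and lets pathology spread along its edges through a graph Laplacian.

The following is a minimal sketch, not the PathoSpreading implementation. It assumes a toy adjacency matrix in which `adj[i, j]` is the connection strength from region `i` to region `j`. The helper names `spread_laplacian` and `predict_pathology` are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def spread_laplacian(adj):
    """Graph Laplacian for directed spread (hypothetical helper).

    Assumes adj[i, j] is the connection strength from region i to region j.
    Each column of the result sums to zero, so total pathology is conserved.
    """
    adj = np.array(adj, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(adj, 0.0)        # ignore self-connections
    out_strength = adj.sum(axis=1)    # row sums = total outgoing weight per region
    return np.diag(out_strength) - adj.T


def predict_pathology(laplacian, x0, c, t):
    """Network diffusion prediction x(t) = expm(-c * t * L) @ x0."""
    return expm(-c * t * laplacian) @ np.asarray(x0, dtype=float)


# Toy chain of three regions: 0 projects to 1, and 1 projects to 2.
adj = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])
L = spread_laplacian(adj)
x0 = np.array([1.0, 0.0, 0.0])  # all pathology seeded in region 0
x1 = predict_pathology(L, x0, c=1.0, t=1.0)
# Pathology drains from region 0 and accumulates downstream; x1 still sums to 1.
```

In the pipeline, the time constant `c` is a fitted parameter (the review below mentions `dm.find_best_c_and_r()`); the fitting criterion is not reproduced here. Transposing `adj` switches between spread along and against the direction of projections, so the orientation of the connectivity matrix matters.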

All the best, Mathieu Bourdenx and Thomas Paul

rougier commented 3 years ago

Thanks for your submission and sorry for the delay. We'll assign an editor soon. I cannot edit it due to a conflict of interest. @benoit-girard @oliviaguest @otizonaizit @ctb @eroesch Can any of you edit this submission in Neuroscience/Python?

oliviaguest commented 3 years ago

This is so far away from my area(s) of expertise, sorry. ☺️

ctb commented 3 years ago

It's pretty far away from mine, as well, I'm afraid.

rougier commented 3 years ago

@eroesch @otizonaizit @benoit-girard @khinsen Could you edit this submission? I think it's pretty far from everybody's expertise, but maybe @MathieuBo can suggest reviewer names.

khinsen commented 3 years ago

@rougier Finding reviewers is indeed the big issue. This is so far away from my own field that I don't know who would be competent to review this.

MathieuBo commented 3 years ago

@rougier, I have been looking for names of people adopting similar approaches:

rougier commented 3 years ago

Dear @ReScience/editors, we need your help to edit this submission. The author has already proposed several potential reviewers. I think the domain is far from any of us, and I cannot edit it because of a conflict of interest. Can anyone edit it? (link: https://github.com/ReScience/submissions/issues/54)

rougier commented 3 years ago

@ReScience/editors (second call): we need your help to edit this submission. The author has already proposed several potential reviewers. I think the domain is far from any of us, and I cannot edit it because of a conflict of interest. Can anyone edit it? (link: https://github.com/ReScience/submissions/issues/54)

khinsen commented 3 years ago

OK, I'll handle this, starting Monday if all goes well.

khinsen commented 3 years ago

@bmisic @Raj-Lab-UCSF You have been suggested as reviewers for a submission to the journal "ReScience C". Would you be interested in doing this review? Please read on for the details.

ReScience C is a journal dedicated to replications in computational science. The submission I would like to invite you to review is a replication of a 2019 publication by Mike Henderson et al. entitled "Spread of α-synuclein pathology through the brain connectome is modulated by selective vulnerability and predicted by network analysis". The goal of a replication article is to (1) explore if and to what degree published results can be replicated using different computational tools and (2) provide a reproducible implementation of the applied methods using Open Source software, for easier future validation and reuse. The reviewers' job is to verify those two aspects of the submission. This is much more technical than reviewing for traditional scientific journals.

Note also that ReScience C practices an extreme form of open reviewing: the whole submission and reviewing process happens in public, such that anyone can see it and even jump in. In fact, this message I am addressing to you is part of this open reviewing process.

khinsen commented 3 years ago

@MathieuBo In the meantime, a question based on my own reading of your article. You state that the original work already comes with code and data. It isn't clear from your description if you used this code or not, and if you did, to what degree: as a source of inspiration for your own code, or only as a black-box result generator for testing your own implementation. Could you explain?

MathieuBo commented 3 years ago

Thanks for looking into our submission @khinsen! To answer your question, it was a bit of both. We mostly used it as an inspiration for our code (for example, we kept some function names similar to ease comparison) but the precise implementation diverges in some ways (mostly related to Python vs R). We also used it to generate results and compare them with the results of our code. Is that clearer?
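Comparing outputs between implementations is the typical step when original code serves as a black-box reference. Below is a minimal sketch, not taken from either repository. It assumes each implementation's result can be loaded as a mapping from region name to value; how that export is done is not described here. `compare_by_region` is a hypothetical helper. It aligns values by region name rather than by position, which guards against R and Python ordering regions differently.

```python
import numpy as np


def compare_by_region(ours, reference, rtol=1e-6, atol=1e-9):
    """Compare two {region name: value} mappings element by element."""
    only_one = set(ours) ^ set(reference)
    if only_one:
        raise KeyError(f"regions present in only one output: {sorted(only_one)}")
    names = sorted(ours)  # align by name, not by position
    a = np.array([ours[n] for n in names], dtype=float)
    b = np.array([reference[n] for n in names], dtype=float)
    diff = np.abs(a - b)
    return {
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
        "max_abs_diff": float(diff.max()),
        "worst_region": names[int(diff.argmax())],
    }


# Toy values, not results from either implementation.
report = compare_by_region({"A": 1.0, "B": 2.0}, {"B": 2.5, "A": 1.0})
```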

khinsen commented 3 years ago

Thanks @MathieuBo! You should add a paragraph on this to your article, so that readers know exactly what the subject of the replication was. There's nothing wrong with taking inspiration from existing code, but it means that if there's a mistake in that code, it could propagate into yours, and readers should be aware of that possibility.

MathieuBo commented 3 years ago

Sounds good. I will add a paragraph in the manuscript. Thanks @khinsen.

khinsen commented 3 years ago

Status update: @ejcorn, one of the authors of the original article, has agreed to review this submission, starting June 22nd.

illdopejake commented 3 years ago

Hi all -- checking in as a Reviewer. I'll need to carefully re-read the original paper. As per instructions, I'll give some feedback about the replication manuscript with respect to whether claims of replication have been substantiated. After, I'll run the code provided to ensure I can reproduce everything, and I'll check in during that process if any issues arise. Will try to have this done in 2 weeks, if that's alright.

If I missed anything or misunderstood something, please do correct me / guide me.

khinsen commented 3 years ago

That sounds very good @illdopejake - and thanks for accepting this mission!

illdopejake commented 3 years ago

Dear all,

Apologies for the delay. I've read through the original Henderson et al. manuscript and the Paul & Bourdenx ReScienceC manuscript. I also tried running the code locally to reproduce the results. I have some feedback as it pertains to this initial experience with the manuscript and code.

Regarding the manuscript: I found the manuscript to be well written, conveying in appropriate detail all information necessary to understand the replication effort without the need for re-reading the original manuscript. The methods and results were also well contextualized, and all key figures, tables and points necessary to convey the details of the replication were present. However, I did want to bring up these two minor points:

==========

First foray into the code

I perused the GitHub repo, cloned it, set up a virtual Python 3 env with only the necessary libraries, and ran through Pipeline.py. I uncommented the functions from line 365 to the end in order to see if all functions worked. Most of the code worked beautifully, but I had some comments, and I did eventually hit some errors/discrepancies during my attempt at reproduction. I will list comments, suggestions and questions here, followed by reproduction errors:

Comments and suggestions

Reproduction failures

Conclusion

I would be happy to work with the authors to figure out the source of this error. I would ideally like to change some of the input arguments to check the robustness of the pipeline, but will wait until the above error is resolved. Please let me know if there are other aspects of the reproduction that the authors or editors feel require review.

--Jake

khinsen commented 3 years ago

Thanks @illdopejake for this very detailed review!

@MathieuBo While we are waiting for the second review by @ejcorn, could you look at the technical issues encountered by @illdopejake?

MathieuBo commented 3 years ago

Thanks @illdopejake for taking the time and the constructive comments! I will work on these issues as soon as possible.

MathieuBo commented 3 years ago

Authors Response (AR): Dear @illdopejake, Thank you for reviewing our reproducibility report! Please find below answers to your comments.

Dear all,

Apologies for the delay. I've read through the original Henderson et al. manuscript and the Paul & Bourdenx ReScienceC manuscript. I also tried running the code locally to reproduce the results. I have some feedback as it pertains to this initial experience with the manuscript and code.

Regarding the manuscript: I found the manuscript to be well written, conveying in appropriate detail all information necessary to understand the replication effort without the need for re-reading the original manuscript. The methods and results were also well contextualized, and all key figures, tables and points necessary to convey the details of the replication were present. However, I did want to bring up these two minor points:

  • The mouse brain plots in Figure 2D do not appear to be identical to those in the original Henderson et al. paper. This is hard to understand, since the statistical tests on that same data perfectly replicate the results described by Henderson et al. Whether the difference is actually present or simply illusory, I could imagine a few possibilities that may contribute to this discrepancy:
    a) The color scale has a different range: it is -1 to 1 in Henderson et al., and -2 to 1.5 in the present work.
    b) The rendering software differs, and the slices do not appear to be perfectly harmonized with the figure in Henderson et al.

AR: Thanks a lot for raising this issue. In Figure 2D, we plotted the residuals obtained for every brain region after averaging the residuals from the three different timepoints (1, 3 and 6 months post-injection, MPI). As you mentioned, our code perfectly replicated the original publication. For confirmation, here we plot the residuals obtained using our code and using the original code.

Residual plot
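The averaged residuals described above can be sketched as follows. The sketch assumes each timepoint's residuals come from an ordinary least-squares fit of observed on predicted values for the same regions. Whether and how values are log-transformed beforehand is not specified here, so the sketch takes already-transformed arrays. `regression_residuals` and `mean_residuals` are hypothetical names, not functions from either repository.

```python
import numpy as np


def regression_residuals(observed, predicted):
    """Residuals of the least-squares line observed = intercept + slope * predicted."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(predicted, observed, deg=1)  # highest power first
    return observed - (intercept + slope * predicted)


def mean_residuals(observed_by_tp, predicted_by_tp):
    """Average each region's residual over timepoints.

    Both arguments hold one array per timepoint (e.g. 1, 3 and 6 MPI), and
    every array must list the same regions in the same order.
    """
    per_tp = [regression_residuals(o, p) for o, p in zip(observed_by_tp, predicted_by_tp)]
    return np.mean(per_tp, axis=0)
```

The run log later in this thread reports different numbers of regions per timepoint (97 and 113). A real implementation therefore has to align regions across timepoints before averaging; this sketch skips that step.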

The difference in rendering can be explained by various reasons.

In the revised version of the manuscript, we have now added a sentence mentioning the differences due to the set of annotations. Also, if you think it would be useful, we could include the plot shown above in the replication.

  • Related to a comment below, I am curious whether any measures were needed during the present replication to make it perfect that were not mentioned in the original Henderson et al. manuscript. In other words, did the authors need to "figure out" any extra steps during data preprocessing or statistical analysis in order to obtain results identical to Henderson et al.? If so, it would be of interest to describe these extra steps in the manuscript.

AR: Except for the rendering of brain slices, for which we decided to use BrainRender (in order to automate the rendering of such panels in the future), the replication required no extra steps. The available code was clear and easy to follow.

First foray into the code

I perused the GitHub repo, cloned it, set up a virtual Python 3 env with only the necessary libraries, and ran through Pipeline.py. I uncommented the functions from line 365 to the end in order to see if all functions worked. Most of the code worked beautifully, but I had some comments, and I did eventually hit some errors/discrepancies during my attempt at reproduction. I will list comments, suggestions and questions here, followed by reproduction errors:

Comments and suggestions

  • The github repo has a number of jupyter notebooks that are not mentioned in the documentation. One of these involves generation of Figures for the present manuscript, whereas others involve data analysis both reported and not reported. Were these notebooks used in the process of replication, and is there anything within them that warrants mention? Are they simply supplemental analyses to demonstrate use cases of the code? Will they be removed or retained in the future? It would be useful if a guide to or explanation of these notebooks was included or at least mentioned in the documentation.

AR: As this replication is part of a bigger study, we mistakenly left some unnecessary Jupyter Notebooks not directly related to the replication. In order to avoid concerns, we chose to clean the main branch of the repository to leave only the required files. We have also edited the documentation.

  • At first, I considered playing around with the code in Pipeline.py interactively, so I started loading some of the information into a jupyter notebook. However, I was unable to proceed as it appears that timepoints is used as a global variable within the function dm.find_best_c_and_r() (invoked in line 358 of Pipeline.py). I'm not sure if this is intentional and there may be other instances. The authors may not be concerned with this as it does not impede replication or execution of Pipeline.py as a script, but I figured it might be worth mentioning.
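The usual fix for this kind of hidden dependency on a global is to pass the value in explicitly. Below is a minimal sketch assuming a grid search over a time constant `c`. `find_best_c` and `score` are hypothetical; they only echo the name `dm.find_best_c_and_r()` mentioned above, not its actual code.

```python
def find_best_c(c_values, timepoints, score):
    """Return the candidate c with the highest mean score over timepoints.

    `timepoints` is an explicit argument rather than a module-level global,
    so the function behaves the same whether it is called from Pipeline.py
    or from a notebook that never defined `timepoints`.
    `score(c, t)` is any callable returning a fit quality for one timepoint.
    """
    timepoints = list(timepoints)
    if not timepoints:
        raise ValueError("timepoints must not be empty")
    return max(c_values,
               key=lambda c: sum(score(c, t) for t in timepoints) / len(timepoints))


# Toy score that peaks at c = 0.5 for every timepoint.
best = find_best_c([0.1, 0.5, 0.9], timepoints=[1, 3, 6],
                   score=lambda c, t: -(c - 0.5) ** 2)
```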

AR: Thank you for warning us about this. We have now corrected the code.

  • At some point, I was guided by an error that I needed to change WindowsError to OSError in line 83 of summative_model.py (perhaps this is necessary elsewhere?). This may warrant mention in the documentation, though the error message was helpful.
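Some background on that error: in Python 3, `WindowsError` is only defined on Windows, where it is an alias of `OSError`. An `except WindowsError:` clause therefore raises `NameError` on macOS or Linux as soon as an exception reaches it.

A portable pattern is sketched below. It assumes, hypothetically (the review does not say), that the failing line was creating an output directory. `ensure_dir` is a hypothetical name.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if it does not exist yet, on any operating system."""
    try:
        os.makedirs(path)
    except OSError:  # portable; `WindowsError` would not exist off Windows
        if not os.path.isdir(path):
            raise  # a genuine failure, not "directory already exists"


out = os.path.join(tempfile.mkdtemp(), "figures")
ensure_dir(out)
ensure_dir(out)  # the second call is a harmless no-op
```

Since Python 3.2, `os.makedirs(path, exist_ok=True)` does the same in one call.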

AR: We apologize for this issue, which stemmed from the repository containing unnecessary files. We have now removed this file.

  • The output folders for Pipeline.py appear to be hardcoded to a directory one level up (../). This is not where the output lives in the git repo. It is not necessary to give an argument for the directory, but I should note that when I tried changing the locations of the output directories (lines 22-28 in Pipeline.py), this caused an error later in the script.

AR: We fixed the output folders in the updated version of the code. The output path can now be chosen in the data manager. We updated the README section accordingly.
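The pattern described in this response, a single configurable output location, can be sketched as below. `OutputSettings`, `path_for` and the file name are hypothetical, not the DataManager API of PathoSpreading. Every writer derives its path from one setting, so changing that setting cannot break a later step of the script.

```python
import tempfile
from pathlib import Path


class OutputSettings:
    """Hypothetical stand-in for an output path chosen in a data manager."""

    def __init__(self, output_dir="output"):
        # Resolve a relative path once against the current working directory,
        # instead of hardcoding a location such as "../".
        self.output_dir = Path(output_dir).resolve()
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def path_for(self, filename):
        """Full path for an output file; every writer should go through here."""
        return self.output_dir / filename


settings = OutputSettings(Path(tempfile.mkdtemp()) / "results")
figure_path = settings.path_for("iCPu_fit.png")  # hypothetical file name
```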

Reproduction failures

  • When performing the task “Plotting the iCPu Fit versus Fits of random seed regions…” , the result was “iCPu seed is the 56.0th percentile”. Perhaps I'm misunderstanding what this means, but it seems to be at odds with the results in the paper, and does not seem consistent with the plots that were generated (which look like the plots in manuscript).

AR: Very good point. We fixed it: returning the percentile of the true seed region is only relevant when assessing the specificity of the model.
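For context, a percentile like the one printed above is typically computed by ranking the true seed's fit against fits obtained from random seed regions. Below is a minimal sketch, assuming a higher fit (for example a higher Pearson r) is better and counting only strictly smaller null fits. `percentile_of_true_fit` is hypothetical. Tie handling, such as the `kind` argument of `scipy.stats.percentileofscore`, can shift the number slightly.

```python
import numpy as np


def percentile_of_true_fit(true_fit, null_fits):
    """Percentage of null (e.g. random-seed) fits strictly below the true fit.

    100 means the true seed beats every random seed; about 50 means it is
    typical of random seeds.
    """
    null_fits = np.asarray(null_fits, dtype=float)
    return 100.0 * float(np.mean(null_fits < true_fit))


print(percentile_of_true_fit(0.35, [0.10, 0.20, 0.30, 0.40]))  # 75.0
```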

  • Similar experience while “Plotting the adjacency matrix Fit versus Fits of random adjacency matrices…”, the result was “iCPu seed is the 58.0th percentile”.

AR: Fixed in the updated version. Here again, we removed that print as it was not relevant for this function (it came from a non-corrected copy/paste).

  • Plotting the non-shuffled pathology Fit versus shuffled pathology fits, I ran into the following error:
....Paul_ReScienceC_paper/revC/lib/python3.7/site-packages/scipy/stats/stats.py:4040: RuntimeWarning: invalid value encountered in subtract
  ym = y.astype(dtype) - ymean
  0%| | 0/3 [00:00<?, ?it/s]
  5%|██████████▊ | 3/58 [00:00<00:00, 68.01it/s]
Traceback (most recent call last):
  File "Pipeline.py", line 389, in <module>
    dm.compute_stability(Sliding_Window=None)
  File "Pipeline.py", line 291, in compute_stability
    suffix=suffix, seed=seed)
  File "/Users/jacobv/Documents/Papers/Reviews/Paul_ReScienceC_paper/replication/Robustness_Stability.py", line 235, in stability
    roi_names=ROInames)
  File "/Users/jacobv/Documents/Papers/Reviews/Paul_ReScienceC_paper/replication/fitfunctions.py", line 78, in cfit
    r, _ = stats.pearsonr(exp_val, predict_val)
  File "/Users/jacobv/Documents/Papers/Reviews/Paul_ReScienceC_paper/revC/lib/python3.7/site-packages/scipy/stats/stats.py", line 4046, in pearsonr
    normym = linalg.norm(ym)
  File "/Users/jacobv/Documents/Papers/Reviews/Paul_ReScienceC_paper/revC/lib/python3.7/site-packages/scipy/linalg/misc.py", line 145, in norm
    a = np.asarray_chkfinite(a)
  File "/Users/jacobv/Documents/Papers/Reviews/Paul_ReScienceC_paper/revC/lib/python3.7/site-packages/numpy/lib/function_base.py", line 489, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

AR: Here again, we sincerely apologize, as this function came from a test that was not fully mature. We have now removed this function from the repository. We can still explain why you obtained this error. The point of this function was to determine the minimal number of brain regions that must be studied to obtain stable predictions with the NDM model. To do so, it selects random slices of the connectivity matrix. Unfortunately, this sometimes led to the selection of regions with no pathology and, later in the code, to a “ValueError: array must not contain infs or NaNs” error. This could be solved by wrapping the selection in a while True / try loop. Still, we decided to remove this part of the code, as it has no direct application in the replication.
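The while True / try fix mentioned in this response can be sketched as follows. The sketch is hypothetical, since the function was removed from the repository. It uses a bounded number of draws instead of `while True`, so a dataset with no valid subset fails loudly instead of looping forever.

`pearson_r` deliberately raises on NaN or constant input, mirroring the older SciPy behaviour in the traceback above. Newer SciPy versions instead warn and return NaN for constant input. `pearson_r` and `subset_correlation` are hypothetical names.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r that raises instead of returning NaN on degenerate input."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if not (np.isfinite(x).all() and np.isfinite(y).all()):
        raise ValueError("array must not contain infs or NaNs")
    if x.std() == 0 or y.std() == 0:
        raise ValueError("correlation is undefined for constant input")
    return float(np.corrcoef(x, y)[0, 1])


def subset_correlation(observed, predicted, n_regions, rng, max_tries=100):
    """Correlate observed vs predicted values on a random subset of regions,
    redrawing the subset when it is degenerate (e.g. only pathology-free regions)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    for _ in range(max_tries):
        idx = rng.choice(len(observed), size=n_regions, replace=False)
        try:
            return pearson_r(observed[idx], predicted[idx])
        except ValueError:
            continue  # degenerate draw: try another random subset
    raise RuntimeError(f"no usable subset of {n_regions} regions after {max_tries} draws")


rng = np.random.default_rng(0)
observed = np.array([0, 0, 0, 0, 1, 2, 3, 4], dtype=float)  # half the regions: no pathology
predicted = np.arange(8, dtype=float)
r = subset_correlation(observed, predicted, n_regions=3, rng=rng)
```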

Conclusion

I would be happy to work with the authors to figure out the source of this error. I would ideally like to change some of the input arguments to check the robustness of the pipeline, but will wait until the above error is resolved. Please let me know if there are other aspects of the reproduction that the authors or editors feel require review.

--Jake

AR: Thanks again for your comments and pointing out the various issues. Please let us know if you would like us to modify other aspects of the code.

Best,

Paul & Mathieu

khinsen commented 3 years ago

Thanks @MathieuBo for your detailed reply. @illdopejake, are you happy with this?

Also pinging @ejcorn for the second review.

illdopejake commented 3 years ago

Hi all, sorry for the delay. Yes I've reviewed all the changes. The authors have done a great job with this and I have nothing more on my end. --Jake

ejcorn commented 3 years ago

Hi all, sorry for my delay on this. I have a few small comments but the manuscript and code look great. Certainly an improvement upon our code repository and happy to see a successful replication!

Manuscript

Code

After the authors incorporated @illdopejake's comments, I was able to run Pipeline.py with no errors using Python 3.7.6 installed with Anaconda. I didn't create a new environment. This code runs much better than ours, so I'm happy to see it out there! A couple of small recommendations:

khinsen commented 3 years ago

Thanks @ejcorn for your review! @MathieuBo, any comments from your side?

MathieuBo commented 3 years ago

Sorry for the delay in answering (August in France ...). Thanks a lot @ejcorn for reviewing our manuscript and code! I have now edited the description of the connectivity matrix to correct the inversion between incoming and outgoing connections (thanks for catching that!). I have already pushed the new version of the manuscript to the repository.
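The incoming/outgoing distinction depends entirely on how the connectivity matrix is oriented, and a small check like the one below can catch such inversions. It assumes, hypothetically, that rows index source regions and columns index target regions; the convention in the actual manuscript and code may differ. `in_out_strength` is a hypothetical helper.

```python
import numpy as np


def in_out_strength(adj):
    """Incoming and outgoing connection strength per region.

    Assumes adj[i, j] is the connection strength from region i (source)
    to region j (target).
    """
    adj = np.asarray(adj, dtype=float)
    outgoing = adj.sum(axis=1)  # row sums: what each region sends
    incoming = adj.sum(axis=0)  # column sums: what each region receives
    return incoming, outgoing


adj = np.array([[0.0, 2.0, 0.0],
                [0.0, 0.0, 3.0],
                [1.0, 0.0, 0.0]])
incoming, outgoing = in_out_strength(adj)
# Transposing the matrix swaps the two, which is how an inversion can slip in.
incoming_t, outgoing_t = in_out_strength(adj.T)
```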

Regarding the code execution, I am not able to reproduce the pause that you mention. Could you specify when that happened?

@illdopejake, thanks again for your comments. Glad the correction solved the issues.

ejcorn commented 3 years ago

Probably just something to do with my package versions or GUI set up. I didn't run it in a clean environment. No need to chase after this. If you're curious, here's the output:

Adjacency matrix: successful concatenation
Data Manager initialized

Name of the folder containing the output graphs is:   _iCPu
Graph computed - Laplacian matrix created

---------------------------------------------------
--------------NETWORK DIFFUSION MODEL--------------
---------------------------------------------------

---------------------------------------------------
Month Post Injection 1
Number of Regions used:  97
Pearson correlation coefficient 0.5588719364388353
Pvalue (non corrected) 2.695125677527694e-09
---------------------------------------------------

---------------------------------------------------
Month Post Injection 3
Number of Regions used:  113
Pearson correlation coefficient 0.6960522094835953
Pvalue (non corrected) 1.1504825644074288e-17
---------------------------------------------------

---------------------------------------------------
Month Post Injection 6
Number of Regions used:  113
Pearson correlation coefficient 0.6478470914996128
Pvalue (non corrected) 8.793043099888211e-15
---------------------------------------------------

iCPu was dropped to create the heatmap.

At that point, 10 figures pop up that I need to close before it will proceed to:

Loading of Random Seeding test:
  8%|███▎                                       | 9/116 [00:06<01:15,  1.42it/s]
MathieuBo commented 3 years ago

Thanks @ejcorn. I'll look into that.

MathieuBo commented 3 years ago

@ejcorn, I found the issue. Depending on how you execute the code (within an IDE or through a terminal), the script was pausing after plotting, waiting for the user to close the figures before proceeding. To prevent this, I have created a new argument display_plots in the DataManager class. By default it is set to False, and the plots are saved to the output folder. I hope this is an acceptable solution. I preferred not to use additional libraries here.

Thanks!
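Below is a minimal sketch of the display_plots idea, assuming plotting goes through matplotlib's pyplot interface. `finish_figure` is a hypothetical helper, not the DataManager code. Whether `plt.show()` blocks depends on the active matplotlib backend, which is likely why behaviour differed between IDE and terminal runs. Setting the `MPLBACKEND` environment variable or calling `matplotlib.use("Agg")` is another way to get a purely off-screen run; the demo below uses the latter.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # for this demo only: render off-screen, never open windows
import matplotlib.pyplot as plt


def finish_figure(fig, path, display_plots=False):
    """Save `fig`; show it only when asked, otherwise free its memory.

    With display_plots=False nothing blocks, whichever backend is active.
    """
    fig.savefig(path)
    if display_plots:
        plt.show()  # with interactive backends this can block until windows close
    else:
        plt.close(fig)


fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # toy data
path = os.path.join(tempfile.mkdtemp(), "demo.png")
finish_figure(fig, path)
```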

ejcorn commented 3 years ago

Sounds good! Thanks for addressing.


khinsen commented 3 years ago

Thanks @MathieuBo and @ejcorn for looking into what seems to me like a matplotlib backend configuration issue.

Question for @ejcorn: are you happy with the article and code as it is now? In other words, can we accept it for publication?

ejcorn commented 3 years ago

Yes, absolutely! I’m more than happy with the article and code. I recommend for publication.


khinsen commented 3 years ago

Thanks @ejcorn for confirming! And thanks to both @ejcorn and @illdopejake for their very constructive reviews. The paper is now accepted!

MathieuBo commented 3 years ago

Excellent news! Thanks again everyone for the help and constructive comments!

khinsen commented 3 years ago

@MathieuBo I need your help for the publication process because I get LaTeX errors when trying to compile your article. Could you please accept https://github.com/MathieuBo/Re-Henderson-2019/pull/1 and compile the final version of the article? Feel free to check my changes to the metadata of course (except for the DOI, which isn't active yet so you cannot check it).

MathieuBo commented 3 years ago

@khinsen, I have just merged your PR, recompiled the final version of the article and pushed it to the repo.

khinsen commented 3 years ago

Thanks @MathieuBo for the quick update. The paper is now published: https://zenodo.org/record/5379631/files/article.pdf