Closed: bruAristimunha closed this issue 2 years ago.
Thank you for your submission, and sorry for the delay. We'll assign an editor soon.
@koustuvsinha Could you serve as editor for this submission?
@rougier any update?
@bruAristimunha Sorry for the delay. Pinging @koustuvsinha
Hi @rougier, sorry to bother you, but is it possible to switch to another editor? It seems he is unable to take this on.
@bruAristimunha Sorry for the delay in my answer. Yes, it seems @koustuvsinha is not responsive. I'll try by mail.
@gdetor Could you edit this submission?
@rougier Sure, I can edit this submission.
Hi @neuronalX could you review this submission? Thank you.
Hi @ogrisel Could you review this submission? Thank you.
Hi @gdetor, I am not an expert in CNNs, and given the number of experiments, the paper needs an expert to review the code properly.
Maybe you could ask @vthierry or Thalita Firmo Drumond.
Hi @neuronalX Thank you for your response. No worries.
@benureau Could you please review this submission?
@benureau Gentle reminder
@ogrisel Gentle reminder
@benureau Gentle reminder
@gdetor: Sorry, I missed the previous messages. I will look at the submission in the next 24 hours and let you know whether I feel qualified to review it.
@benureau No worries, thank you for the update
@gdetor: I will review this submission in the next 30 days.
@benureau Thank you
@damiendr Could you please review this submission?
Sorry for the delayed response; I missed this thread completely. It seems there is still a reviewer spot open. @gdetor, if you like, I can review this submission.
@koustuvsinha Sure
Here is my review. The reproducibility effort is very comprehensive and thorough, but the report is difficult to read and needs more work and polish before it is in shape to appear in the journal. I'm happy to reply and re-evaluate the updated draft.
I have not gone through the code in detail yet. I would wait for the changes, or at least a better logical presentation of the results, before verifying the claims in the code.
@koustuvsinha Thank you for the review. @bruAristimunha Can you address the reviewer's comments? @benureau Gentle reminder
@gdetor: sorry for the delay; I am going through the review now and assembling the hardware I need to run the code.
@bruAristimunha: did you contact the original authors about what seemed ambiguous in the original paper? Specifically, for instance, section 5.1? And all the assumptions you made in section 3.1? While a paper should be replicable without reaching out to the original authors, there's a point to be made about ferreting out the missing details and making them explicit in a replication effort submitted to ReScience.
@gdetor, I'm just waiting for the other authors to answer @koustuvsinha's review.
@benureau Yes, but my e-mails were not answered.
I apologize for the delay in responding, @gdetor, @koustuvsinha and @benureau. All the reviewers' comments were accepted. The work will go through re-editing, with the removal of several sections that are not related to the reproduction itself.
We will try to improve the flow of the text, explaining that we did not get a response from the original authors and that we did not obtain the same results, in terms of variance and outcomes.
This revision round will be conducted by @DiogoEduardo. In particular, we plan to remove sections 2 (Related Work), 3.2 (AutoEncoders), 4.3 (metrics definition) and 5.3 (Extension), in addition to improving the flow of the text as a whole.
@bruAristimunha: I am halfway through my review. I will wait for you to update your submission, since the changes you are preparing seem important.
A few comments:
Code:
```
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
tensorflow 2.1.0 requires scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.2 which is incompatible.
```
- The third cell of 7.4.5 in the notebook complains about a missing pyarrow dependency.
- The last cell of section 3 throws errors when run with `True` on all of its first three lines:
```bash
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_accumulated".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_file".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_person".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/reduced".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon/reduced".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/save_model".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon/save_model".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit".format(base_fold)'
mkdir: cannot create directory ‘../article/figure’: File exists
mkdir: cannot create directory ‘../article/table’: File exists
download_chbmit
to download from scratch.
Downloading the folder information: ../data/chbmit/
Downloading the folder files: ../data/chbmit/
Downloading the folder information: ../data/chbmit/chb01/
Downloading the folder files: ../data/chbmit/chb01/
Downloading the folder information: ../data/chbmit/chb02/
Downloading the folder files: ../data/chbmit/chb02/
HTTPError Traceback (most recent call last) [...]
```
- Currently re-running training. Will post my results when it finishes.
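The bash syntax errors above look like a Python `.format` template reaching the shell unformatted: bash sees the literal parentheses in `"{}".format(base_fold)` and fails. A minimal Python-side cleanup would avoid the shell entirely (a sketch only; `base_fold` and the directory names are taken from the error messages, not verified against the repository):

```python
import os
import shutil

# Hypothetical base folder; in the notebook this comes from base_fold.
base_fold = ".."

# Format the paths in Python first, then remove them with shutil,
# so no unformatted "{}".format(...) template ever reaches bash.
for sub in ("data/chbmit/variance_accumulated",
            "data/chbmit/variance_file",
            "data/chbmit/variance_person",
            "data/chbmit/reduced"):
    path = os.path.join(base_fold, sub)
    shutil.rmtree(path, ignore_errors=True)  # silent no-op if the dir is missing
```

`ignore_errors=True` also removes the need for the shell's `rm -r` to tolerate missing directories on a fresh machine.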
After running into some trouble with the `rm -r` commands above not working and having to delete the directories manually, I was unable to reproduce Figure 2:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-8de2b88f16b3> in <module>
      1 variance_per_person = get_variance_by_person(PATH_CHBMIT,
----> 2                                              range_= (1,11))

~/rescience/ReScience-submission/code/variance.py in get_variance_by_person(path_dataset, range_)
    318     var_pearson.append(accumulate_var)
    319
--> 320     variance_df = DataFrame(var_pearson).drop("time", 1)
    321
    322     variance = DataFrame(

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    472         if is_named_tuple(data[0]) and columns is None:
    473             columns = data[0]._fields
--> 474         arrays, columns = to_arrays(data, columns, dtype=dtype)
    475         columns = ensure_index(columns)
    476

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/internals/construction.py in to_arrays(data, columns, coerce_float, dtype)
    466     elif isinstance(data[0], ABCSeries):
    467         return _list_of_series_to_arrays(
--> 468             data, columns, coerce_float=coerce_float, dtype=dtype
    469         )
    470     elif isinstance(data[0], Categorical):

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/internals/construction.py in _list_of_series_to_arrays(data, columns, coerce_float, dtype)
    514         index = getattr(s, "index", None)
    515         if index is None:
--> 516             index = ibase.default_index(len(s))
    517
    518         if id(index) in indexer_cache:

TypeError: object of type 'int' has no len()
```
and Figures 1 and 2 were different from the ones in the paper, with, for instance, far fewer channels in Figure 1. I stopped there. Could you make sure that all the figures are reproducible from scratch, on a vanilla machine / vanilla conda environment? Or explain the possible sources of differences?
Obtained figure 1:
Obtained figure 3:
@bruAristimunha Gentle reminder
Hi @bruAristimunha Any progress on this?
Hello @gdetor, sorry for the delay. I am waiting for the other authors' revisions; I believe everything will be concluded within a week.
@gdetor @bruAristimunha Gentle reminder
Hi @bruAristimunha, Is there any progress on this matter?
Hello @gdetor and @rougier
I apologize for the delay in contacting you. Unfortunately, for personal reasons, I won't be able to handle the modifications. However, the second author, @DiogoEduardo, will continue with the necessary revisions.
@bruAristimunha Ok, no problem. @DiogoEduardo Can you address the reviews?
Hi @DiogoEduardo gentle reminder
Hi @gdetor, sorry about the delay. We completed the changes to the text and adapted the Jupyter notebook to work on Linux/Windows. I have tested on both systems using a vanilla machine / vanilla conda environment, as stated in the readme, so I believe this type of error no longer blocks reproducing our results. The text changes announced earlier by @bruAristimunha are also complete.
@DiogoEduardo Thank you for the update @koustuvsinha @benureau Could you please review the modified article?
@koustuvsinha @benureau Gentle reminder
@gdetor: of course, will do the review soon.
@koustuvsinha @benureau Gentle reminder
@gdetor Any update?
@gdetor, @rougier: I am finishing the review right now. Hopefully will be done by the end of the day.
@koustuvsinha Gentle reminder
Sorry about the delay, and thanks for the reminder! The paper looks much more polished now; thanks for incorporating the suggested changes. It looks good to me now. It would perhaps be better to consolidate the key findings of Section 4 in a new section, "Summary of Reproducibility", before or after the introduction, to help readers quickly gain insight into the results.
Sorry, still verifying some part of the code. Will be done soon.
Deep apologies, I am re-running training one last time and then submitting my review.
I am sorry for the delay. Part of it was that I spent a lot of time trying, without success, to reproduce the results by re-running training, and thought I was doing something wrong.
In the review below, I detail several problems, big and small, that I have with the replication effort in its current state. However, I have a positive opinion of the work done in this replication under difficult circumstances: an ambiguous original article, no possible contact with the original authors, and an obvious problem in the original article (a copy-paste error with Figures 3/4).
Taken from the review guidelines.
The authors have done a serious job trying to reproduce the results of the original article. Many of their results fail to reproduce quantitatively.
One problem seems to be that the original article and the authors do not use the same testing dataset for dataset 1. The original article says (p. 25406): "In addition, because the ratio of positive and negative samples in the testing set is 2:1, the accuracy is 0.667 when all samples were deemed as positive and the accuracy is 0.333 when all samples were deemed as negative." Yet, in the replication, the ratio seems to be 3:2 (i.e., if I am not mistaken, the whole dataset?), generating scores of 0.6 and 0.4 when all samples are deemed positive or negative, respectively (e.g., Table 5 here). Is that something the authors considered? Could the original article have used only the C, D and E subsets of dataset 1 for testing?
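For concreteness, the single-class baseline accuracies follow directly from the class ratio, which is how the 2:1 vs 3:2 discrepancy shows up in the numbers (a quick check using only the ratios quoted above):

```python
# Baseline accuracy when every sample is predicted as one class is just
# that class's share of the test set.
def all_one_class_accuracy(n_pos: int, n_neg: int) -> float:
    return n_pos / (n_pos + n_neg)

# Original article's stated 2:1 positive:negative ratio:
print(round(all_one_class_accuracy(2, 1), 3))  # 0.667 (all-positive baseline)
print(round(all_one_class_accuracy(1, 2), 3))  # 0.333 (all-negative baseline)

# The replication's apparent 3:2 ratio:
print(all_one_class_accuracy(3, 2))            # 0.6
print(all_one_class_accuracy(2, 3))            # 0.4
```

Observing 0.6/0.4 rather than 0.667/0.333 baselines is consistent with the replication evaluating on the whole dataset rather than the original article's test split.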
I have tried to rerun the code as much as possible. The replication provides different flags: `download_again`, `train_again` and `eval_again`. The notebook comes precomputed. The results of the precomputed notebook are the same as in the article, and we can reproduce the same results by re-running the notebook with the `eval_again` flag turned on (`True`). `download_again` is problematic, as the Bonn University EEG database seems to have moved or disappeared (any chance to put it in a more stable place, such as Zenodo or Figshare?). Downloading the Children's Hospital of Boston EEG database is slow (big files), and any issue during the download means restarting from scratch with the provided code. Not ideal, but it works eventually.
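As a side note on the download fragility: resuming an interrupted download is not hard to add. A sketch (my suggestion, not the authors' code; `url` and `dest` are placeholders) using an HTTP `Range` header:

```python
import os
import urllib.request

def range_header(dest: str) -> dict:
    """Ask the server only for the bytes we don't have yet."""
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={start}-"} if start else {}

def resume_download(url: str, dest: str) -> None:
    # Append to the partial file instead of restarting from scratch.
    req = urllib.request.Request(url, headers=range_header(dest))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as f:
        for chunk in iter(lambda: resp.read(1 << 16), b""):
            f.write(chunk)
```

A production version should also check that the server answers 206 Partial Content before appending, since a server that ignores `Range` would resend the whole file.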
I also ran the notebook with `train_again` set to `True`. I did that on a fully updated Ubuntu 20.04 (as of 2021/11/15), on an NVIDIA RTX 2080 Ti.
It takes approximately 6 hours. The variance code no longer runs properly (`KeyError: "['time'] not found in axis"`, see below).
I have trouble reproducing the classification results. I sometimes obtain results similar to those of the article, but not always. The main problem, though, seems to be that little training is done. This is Figure 9 (notebook numbering):
Am I doing something wrong? Right now, the code fails to train for me, it seems. I am attaching the notebook I trained: rerun_train.ipynb.zip
The code is clear enough and the instructions for running it are good.
I have a problem with this point. The authors do talk in quite some detail about the issue with the reproducibility of the variance analysis, and try out (in code) several theories they had for the difference. This is well done.
However, considering the classification results, the authors plot and describe the quantitative differences between their replication and the original article, but do not provide much, if any, explanation, or even possible theories for them (e.g., "change in scipy classifiers implementation"). An instance of this is: "Although the results in original paper also have few variations for the classifiers svm_linear, svm_radial and multi_layer we had no variations in these classifers for the autoencoder AE‐CDNN‐MAAE." (btw, classifers -> classifiers). This is troubling because in Table 9, `svm_linear` and `svm_radial` fare quite poorly compared to the other classifiers in the replication, whereas the original paper's Table 9 places the SVM classifiers quite high in performance (93% vs 60% in the replication). What is going on there? The features seem to be learned for some classifiers but not others.
It's also not clear whether there are qualitative differences between the replication and the original article. Does the replication support the same conclusions as the original article? Some of the text approaches this conclusion ("In Figure 9 we observe similar average accuracy between AE‐CDNN‐MAE, AE‐CDNN‐MAAE, PCA and SRP for both datasets. The same is observed in the original article, as well as a similar behavior"), but a more explicit phrasing would be preferable.
- As mentioned before, the apparent 3:2 ratio of the testing set for dataset 1, where the original article uses 2:1.
- When `eval_train` is set to `True` in the notebook, I obtain the following in section 8.1, first code cell:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3745/1033232197.py in <module>
----> 1 variance_by_file = get_variance_by_file(PATH_CHBMIT)

~/projects/Rescience2021/rerun_train/code/variance.py in get_variance_by_file(pathdataset, range)
    211     rank_variance.append(rank_by_file)
    212
--> 213     variance_df = DataFrame(rank_variance).drop("time", 1)
    214
    215     variance = DataFrame(

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3995             level=level,
   3996             inplace=inplace,
-> 3997             errors=errors,
   3998         )
   3999

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3934         for axis, labels in axes.items():
   3935             if labels is not None:
-> 3936                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3937
   3938         if inplace:

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3968             new_axis = axis.drop(labels, level=level, errors=errors)
   3969         else:
-> 3970             new_axis = axis.drop(labels, errors=errors)
   3971         result = self.reindex(**{axis_name: new_axis})
   3972

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5015         if mask.any():
   5016             if errors != "ignore":
-> 5017                 raise KeyError(f"{labels[mask]} not found in axis")
   5018         indexer = indexer[~mask]
   5019         return self.delete(indexer)

KeyError: "['time'] not found in axis"
```
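For what it's worth, the failing call appears to be `DataFrame(...).drop("time", 1)`, which raises when no "time" column exists (e.g. after a fresh re-run). A defensive variant (a sketch, not the authors' code) avoids both the KeyError and the positional-axis form that newer pandas versions deprecate:

```python
import pandas as pd

# Simulate a freshly re-run result frame that happens to lack a "time" column
# (the column names here are hypothetical, for illustration only).
df = pd.DataFrame({"chb01": [0.12, 0.34], "chb02": [0.56, 0.78]})

# errors="ignore" makes the drop a no-op when the column is absent,
# and columns= replaces the deprecated positional axis argument.
out = df.drop(columns=["time"], errors="ignore")
print(list(out.columns))  # ['chb01', 'chb02']
```

That would let the variance code run identically whether or not the intermediate files carry a "time" column.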
- Failure to train with the provided code.
- The figure numbers in the article and in the notebook do not match. This is quite confusing, when we already have to deal with the different numbering schemes of the original article and the reproduction article. I would suggest switching to manual numbering everywhere and numbering all figures the same as in the original article, to reduce the cognitive load on the reader. In any case, the notebook and article numbering should match.
- In Figure 8 (article numbering), the scale of the y axis differs by orders of magnitude for AE-CDNN-MAAE between the replication and the original article. Why?
- During setup, on Ubuntu 20.04 (2021/09/21):
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.2.0 requires tensorboard<2.3.0,>=2.2.0, but you have tensorboard 2.6.0 which is incompatible.
tensorflow 2.2.0 requires tensorflow-estimator<2.3.0,>=2.2.0, but you have tensorflow-estimator 2.3.0 which is incompatible.
```
## Minor comments about the text
- "In addition, we have also used/implemented one more loss function, Mean Absolute Average Error ‐ MAAE. The formula presented in the original article by [1] differs from the Mean Absolute Percentage Error ‐ MAPE formula, despite having similar intuitions. Thus, we chose to implement this loss function, and we have not found its use elsewhere, the difference between the loss functions is only in the fact that ..." : confusing text; we don't know what "this loss function" refers to (MAPE or MAAE?). It needs multiple re-reads from the reader, and a bit of guesswork to make sure we understood correctly.
- "Note that we refer to [1]’s AE‐CDNN‐L1 as LossMAE and to [1]’s AE‐CDNN‐L2 as Loss MAAE" -> while you can argue that an easier nomenclature might be better, I am not sure it makes the replication easier for the reader to understand. This sentence also conflates the network and the loss function, which, given all the subsequent mentions of AE-CDNN-MAE and AE-CDNN-MAAE, could be avoided. In the notebook, some mentions of L1 instead of MAE remain.
- "The results obtained can be seen in the Figure below:" -> there is no figure there and the text continues (because of LaTeX float placement). Refer to figures by their number to avoid any ambiguity. Same for tables, e.g. when you say: "in Tables below,".
- "These recordings underwent a pre‐processing in which the signals had a band filter between 0.53 to 40 Hz. There was also the removal of artifacts such as muscle movements or flicker movements." -> The article does not make this clear: did you reproduce this pre-processing, or was it already done for you? How did you remove artifacts?
- "We adopted a Auto‐Encoder that allowed us to construct a smaller representation space." -> This is confusing in the context of the replication. Does "we" refer to you or to the original authors?
- "Table 8. Accuracy in Classification, with loss MAAE, on each fold cross‐validation, for Dataset 2." is not consistent with the captions of Table 6 and 7.
- "The selection followed the methodology used in [9], which analyses the variance of each patient, and after that, chooses the channel of greater variance to represent that individual" -> you make it sound as if different individuals are represented by different channels. Is that intended?
- "Bonn" is named "Boon" throughout all the code.
- Checking variance: you explore three scenarios (and I like that a lot). One is named "full database", which is quite confusing to the reader: I was expecting that you would consider all 23 patients, but only the first 10 are considered in any analysis. Would analysing the full 23-patient dataset reproduce the original paper's findings?
## Optional comments about the text:
"Although the efforts in the last years" -> "Despite recent efforts/advances, "?
"are used windows" -> "windows are used"?
"with then" -> "with them"?
"4097 column." -> "4097 columns."
Thank you for the review, @khinsen and @benureau. We are working on addressing all the points raised and will respond within two weeks.
@bruAristimunha Gentle reminder.
@bruAristimunha Happy new year. Gentle reminder
Original article: T. Wen and Z. Zhang, "Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals," in IEEE Access, vol. 6, pp. 25399-25410, 2018.
PDF URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG/blob/master/article/article.pdf
Metadata URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG/blob/master/article/metadata.yaml
Code URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG
Scientific domain: Neuroscience, EEG, Computational Neuroscience
Programming language: Python
Suggested editor: @rougier, @ogrisel, @qihongl, @rth, @rgutzen, @emmanuelle, @jpgard