[Re] Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals

bruAristimunha commented 4 years ago

Original article: T. Wen and Z. Zhang, "Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals," in IEEE Access, vol. 6, pp. 25399-25410, 2018.

PDF URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG/blob/master/article/article.pdf Metadata URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG/blob/master/article/metadata.yaml Code URL: https://github.com/bruAristimunha/Re-Deep-Convolution-Neural-Network-and-Autoencoders-Based-Unsupervised-Feature-Learning-of-EEG

Scientific domain: Neuroscience, EEG, Computational Neuroscience Programming language: Python Suggested editor: @rougier , @ogrisel , @qihongl , @rth , @rgutzen , @emmanuelle , @jpgard

rougier commented 4 years ago

Thanks you for your submission and sorry for the delay. We'll assign an editor soon.

@koustuvsinha Could you serve as editor for this submission ?

bruAristimunha commented 4 years ago

@rougier any update?

rougier commented 4 years ago

@bruAristimunha Sorry for the delay. Pinging @koustuvsinha

bruAristimunha commented 4 years ago

Hi @rougier, Sorry to bother you, is it possible to switch to another possible editor? It seems to me that he cannot take over.

rougier commented 4 years ago

@bruAristimunha Sorry for the delay in my answer. Yes, it seems @koustuvsinha is not responsive. I'll try by mail.

@gdetor Could you edit this submission ?

gdetor commented 4 years ago

@rougier Sure, I can edit this submission.

gdetor commented 4 years ago

Hi @neuronalX could you review this submission? Thank you.

gdetor commented 4 years ago

Hi @ogrisel Could you review this submission? Thank you.

neuronalX commented 4 years ago

Hi @gdetor I am not an expert in CNNs and given the number of experiments, the paper would need some expert to review the codes properly.

May be you could ask to @vthierry or Thalita Firmo Drumond.

gdetor commented 4 years ago

Hi @neuronalX Thank you for your response. No worries.

@benureau Could you please review this submission?

gdetor commented 4 years ago

@benureau Gentle reminder

gdetor commented 4 years ago

@ogrisel Gentle reminder

gdetor commented 4 years ago

@benureau Gentle reminder

benureau commented 4 years ago

@gdetor: Sorry, I missed the previous messages. I will look at the submission in the next 24 hours and tell you if I feel qualified to review it.

gdetor commented 4 years ago

@benureau No worries, thank you for the update

benureau commented 4 years ago

@gdetor: I will review this submission in the next 30 days.

gdetor commented 4 years ago

@benureau Thank you

gdetor commented 4 years ago

@damiendr Could you please review this submission?

koustuvsinha commented 4 years ago

Sorry for the delayed response, I missed this thread completely. Seems there is still a spot for reviewers open, @gdetor if you like I can review this submission.

gdetor commented 4 years ago

@koustuvsinha Sure

koustuvsinha commented 4 years ago

Here is my review. The reproducibility effort is very comprehensive and thorough, but the report is difficult to read and needs more work and polish to be in a shape to present in the journal. I'm happy to reply and re-evaluate the updated draft.

In Section 3.1, the disagreement in the literature on the total number of instances is not clear. I think this relates to the Bonn University EEG dataset, which is missing from this section.
I think authors can drop the Related works section, and cut down on the discussion of the autoencoder model. These details can be referred to in the original paper.
I propose to rewrite Section 5.1 - this is about choosing the channels in the dataset to work on, with the hypothesis that EEG channels with higher variance would offer more discriminatory signal for epilepsy prediction. It wasn't immediately clear on why authors report "three interpretations". On deeper inspection, it seems variance calculation can be done in multiple ways: 1. for each sample. 2. per person or 3. for the full dataset. This needs to be mentioned explicitly.
- I think the 1st one (for each sample) is the process used by the original authors
- The original authors did report choosing the channels based on evidence from "first 10 patients". Did you also notice that?
In Section 5.2, What is "dimension"? 64 vs 128? Not explained.
Since there is a lot of analysis results presented in the paper, I would suggest the authors to clearly denote the analysis in their own section. For each section, start with which claim of the original paper you investigated, and follow with your results. Dumping all tables doesn't make the paper readable, and we don't understand which points of the original paper was or not reproducible.
Table 15 to 21 - too many values, could this be aggregated?
What are the key takeways? Summary of reproduction needed.
Conclusion: Typo of "Auto-Enconder"?

I have not gone through the code in detail yet. I would wait for the changes, atleast a better logical presentation of the results before verifying the claims in the code.

gdetor commented 4 years ago

@koustuvsinha Thank you for the review. @bruAristimunha Can you address reviewer's comments? @benureau Gentle reminder

benureau commented 4 years ago

@gdetor: sorry for the delay, going through the review now, assembling the hardware I need to run the code.

@bruAristimunha: did you contact the original authors about what seemed ambiguous in the original paper? Specifically, for instance, section 5.1? And all the assumptions you made in section 3.1? While a paper should be replicable without reaching out to the original authors, there's a point to be made about ferreting out the missing details and making them explicit in a replication effort submitted to ReScience.

bruAristimunha commented 4 years ago

@gdetor, I'm just waiting the other authors to answer @koustuvsinha's review.

@benureau Yes, but my e-mails were not answered.

bruAristimunha commented 4 years ago

I apologize for the delay in responding, @gdetor, @koustuvsinha and @benureau. All the reviewers' comments were accepted. The work will go through re-editing with the removal of several sections that are not related to reproduction itself.

We will try to improve the fluidity of the text, explaining that we did not have a response from the authors and that we did not obtain the same result, in terms of variance and results.

This review round will be conducted by @DiogoEduardo. In particular, we plan to remove sections: 2 (Related Work), 3.2 (AutoEncoders), 4.3 (metrics definition) and 5.3 (Extension). In addition to improved fluidity as a whole.

benureau commented 4 years ago

@bruAristimunha: I am halfway through my review. I will wait for you to update your submission, since the changes you are preparing seem important.

A few comments:

Have you noticed that in the original paper, Table 3 and 4 contain the exact same values? With the loss and the dataset changing between the two? It seems to be a copy-paste error. It would affect the validity of some of the comparisons you are making.
You are using different terminology for the classifiers than the original article. Be sure to explicitly link them together "AE-CDNN-MAE is [11]'s AE-CDNN-L1", etc.
Being explicit in which Table of the original article is being reproduced or compared in each of the figure's captions would help avoid the reader second-guessing anything.
The lack of variance of the SVMs in Tables 2 and 3 of the reproduction is concerning, would suggest a qualitative difference between your reproduction and the original article's implementation. It should probably be discussed, not just stated.

Code:

jupyter is not in the requirements.txt file.
Give the command to correctly launch the jupyter notebook, and from which directory level.

Installing currently throws an error, because the scipy version is not specified:


ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

tensorflow 2.1.0 requires scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.2 which is incompatible.

- The third cell of 7.4.5 in the notebook complains about pyarrow dependency missing.
- The last cell of 3. throws errors with True on all three first lines:
```bash
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_accumulated".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_file".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/variance_person".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/reduced".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon/reduced".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit/save_model".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon/save_model".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/boon".format(base_fold)'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: ` rm -r $"{}/data/chbmit".format(base_fold)'
mkdir: cannot create directory ‘../article/figure’: File exists
mkdir: cannot create directory ‘../article/table’: File exists

I was able to re-generate all the figures present in your article from your pre-saved training

I got a 404 error when invoking download_chbmit to download from scratch.


Downloading the folder information: ../data/chbmit/
Downloading the folder files: ../data/chbmit/
Downloading the folder information: ../data/chbmit/chb01/
Downloading the folder files: ../data/chbmit/chb01/
Downloading the folder information: ../data/chbmit/chb02/
Downloading the folder files: ../data/chbmit/chb02/

HTTPError Traceback (most recent call last) [...]


- Currently re-running training. Will post my results when it finishes.

benureau commented 4 years ago

After running into some trouble with the rm -r commands above not working and having to do it manually, I was unable to reproduce Figure 2:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-8de2b88f16b3> in <module>
      1 variance_per_person = get_variance_by_person(PATH_CHBMIT,
----> 2                                              range_= (1,11))

~/rescience/ReScience-submission/code/variance.py in get_variance_by_person(path_dataset, range_)
    318             var_pearson.append(accumulate_var)
    319
--> 320         variance_df = DataFrame(var_pearson).drop("time", 1)
    321
    322         variance = DataFrame(

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    472                     if is_named_tuple(data[0]) and columns is None:
    473                         columns = data[0]._fields
--> 474                     arrays, columns = to_arrays(data, columns, dtype=dtype)
    475                     columns = ensure_index(columns)
    476

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/internals/construction.py in to_arrays(data, columns, coerce_float, dtype)
    466     elif isinstance(data[0], ABCSeries):
    467         return _list_of_series_to_arrays(
--> 468             data, columns, coerce_float=coerce_float, dtype=dtype
    469         )
    470     elif isinstance(data[0], Categorical):

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/internals/construction.py in _list_of_series_to_arrays(data, columns, coerce_float, dtype)
    514         index = getattr(s, "index", None)
    515         if index is None:
--> 516             index = ibase.default_index(len(s))
    517
    518         if id(index) in indexer_cache:

TypeError: object of type 'int' has no len()

and figure 1 and 2 were different than the one in the paper, with for instance much less channels on Figure 1. I stopped there. Could you make sure that all the figures are reproducible from scratch, on a vanilla machine/vanilla conda environment? Or explain the sources of possible differences?

Obtained figure 1:

Obtained figure 3:

gdetor commented 4 years ago

@bruAristimunha Gentle reminder

gdetor commented 3 years ago

Hi @bruAristimunha Any progress on this?

bruAristimunha commented 3 years ago

Hello @gdetor, sorry for the delay. I am waiting for author's review, I believe that in a week everything will be concluded.

rougier commented 3 years ago

@gdetor @bruAristimunha Gentle reminder

gdetor commented 3 years ago

Hi @bruAristimunha, Is there any progress on this matter?

bruAristimunha commented 3 years ago

Hello @gdetor and @rougier

I apologize for the delay in contacting you. Unfortunately, for personal reasons, I won't be able to handle the modifications. However, the second author @DiogoEduardo will continue with the necessary revisions.

rougier commented 3 years ago

@bruAristimunha Ok, no problem. @DiogoEduardo Can you address the reviews?

gdetor commented 3 years ago

Hi @DiogoEduardo gentle reminder

DiogoEduardo commented 3 years ago

Hi @gdetor. Sorry about the delay. We completed the changes in the text and adapted the jupyter notebook to work on Linux/Windows. I have tested in both systems using vanilla machine / vanilla conda environment as stated in the readme so I think this type of error is solved to reproduce our results. The text changes stated before by @bruAristimunha are also completed.

gdetor commented 3 years ago

@DiogoEduardo Thank you for the update @koustuvsinha @benureau Could you please review the modified article?

gdetor commented 3 years ago

@koustuvsinha @benureau Gentle reminder

benureau commented 3 years ago

@gdetor: of course, will do the review soon.

gdetor commented 2 years ago

@koustuvsinha @benureau Gentle reminder

rougier commented 2 years ago

@gdetor Any update?

benureau commented 2 years ago

@gdetor, @rougier: I am finishing the review right now. Hopefully will be done by the end of the day.

gdetor commented 2 years ago

@koustuvsinha Gentle reminder

koustuvsinha commented 2 years ago

Sorry about the delay, thanks for reminding! The paper looks much polished now, thanks for incorporating the suggested changes. Looks good to me now, it would be perhaps better to consolidate the key findings of Section 4 in a new section "Summary of Reproducibility" before/after the introduction to help readers better gain a quick insight of the results.

benureau commented 2 years ago

Sorry, still verifying some part of the code. Will be done soon.

benureau commented 2 years ago

Deep apologies, I am re-running training one last time and then submitting my review.

benureau commented 2 years ago

I am sorry for the delay. Part of it was that I spent a lot of time trying to reproduce the results by re-running training without managing, and thought I was doing something wrong.

In the review below, I detail several problems, big and small, I have with the replication effort right now. However, I have a positive opinion of the work that has been done in this replication under difficult circumstances (ambiguous original article, no possible contact with the original authors, an obvious problem in the original article (copy-pasted error with Figure 3/4)).

Criterions for replication

Taken from the review guidelines.

The actual replication of the research. The reviewer must evaluate the authors’ claims about a successful replication, applying the standards of the field.

The authors have done a serious job trying to reproduce the results of the original article. Much of their results fail to reproduce quantitatively.

One problem seems to be that the original article and the authors do not use the same testing dataset for dataset 1. The original article says (p. 25406): "In addition, because the ratio of positive and negative samples in the testing set is 2:1, the accuracy is 0.667 when all samples were deemed as positive and the accuracy is 0.333 when all samples were deemed as negative." Yet, in the replication, the ratio seems to be 3:2 (i.e., if I am not mistaken, the whole dataset?), generating scores of 0.6 and 0.4 when all samples are deemed positive or negative respectively (e.g., Table 5 here). Is that something the authors considered? Could the original article only use C, D and E subsets of dataset 1 for testing?

Reproducibility of the replication. The reviewers must be able to run the proposed implementation on their computer, and obtain the same results as the authors with the limits of the state of the art of the field.

I have tried to rerun the code as much as possible. The replication provides different flags: download_again, train_again and eval_again. The notebook with the code comes precomputed. The results of the precomputed notebook are the same as the article, and we can reproduce the same results by re-running the notebook and the notebook with the flag eval_again turned on (True). download_again is problematic as the Bonn University EEG database seems to have moved or disappeared (any chance to put it in a more stable place on zenodo or figshare?). Downloading the Children’s Hospital of Boston EEG database is slow (big files), and any issue during the download means restarting the download from scratch with the provided code. Not ideal, but works eventually.

I also ran the notebook with train_again set to True. I did that on a fully updated Ubuntu 20.04 (as of 2021/11/15), on a NVIDIA RTX 2080ti. It takes approximately 6 hours. The variance code does not run properly anymore (KeyError: "['time'] not found in axis" see below).

I have trouble reproducing the classification results. I sometimes obtain similar results as the ones of the article, but not always. The main problem, though, seems to be that little training is done. This is Figure 9 (notebook numbering):

Am I doing anything wrong? So, right now, the code fails to train for me, it seems. I am attaching the notebook I trained: rerun_train.ipynb.zip

Clarity of the code and the instructions for running it. Uncommented or obfuscated code is as bad as no code at all.

The code is clear enough and the instructions for running it are good.

Clarity and completeness of the accompanying article, in which the authors should clearly state why they think they have replicated the paper (same figures, same graphics, same behavior, etc.) and explain any obstacles they have encountered during the replication work. The reviewers should also consider if the article is sufficiently self-contained.

I have a problem with this point. The authors do talk quite in detail about the issue with the reproducibility of the variance analysis, and try (in code) several theories they had for the difference. This is well done.

However, considering the classification results, the authors plot and describe the quantitative differences between their replication and the original article, but do not provide much if any explanation, or even possible theories for them (e.g., "change in scipy classifiers implementation"). An instance of this is, "Although the results in original paper also have few variations for the classifiers svm_linear, svm_radial and multi_layer we had no variations in these classifers for the autoencoder AE‐CDNN‐MAAE." (btw classifers -> classifiers). This is troubling because in Table 9, svm_linear and svm_radial fare quite poorly compared to the other classifiers in the replication, with the result of the original paper's Table 9 placing SVM classifier quite high in performance (93% vs 60% in the replication). What is going on there? The features seems to be learned for some classifiers but not others.

It's also not clear if there are qualitative differences between the replication and the original article. Does the replication support the same conclusions of the original article? Some of the text seems to approach this conclusion ("In Figure 9 we observe similar average accuracy between AE‐CDNN‐MAE, AE‐CDNN‐MAAE, PCA and SRP for both datasets. The same is observed in the original article, as well as a similar behavior"), but a more explicit phrasing would be preferable.

Issues with the replication

As mentioned before, the apparent 3:2 ratio of the testing dataset for dataset 1, where the original article uses 2:1.

When eval_train is set to True in the notebook, I obtain in section 8.1, first code cell:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_3745/1033232197.py in <module>
----> 1 variance_by_file = get_variance_by_file(PATH_CHBMIT)

~/projects/Rescience2021/rerun_train/code/variance.py in get_variance_by_file(pathdataset, range) 211 rank_variance.append(rank_by_file) 212 --> 213 variance_df = DataFrame(rank_variance).drop("time", 1) 214 215 variance = DataFrame(

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/frame.py in drop(self, labels, axis, index, columns, level, inplace, errors) 3995 level=level, 3996 inplace=inplace, -> 3997 errors=errors, 3998 ) 3999

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors) 3934 for axis, labels in axes.items(): 3935 if labels is not None: -> 3936 obj = obj._drop_axis(labels, axis, level=level, errors=errors) 3937 3938 if inplace:

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors) 3968 new_axis = axis.drop(labels, level=level, errors=errors) 3969 else: -> 3970 new_axis = axis.drop(labels, errors=errors) 3971 result = self.reindex(**{axis_name: new_axis}) 3972

~/.pyenv/versions/miniconda3-latest/envs/eeg/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors) 5015 if mask.any(): 5016 if errors != "ignore": -> 5017 raise KeyError(f"{labels[mask]} not found in axis") 5018 indexer = indexer[~mask] 5019 return self.delete(indexer)

KeyError: "['time'] not found in axis"


- Failure to train with the provided code.

- The article and the code Figure number do not match. Quite confusing, when we have to deal with the difference with the original article and reproduction article different numbering schemes already. I would suggest switching to manual numbering for everything and number all figures the same as the original article to reduce the cognitive load on the reader. In any case, the notebook and article number should match. 

- In Figure 8 (article numbering), the scale of the y axis is orders of magnitude different for AE-CDNN-MAAE between the replication and the original article. Why?

- During setup:
On Ubuntu 20.04 2021/09/21

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorflow 2.2.0 requires tensorboard<2.3.0,>=2.2.0, but you have tensorboard 2.6.0 which is incompatible. tensorflow 2.2.0 requires tensorflow-estimator<2.3.0,>=2.2.0, but you have tensorflow-estimator 2.3.0 which is incompatible.



## Minor comments about the text
- "In addition, we have also used/implemented one more loss function, Mean Absolute Average Error ‐ MAAE. The formula presented in the original article by [1] differs from the Mean Absolute Percentage Error ‐ MAPE formula, despite having similar intuitions. Thus, we chose to implement this loss function, and we have not found its use elsewhere, the difference between the loss functions is only in the fact that ..." : confusing text, we don't know what "this loss function" refers too (MAPE or MAAE?). Needs multiple re-read from the reader, and a bit of guesswork to make sure we understood right.

- "Note that we refer to [1]’s AE‐CDNN‐L1 as LossMAE and to [1]’s AE‐CDNN‐L2 as Loss MAAE" -> while you can point at an easier nomenclature being possibly better, I am not sure it makes it easier for the reader to understand the replication. This sentence also conflates the network and the loss function, which, given all the subsequent mentions to AE-CDNN-MAE and AE-CDNN-MAAE, could be avoided. In the notebook, some mentions of L1 instead of MAE remain.

- "The results obtained can be seen in the Figure below:" -> there is no figure and the text continues (because of Latex float placement). Refer to figures by their number to avoid any ambiguity. Same for tables, e. g. when you say: "in Tables below,". 

- "These recordings underwent a pre‐processing in which the signals had a band filter between 0.53 to 40 Hz. There was also the removal of artifacts such as muscle movements or flicker movements." -> The article does not make it clear: you reproduce this pre-processing or was it done for you? How did you remove artifacts?

- "We adopted a Auto‐Encoder that allowed us to construct a smaller representation space." -> It is confusing, in the context of the replication. We is you or the original authors? 

- "Table 8. Accuracy in Classification, with loss MAAE, on each fold cross‐validation, for Dataset 2." is not consistent with the captions of Table 6 and 7.

- "The selection followed the methodology used in [9], which analyses the variance of each patient, and after that, chooses the channel of greater variance to represent that individual" -> you make it sound that different individuals are represented by different channels. Is that intended?

- "Bonn" is named "Boon" throughout all the code. 

- Checking variance: you explore three scenarios (and I like that a lot). One is named "full database", which is quite confusing to the reader: I was expecting that you would consider the full 23 patients. But only the first 10 are considered in any analysis. Would analysing the 23 patients dataset reproduce the original paper's findings?

## Optional comments about the text:
"Although the efforts in the last years" -> "Despite recent efforts/advances, "?
"are used windows" -> "windows are used"?
"with then" -> "with them"?
"4097 column." ->   "4097 columns."

bruAristimunha commented 2 years ago

Thank you for the review, @khinsen and @benureau. We are working on responding to all points raised and will respond to all issues within two weeks.

rougier commented 2 years ago

@bruAristimunha Gentle reminder.

gdetor commented 2 years ago

@bruAristimunha Happy new year. Gentle reminder

ReScience / submissions