MedMaxLab / selfEEG

selfEEG: a Python library for Self-Supervised Learning on Electroencephalography (EEG) data
https://selfeeg.readthedocs.io/en/stable/
MIT License
35 stars 11 forks source link

[JOSS] Review #6

Closed vferat closed 7 months ago

vferat commented 7 months ago

Hey all,

I'm opening this issue to allow me to make comments for the JOSS review (openjournals/joss-reviews#6224).

While testing the library:

Documentation

Errors

I have tried the library on 3 different computers. For two of them, I encounter the same error while running the SSL_guidenotebook

---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
File [d:\Python\envs\selfEEG\Lib\site-packages\torch\utils\data\dataloader.py:1133](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1133), in _MultiProcessingDataLoaderIter._try_get_data(self, timeout)
   [1132](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1132) try:
-> [1133](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1133)     data = self._data_queue.get(timeout=timeout)
   [1134](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1134)     return (True, data)

File [d:\Python\envs\selfEEG\Lib\multiprocessing\queues.py:114](file:///D:/Python/envs/selfEEG/Lib/multiprocessing/queues.py:114), in Queue.get(self, block, timeout)
    [113](file:///D:/Python/envs/selfEEG/Lib/multiprocessing/queues.py:113)     if not self._poll(timeout):
--> [114](file:///D:/Python/envs/selfEEG/Lib/multiprocessing/queues.py:114)         raise Empty
    [115](file:///D:/Python/envs/selfEEG/Lib/multiprocessing/queues.py:115) elif not self._poll():

Empty: 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[8], [line 1](vscode-notebook-cell:?execution_count=8&line=1)
----> [1](vscode-notebook-cell:?execution_count=8&line=1) loss_info = SelfMdl.fit(train_dataloader = trainloader, augmenter=Augmenter, epochs=5,
      [2](vscode-notebook-cell:?execution_count=8&line=2)                         optimizer=optimizer, loss_func= loss, loss_args= loss_arg,
      [3](vscode-notebook-cell:?execution_count=8&line=3)                         lr_scheduler= scheduler, EarlyStopper=earlystop,
      [4](vscode-notebook-cell:?execution_count=8&line=4)                         validation_dataloader=valloader,
      [5](vscode-notebook-cell:?execution_count=8&line=5)                         verbose=True, device= device, return_loss_info=True
      [6](vscode-notebook-cell:?execution_count=8&line=6)                        )

File [~\Documents\GitHub\selfEEG\selfeeg\ssl\ssl.py:1042](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1042), in SimCLR.fit(self, train_dataloader, epochs, optimizer, augmenter, loss_func, loss_args, lr_scheduler, EarlyStopper, validation_dataloader, verbose, device, cat_augmentations, return_loss_info)
   [1036](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1036)     self.train()
   [1038](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1038) with tqdm.tqdm(total=N_train+N_val, ncols=100, 
   [1039](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1039)            bar_format='{desc}{percentage:3.0f}%|{bar:15}| {n_fmt}/{total_fmt}'
   [1040](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1040)            ' [{rate_fmt}{postfix}]',
   [1041](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1041)            disable=not(verbose), unit=' Batch',file=sys.stdout) as pbar:
-> [1042](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1042)     for batch_idx, X in enumerate(train_dataloader):
   [1044](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1044)         optimizer.zero_grad()
   [1046](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/Victor/Documents/GitHub/selfEEG/Notebooks/~/Documents/GitHub/selfEEG/selfeeg/ssl/ssl.py:1046)         if X.device.type!=device.type:

File [d:\Python\envs\selfEEG\Lib\site-packages\torch\utils\data\dataloader.py:631](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:631), in _BaseDataLoaderIter.__next__(self)
    [628](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:628) if self._sampler_iter is None:
    [629](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:629)     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    [630](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:630)     self._reset()  # type: ignore[call-arg]
--> [631](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:631) data = self._next_data()
    [632](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:632) self._num_yielded += 1
    [633](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:633) if self._dataset_kind == _DatasetKind.Iterable and \
    [634](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:634)         self._IterableDataset_len_called is not None and \
    [635](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:635)         self._num_yielded > self._IterableDataset_len_called:

File [d:\Python\envs\selfEEG\Lib\site-packages\torch\utils\data\dataloader.py:1329](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1329), in _MultiProcessingDataLoaderIter._next_data(self)
   [1326](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1326)     return self._process_data(data)
   [1328](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1328) assert not self._shutdown and self._tasks_outstanding > 0
-> [1329](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1329) idx, data = self._get_data()
   [1330](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1330) self._tasks_outstanding -= 1
   [1331](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1331) if self._dataset_kind == _DatasetKind.Iterable:
   [1332](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1332)     # Check for _IterableDatasetStopIteration

File [d:\Python\envs\selfEEG\Lib\site-packages\torch\utils\data\dataloader.py:1295](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1295), in _MultiProcessingDataLoaderIter._get_data(self)
   [1291](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1291)     # In this case, `self._data_queue` is a `queue.Queue`,. But we don't
   [1292](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1292)     # need to call `.task_done()` because we don't use `.join()`.
   [1293](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1293) else:
   [1294](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1294)     while True:
-> [1295](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1295)         success, data = self._try_get_data()
   [1296](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1296)         if success:
   [1297](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1297)             return data

File [d:\Python\envs\selfEEG\Lib\site-packages\torch\utils\data\dataloader.py:1146](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1146), in _MultiProcessingDataLoaderIter._try_get_data(self, timeout)
   [1144](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1144) if len(failed_workers) > 0:
   [1145](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1145)     pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> [1146](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1146)     raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
   [1147](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1147) if isinstance(e, queue.Empty):
   [1148](file:///D:/Python/envs/selfEEG/Lib/site-packages/torch/utils/data/dataloader.py:1148)     return (False, None)

RuntimeError: DataLoader worker (pid(s) 2916, 11640, 12340, 7664) exited unexpectedly

Everything was working perfectly on the 3rd one.

Code base

This section is not required per say by the JOSS review guidelines, but seems to me to be an important point. I therefore leave you free to address these points or not, and will not delay the review for that reason.

If you want the library to evolve over time, I suggest:

Conclusion

selfEEG is a functional python library that and should make it easy to apply SSL models on EEG data. The code base is solid. Thanks for all your hard work

fedepup commented 7 months ago

Hi @vferat. Before start answering to your comments, may I ask you if it is ok for you to convert the bullet list in a checklist? Just to not miss any of your concerns. Of course if this is not a problem.

Got and error while working with freq of type float: Unknown format code 'd' for object of type 'float' I can also recommend using logging instead of print for verbose.

I believe you are working with the selfEEG version uploaded on PyPI or anaconda, which is a few commits behind the main branch. In fact, few days after my submission to JOSS, I've experienced two little bugs in the dataloading module, caused by the window length or the sampling rate not being integer values:

Both issues have been solved in the following commit 811356f83f03ad70a511c48467e4812e2c1b6c6a. ('ve also changed the formatting option from 5.0f to 5.2f 88156f55b3f6e059d023c2e05118e690a5e14007)

Why are you still experiencing this issue? Simply because I'm waiting the JOSS review to be done before uploading a post edit version (ideally v0.1.1) on both PyPI and anaconda with all the revision. Really sorry for not having told you about the corrections on the dataloading module.

vferat commented 7 months ago

Hey @fedepup,

No problem, I was indeed using the PIP version, I'll switch to main to avoid outdated comments. Thanks for the quick reply !

BTW, if you are interested, you can find a GitHub workflow to automatically push a Github release to PYPI here (just make sure to change package the version)

fedepup commented 7 months ago

No problem, I was indeed using the PIP version, I'll switch to main to avoid outdated comments.

Thank you and sorry again for the inconvenience.

BTW, if you are interested, you can find a GitHub workflow to automatically push a Github release to PYPI here (just make sure to change package the version)

I will certainly add it in my to-do list!

vferat commented 7 months ago

Hey @fedepup, I've updated my review (first message). Only minor comments, mainly adding a more concrete example in the tutorial section. Otherwise, everything looks good to me ! Nice work !

fedepup commented 7 months ago

Thanks for the update, @vferat. I will add another notebook with a more concrete example as soon as possible.

For the error in the SSL_guide notebook, I think you have run the one in the Notebooks folder, right? The one in the docs folder (which I considered the "official" example) has some minor updates based on wmvanvliet revision. In short, I think the error is related to the num_worker parameter of the pytorch Dataloaders. May I ask you if you can try to run the one in the docs folder or set the parameter num_worker=0 in the one where you have encountered the error? From my side, if this solves the issue, I will copy the notebooks of the docs folder inside the notebooks one so to keep everything updated. (I've run the SSL_guide in the docs folder notebook with a mac, linux and windows devices, and everything worked fine.)

For the last three points, I think it's not a problem to work on that, but I will do it after the notebook concerns.

Thank you again!

fedepup commented 7 months ago

Hi, @vferat. Sorry for making you wait. I've uploaded a new notebook and updated old ones included in the Notebook folder (see this commit e573df17acc1f15a77b0fc94b92aeaa8d23ae3b2 ). Just a comment on the mentioned commit.

Notebook updates: now the notebook folder includes all the updated version of the documentation notebooks, so to be sure to not run any outdated code. Again, for the error you encountered, I'm pretty sure that it was related to the num_worker argument of the Dataloader being set greater than zero, which was changed to zero in the new version to avoid this problem. Sorry for having forgotten to update such folder.

Use case with real-world data: as you suggested, I've included a new example with the eegbci dataset (also called EEGMMI). I think everything is clearly explained and the code and experimental design simple to understand, but I will leave to you the final decision. To make things easier, I've used an already preprocessed version of such dataset, which was released as a resource for the book "Deep Learning for EEG-based Brain-Computer Interface". Data can be easily downloaded here (I've also included a list of command to run in a notebook cell to download and unzip the data in a "Data" folder).

Hope this will solve all the concerns that are blocking you to fill the documentation entry in your reviewer's checklist. Thank you again.

Additional note: I will come back tomorrow with a new comment assessing all the extra things about the maintenance improvement (which I care a lot), so to close this issue properly.

vferat commented 7 months ago

Hey @fedepup !

I really like the new tutorial and thing I will rely help people not familiar with SSL to have a good start !

I keep this issue open for the maintenance part, but everything LGTM ! Thanks for all the hard work and for providing this software to our community !

fedepup commented 7 months ago

Hi, @vferat. First of all, thank you again for having spent some of your precious time in reviewing selfEEG for JOSS. I believe that your suggestions were really helpful in improving the quality of the library and its understanding.

Here is my final comment related to the extra part about code maintenance. All the things were tested in a forked repository and then merged into this one, so to not mess up things. Please refer to #7 and #8 for the complete list of changes (number 8 also includes all the pre-commit autofix).

to add additional tests by installing the "main" version of https://github.com/pytorch/pytorch so as to be able to anticipate the changes needed to remain compatible.

Now tests are run with pytorch nightly instead of with the last stable version

use a test matrix to check compatibility with several OS/python versions

Now tests are performed on all three os systems and on python 3.10 and 3.11

to maintain a simple & clean code base, add codestyle tests or a precommit (example: https://github.com/vferat/pycrostates/blob/main/.pre-commit-config.yaml)

I added a pre-commit with basic things I believe are useful for a python project. I'm not used to pre-commit, so for now I decided to add only things of which I completely understood the functionality. As you can see from #8, the code base improved a lot and I was able to find another typo in the dataloading module and lots of unused import (probably related to my initial tests in a debugging notebook). I'm pretty sure I will add other hooks in the future.

I think this will solve everything and this issue can be closed. But I will leave this decision to you.