NeuroTechX / moabb

Mother of All BCI Benchmarks
https://neurotechx.github.io/moabb/
BSD 3-Clause "New" or "Revised" License

Reproducing results for PhysionetMI? #640

Closed · trialan closed this 2 months ago

trialan commented 2 months ago

I'm switching to MOABB to better compare my results with other papers, so thanks for this project!

Problem: here is the paperswithcode leaderboard for PhysionetMI left vs. right hand classification. Is the code behind these numbers public? I know the numbers are given in the moabb paper, but that alone isn't very useful to me.

It's easy to get numbers for, e.g., BNCI2014_001, as in this moabb docs tutorial. But swapping out BNCI2014_001 for PhysionetMI breaks this code:

from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline

from moabb.datasets import BNCI2014_001
from moabb.evaluations import CrossSessionEvaluation
from moabb.paradigms import LeftRightImagery

# minimal pipeline so the snippet is self-contained
pipelines = {"CSP+LDA": make_pipeline(CSP(n_components=8), LDA())}

paradigm = LeftRightImagery()
# Because this is being auto-generated we only use 2 subjects
dataset = BNCI2014_001()  # swap this for PhysionetMI
dataset.subject_list = dataset.subject_list[:2]
datasets = [dataset]
overwrite = True  # set to True if we want to overwrite cached results
evaluation = CrossSessionEvaluation(
    paradigm=paradigm, datasets=datasets, suffix="examples", overwrite=overwrite
)

results = evaluation.process(pipelines)

print(results.head())

I get this error:

Exception: No datasets left after paradigm
            and evaluation checks
> /Users/thomasrialan/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/evaluations/base.py(134)__init__()
    132             self.datasets = datasets
    133         else:
--> 134             raise Exception(
    135                 """No datasets left after paradigm
    136             and evaluation checks"""

I think this may have something to do with the fact that in PhysionetMI one subject was sampled at a different frequency (I think it's 160 Hz for every subject except one), as noted in this issue by @sylvchev.

So is PhysionetMI currently broken, or is there something obvious I'm missing? The leaderboard numbers were generated somehow, so I imagine it's the latter.

I have downloaded all the data into the relevant MNE data folder, so I don't think that's the issue.
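
In case it's useful, here is a quick way to check the per-subject sampling frequencies (a sketch; it assumes the data is already downloaded and only loops over the first few subjects to keep it fast):

from moabb.datasets import PhysionetMI

dataset = PhysionetMI()
# dataset.get_data returns {subject: {session: {run: mne.io.Raw}}}
data = dataset.get_data(subjects=dataset.subject_list[:5])
for subject, sessions in data.items():
    for session, runs in sessions.items():
        for run, raw in runs.items():
            print(subject, session, run, raw.info["sfreq"])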

PierreGtch commented 2 months ago

Hi @trialan, could you send the complete script you tried to execute which resulted in this error?

trialan commented 2 months ago

Thanks Pierre, I've made a bit of progress on this (it turned out I was indeed missing some data). Now the results of this script:

from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

import moabb
from moabb.datasets import BNCI2014_001, PhysionetMI
from moabb.evaluations import CrossSessionEvaluation, WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery, MotorImagery

moabb.set_log_level("error")
pipelines = {}
pipelines["CSP+LDA"] = make_pipeline(CSP(n_components=30), LDA())
dataset = PhysionetMI()
paradigm = MotorImagery()
datasets = [dataset]
overwrite = True  # overwrite cached results
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, suffix="examples", overwrite=overwrite
)
results = evaluation.process(pipelines)
print(results.groupby('pipeline').score.mean())

are consistent with the reported results in Table D1 of the moabb paper.

However, I wonder how to get a score for only some of the classes: Table D1 covers all classes, but the paperswithcode leaderboard is left vs. right hand only. So I try to run this script instead:

# every other line the same
paradigm = MotorImagery(events=["left_hand", "right_hand"])

And see this error:

(eegenv) ➜  eeg git:(main) ✗ python -m IPython -i benchmarking.py                                                                                                                   
Python 3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]                                                                                   
Type 'copyright', 'credits' or 'license' for more information                                                                                                     
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.                                                                                              
---------------------------------------------------------------------------                                                                                       
TypeError                                 Traceback (most recent call last)                                                                                       
File ~/Documents/code/eeg/benchmarking.py:28                                                                                                                      
     23 overwrite = True  # overwrite cached results  
---> 28 results = evaluation.process(pipelines)                                                                                                                   
     29 print(results.groupby('pipeline').score.mean())                                                                                                           
     33 #N=30, CSP+LDA (all classes)    0.565219                                                                                                                                

File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/evaluations/base.py:187, in BaseEvaluation.process(self, pipelines, param_grid, postprocess_pipeline)
    185 for dataset in self.datasets:                                                                                                                             
    186     log.info("Processing dataset: {}".format(dataset.code))                                                                                               
--> 187     process_pipeline = self.paradigm.make_process_pipelines(                                                                                              
    188         dataset,                                                                                                                                          
    189         return_epochs=self.return_epochs,                                                                                                                 
    190         return_raws=self.return_raws,                                                                                                                     
    191         postprocess_pipeline=postprocess_pipeline,                                                                                                        
    192     )[0]                                                                                                                                                  
    193     # (we only keep the pipeline for the first frequency band, better ideas?)                                                                             
    195     results = self.evaluate(                                                                                                                              
    196         dataset,                                                                                                                                          
    197         pipelines,                                                                                                                                        
   (...)                                                                                                                                                          
    200         postprocess_pipeline=postprocess_pipeline,                                                                                                        
    201     )                                                                                                                                                     

File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:140, in BaseProcessing.make_process_pipelines(self, dataset, return_epochs, return_raws, postprocess_pipeline)
    137 self.prepare_process(dataset)                                                                                                                             
    139 raw_pipelines = self._get_raw_pipelines()                                                                                                                 
--> 140 epochs_pipeline = self._get_epochs_pipeline(return_epochs, return_raws, dataset)                                                                          
    141 array_pipeline = self._get_array_pipeline(                                                                                                                
    142     return_epochs, return_raws, dataset, postprocess_pipeline                                                                                             
    143 )                                                                                                                                                         
    145 if array_pipeline is not None:                                                                                                                            

File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:389, in BaseProcessing._get_epochs_pipeline(self, return_epochs, return_raws, dataset)
    380     bmax = tmax                                                                                                                                           
    381 steps = []                                                                                                                                                
    382 steps.append(                                                                                                                                             
    383     (       
    384         "epoching",
    385         make_pipeline(
    386             ForkPipelines(
    387                 [
    388                     ("raw", make_pipeline(None)),
--> 389                     ("events", self._get_events_pipeline(dataset)),
    390                 ]
    391             ),
    392             RawToEpochs(
    393                 event_id=self.used_events(dataset),
    394                 tmin=bmin,
    395                 tmax=bmax,
    396                 baseline=baseline,
    397                 channels=self.channels,
    398                 interpolate_missing_channels=self.interpolate_missing_channels,
    399             ),
    400         ),
    401     )
    402 )
    403 if bmin < tmin or bmax > tmax:
    404     steps.append(("crop", get_crop_pipeline(tmin=tmin, tmax=tmax)))

File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:538, in BaseParadigm._get_events_pipeline(self, dataset)
    537 def _get_events_pipeline(self, dataset):
--> 538     event_id = self.used_events(dataset)
    539     return RawToEvents(event_id=event_id, interval=dataset.interval)

File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/motor_imagery.py:377, in MotorImagery.used_events(self, dataset)
    375         if len(out) == self.n_classes:
    376             break
--> 377 if len(out) < self.n_classes:
    378     raise (
    379         ValueError(
    380             f"Dataset {dataset.code} did not have enough "
    381             f"events in {self.events} to run analysis"
    382         )
    383     )
    384 return out

TypeError: '<' not supported between instances of 'int' and 'NoneType'
> /Users/thomasrialan/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/motor_imagery.py(377)used_events()
    375                 if len(out) == self.n_classes:
    376                     break
--> 377         if len(out) < self.n_classes:
    378             raise (
    379                 ValueError(

So in short my question is: how can I get left vs. right hand accuracy for the CSP+LDA pipeline on the PhysionetMI dataset?
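
(Reading the traceback, it looks like self.n_classes is left as None when only events is passed; I'd guess passing it explicitly would at least avoid the TypeError, though I don't know if that's the intended usage:)

# Untested guess: give MotorImagery an explicit class count alongside the events.
paradigm = MotorImagery(events=["left_hand", "right_hand"], n_classes=2)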

PierreGtch commented 2 months ago

For the left vs. right hand results, you should use:

from moabb.paradigms import LeftRightImagery
paradigm = LeftRightImagery()

trialan commented 2 months ago

Right, I wondered if that might be it, but when I run this script:

from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

import moabb
from moabb.datasets import BNCI2014_001, PhysionetMI
from moabb.evaluations import CrossSessionEvaluation, WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery, MotorImagery

moabb.set_log_level("error")

pipelines = {}

pipelines["CSP+LDA"] = make_pipeline(CSP(n_components=30), LDA())

dataset = PhysionetMI()
paradigm = LeftRightImagery()
datasets = [dataset]
overwrite = True  # overwrite cached results
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, suffix="examples", overwrite=overwrite
)

results = evaluation.process(pipelines)
print(results.groupby('pipeline').score.mean())

The output is

CSP+LDA    0.566848
Name: score, dtype: float32

This does not correspond to the 65.74% reported on paperswithcode (that number is in fact just quoting the second row of Table D2). To be fair, the 56.6% is only ~0.5 standard errors away from the quoted number (SE = 17.37 in Table D2 for CSP+LDA). So is this expected behaviour?

I ran it a few times and got these scores: 56.67, 57.09, 57.49. It seems pretty clear that the run-to-run standard error is not 17.37%, so I wonder what is going on here.
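
(Side note: to make individual runs reproducible, I believe the evaluation classes accept a random_state argument for the CV splits, something like the sketch below, though I haven't verified this is what the benchmark used:)

# Assumes WithinSessionEvaluation forwards random_state to its CV splitter.
evaluation = WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=datasets,
    suffix="examples",
    overwrite=True,
    random_state=42,
)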

I also don't think it's about the number of CSP components, based on this paper and my own experiments (I'm double-checking that this holds in moabb too, but I expect it will).

EDIT: Running with n=5 CSP components actually gave numbers closer to the one quoted in Table D2: I got 64.74, 64.71, 65.70. Is it fair to assume that the results in D2 were obtained by doing some sort of parameter search and keeping the best, like what's described in Appendix A (Table A1)?

Maybe I need to run evaluation.process differently, or something?
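
If it was a parameter search, I'd guess it could be reproduced through the param_grid argument of process() (it shows up in the traceback signature above); a hypothetical sketch, assuming the keys follow sklearn's step__param convention:

# Hypothetical: grid-search CSP dimensionality per pipeline, keyed by pipeline name.
param_grid = {"CSP+LDA": {"csp__n_components": [5, 10, 20, 30]}}
results = evaluation.process(pipelines, param_grid=param_grid)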

PierreGtch commented 2 months ago

You can find the implementation details of the pipelines we used for the benchmark in the pipelines folder. The CSP + LDA one is here: https://github.com/NeuroTechX/moabb/blob/develop/pipelines/CSP.yml
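
If you'd rather instantiate it directly in Python than from the YAML, it corresponds to a covariance-based CSP pipeline along these lines (a sketch under the assumption that it chains pyriemann's Covariances and CSP with sklearn's LDA; the exact classes and parameters are in the linked file):

from pyriemann.estimation import Covariances
from pyriemann.spatialfilters import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline

# Rough reconstruction; the parameter values here are assumptions, the
# authoritative ones live in pipelines/CSP.yml.
pipeline = make_pipeline(
    Covariances(estimator="oas"),  # assumed covariance estimator
    CSP(nfilter=6, log=True),      # pyriemann's covariance-space CSP
    LDA(),
)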