Closed · trialan closed this 3 months ago
Hi @trialan, could you send the complete script you tried to execute which resulted in this error?
Thanks Pierre, I've made a bit of progress on this (it turned out I was just missing some data). Now the results of this script:
from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import moabb
from moabb.datasets import BNCI2014_001, PhysionetMI
from moabb.evaluations import CrossSessionEvaluation, WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery, MotorImagery
moabb.set_log_level("error")
pipelines = {}
pipelines["CSP+LDA"] = make_pipeline(CSP(n_components=30), LDA())
dataset = PhysionetMI()
paradigm = MotorImagery()
datasets = [dataset]
overwrite = True # overwrite cached results
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, suffix="examples", overwrite=overwrite
)
results = evaluation.process(pipelines)
print(results.groupby('pipeline').score.mean())
are consistent with the reported results in Table D1 of the moabb paper.
However, I wonder how to get a score for only a subset of classes: Table D1 is for all classes, but the paperswithcode leaderboard is left vs. right hand only. So I tried running this script instead:
# every other line the same as the script above
paradigm = MotorImagery(events=["left_hand", "right_hand"])
And I see this error:
(eegenv) ➜ eeg git:(main) ✗ python -m IPython -i benchmarking.py
Python 3.12.2 (main, Feb 6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/Documents/code/eeg/benchmarking.py:28
23 overwrite = True # overwrite cached results
---> 28 results = evaluation.process(pipelines)
29 print(results.groupby('pipeline').score.mean())
33 #N=30, CSP+LDA (all classes) 0.565219
File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/evaluations/base.py:187, in BaseEvaluation.process(self, pipelines, param_grid, postprocess_pipeline)
185 for dataset in self.datasets:
186 log.info("Processing dataset: {}".format(dataset.code))
--> 187 process_pipeline = self.paradigm.make_process_pipelines(
188 dataset,
189 return_epochs=self.return_epochs,
190 return_raws=self.return_raws,
191 postprocess_pipeline=postprocess_pipeline,
192 )[0]
193 # (we only keep the pipeline for the first frequency band, better ideas?)
195 results = self.evaluate(
196 dataset,
197 pipelines,
(...)
200 postprocess_pipeline=postprocess_pipeline,
201 )
File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:140, in BaseProcessing.make_process_pipelines(self, dataset, return_epochs, return_raws, postprocess_pipeline)
137 self.prepare_process(dataset)
139 raw_pipelines = self._get_raw_pipelines()
--> 140 epochs_pipeline = self._get_epochs_pipeline(return_epochs, return_raws, dataset)
141 array_pipeline = self._get_array_pipeline(
142 return_epochs, return_raws, dataset, postprocess_pipeline
143 )
145 if array_pipeline is not None:
File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:389, in BaseProcessing._get_epochs_pipeline(self, return_epochs, return_raws, dataset)
380 bmax = tmax
381 steps = []
382 steps.append(
383 (
384 "epoching",
385 make_pipeline(
386 ForkPipelines(
387 [
388 ("raw", make_pipeline(None)),
--> 389 ("events", self._get_events_pipeline(dataset)),
390 ]
391 ),
392 RawToEpochs(
393 event_id=self.used_events(dataset),
394 tmin=bmin,
395 tmax=bmax,
396 baseline=baseline,
397 channels=self.channels,
398 interpolate_missing_channels=self.interpolate_missing_channels,
399 ),
400 ),
401 )
402 )
403 if bmin < tmin or bmax > tmax:
404 steps.append(("crop", get_crop_pipeline(tmin=tmin, tmax=tmax)))
File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/base.py:538, in BaseParadigm._get_events_pipeline(self, dataset)
537 def _get_events_pipeline(self, dataset):
--> 538 event_id = self.used_events(dataset)
539 return RawToEvents(event_id=event_id, interval=dataset.interval)
File ~/Documents/code/venvs/eegenv/lib/python3.12/site-packages/moabb/paradigms/motor_imagery.py:377, in MotorImagery.used_events(self, dataset)
375 if len(out) == self.n_classes:
376 break
--> 377 if len(out) < self.n_classes:
378 raise (
379 ValueError(
380 f"Dataset {dataset.code} did not have enough "
381 f"events in {self.events} to run analysis"
382 )
383 )
384 return out
TypeError: '<' not supported between instances of 'int' and 'NoneType'
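From the traceback it looks like self.n_classes ends up as None when only events is passed, so the comparison len(out) < self.n_classes fails. If I'm reading it right, passing n_classes explicitly should at least avoid the TypeError (just a guess on my part, assuming MotorImagery accepts n_classes together with events):
# Guess based on the traceback: n_classes seems to default to None when only
# `events` is given, so pass it explicitly (assuming MotorImagery accepts it).
paradigm = MotorImagery(events=["left_hand", "right_hand"], n_classes=2)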
So in short my question is: how can I get left vs. right hand accuracy for the CSP+LDA pipeline on the PhysionetMI dataset?
For the left vs. right hand results, you should use:
from moabb.paradigms import LeftRightImagery
paradigm = LeftRightImagery()
Right, I wondered if that may be it, but when I run this script
from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import moabb
from moabb.datasets import BNCI2014_001, PhysionetMI
from moabb.evaluations import CrossSessionEvaluation, WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery, MotorImagery
moabb.set_log_level("error")
pipelines = {}
pipelines["CSP+LDA"] = make_pipeline(CSP(n_components=30), LDA())
dataset = PhysionetMI()
paradigm = LeftRightImagery()
datasets = [dataset]
overwrite = True # overwrite cached results
evaluation = WithinSessionEvaluation(
    paradigm=paradigm, datasets=datasets, suffix="examples", overwrite=overwrite
)
results = evaluation.process(pipelines)
print(results.groupby('pipeline').score.mean())
The output is
CSP+LDA 0.566848
Name: score, dtype: float32
This does not correspond to the 65.74% reported on paperswithcode (that number is exactly the second row of Table D2). To be fair, the 56.6% is only ~0.5 standard errors away from the quoted number (SE = 17.37 in Table D2 for CSP+LDA). So is this expected behaviour?
I ran it a few times and got these scores: 56.67, 57.09, 57.49. It seems pretty clear that the standard error across runs is nowhere near 17.37%, so I wonder what is going on here.
I also don't think it's about the number of CSP components, as suggested by this paper and my own experiments (I'm double-checking this holds in moabb too, but I think it will).
EDIT: Running with n=5 CSP components actually gave numbers closer to those quoted in Table D2: I got 64.74, 64.71, 65.70. Is it fair to assume that the results in D2 were obtained by doing some sort of parameter search and keeping the best, sort of like what's described in Appendix A (Table A1)?
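Concretely, the kind of component sweep I mean (a sketch reusing the pipeline and evaluation objects from the script above):
# Sketch: rerun the same within-session evaluation for a few CSP sizes.
for n in [5, 10, 20, 30]:
    sweep_pipelines = {f"CSP({n})+LDA": make_pipeline(CSP(n_components=n), LDA())}
    sweep_results = evaluation.process(sweep_pipelines)
    print(n, sweep_results.groupby("pipeline").score.mean())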
Maybe I need to run evaluation.process differently, or something?
You can find the implementation details of the pipelines we used for the benchmark in the pipelines folder. The CSP + LDA one is here: https://github.com/NeuroTechX/moabb/blob/develop/pipelines/CSP.yml
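If you want to run those YAML pipelines directly instead of re-coding them, moabb's benchmark entry point can be pointed at a local copy of the pipelines folder; roughly something like this (a sketch, check the benchmark documentation for the exact arguments):
# Rough sketch (argument names may differ slightly; see the moabb benchmark docs):
from moabb import benchmark
from moabb.datasets import PhysionetMI

results = benchmark(
    pipelines="./pipelines",            # local clone of the moabb pipelines folder
    evaluations=["WithinSession"],
    paradigms=["LeftRightImagery"],
    include_datasets=[PhysionetMI()],
    overwrite=True,
)
print(results.groupby("pipeline").score.mean())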
I'm switching to MOABB to better compare my results with other papers, so thanks for this project!
Problem: Here we have the paperswithcode leaderboard for PhysionetMI left vs right hand classification. Is the code for these numbers public? I know the numbers are given in the moabb paper but that's not super useful for me.
It's easy to get numbers for, e.g., BNCI_2014_001, as in this moabb docs tutorial. But swapping out BNCI_2014_001 for PhysionetMI breaks this code:
I get this error:
I think this may have something to do with the fact that in PhysionetMI there is one subject who was sampled at a different frequency (I think it's 160 Hz for every subject except one), as noted in this issue by @sylvchev.
So is the current status of PhysionetMI broken, or is there something obvious I'm missing? Somehow the numbers for the leaderboard were generated, so I imagine it's the latter. I have downloaded all the data into the relevant mne folder, so I don't think that's the issue.
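For what it's worth, a quick spot check of the sampling rates (a sketch, assuming the usual MOABB layout where get_data returns subject -> session -> run mne Raw objects) would show whether any subject really differs:
# Spot-check sampling rates per subject/run (assumes the usual MOABB layout:
# dataset.get_data(subjects=[...]) -> {subject: {session: {run: mne.io.Raw}}}).
from moabb.datasets import PhysionetMI

dataset = PhysionetMI()
for subject in dataset.subject_list[:5]:  # spot check the first few subjects
    data = dataset.get_data(subjects=[subject])
    for session_name, session in data[subject].items():
        for run_name, raw in session.items():
            print(subject, session_name, run_name, raw.info["sfreq"])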