NeuroTechX / moabb

Mother of All BCI Benchmarks
https://neurotechx.github.io/moabb/
BSD 3-Clause "New" or "Revised" License

WithinSessionEvaluation should behave differently for ERP-based paradigms #280

Open jsosulski opened 2 years ago

jsosulski commented 2 years ago

In many ERP-based datasets, the epochs aligned to a stimulus overlap to a large extent, e.g. when a 0.5 second window after each stimulus is considered but the time between stimulus onsets (stimulus onset asynchrony, SOA) is only 100 ms. Consider the following example with 5 epochs, where T denotes train and V denotes validation.

onset      T1    T2    V3    T4    T5
time (ms)   0   100   200   300   400

V3 is picked for validation; however, with 500 ms epochs, the last 300 ms of T1 contain the first 300 ms of V3, the first 400 ms of T4 contain the last 400 ms of V3, and so on.
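To make the overlap concrete, here is a minimal sketch (plain Python, using the hypothetical onsets and 500 ms epoch length from the example above) that computes how much each training epoch overlaps the validation epoch V3:

```python
# Hypothetical numbers from the example: 100 ms SOA, 500 ms epochs.
epoch_len = 0.5          # epoch length in seconds
onsets = {"T1": 0.0, "T2": 0.1, "V3": 0.2, "T4": 0.3, "T5": 0.4}

val_start, val_end = onsets["V3"], onsets["V3"] + epoch_len
for name, start in onsets.items():
    if name == "V3":
        continue
    # overlap between [start, start + epoch_len] and the V3 window
    overlap = max(0.0, min(start + epoch_len, val_end) - max(start, val_start))
    print(f"{name} overlaps V3 by {overlap * 1000:.0f} ms")
# -> T1: 300 ms, T2: 400 ms, T4: 400 ms, T5: 300 ms
```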

Note that this is usually not an issue in Motor Imagery / SSVEP, as these epochs seldom overlap.

In my fork of moabb I perform train / validation splits only on borders between runs (usually separate EEG files or self-contained short sections of a recording). However, some datasets do not have run information; here we could either detect the borders based on event times, or, if events only have a sequence but no time information, just pick continuous sections of the recording and omit X epochs at the border (according to SOA + epoch size). The TimeSeriesSplit of sklearn could be an alternative, but then each training fold has a different size. In my opinion, for ERP/P300 paradigms it is not useful to train a classifier in WithinSessionEvaluation on e.g. 80% of the data, as these results are mostly useless for practical BCI usage.
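As a hedged sketch of the run-wise splitting described above (not the actual moabb implementation), scikit-learn's LeaveOneGroupOut can hold out one run at a time, so train and validation epochs never come from the same run; `run_labels` below is a placeholder for whatever per-epoch run metadata the dataset provides:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Dummy data: 60 epochs, 8 channels, 128 samples; binary target/non-target labels.
X = np.random.randn(60, 8, 128)
y = np.random.randint(0, 2, size=60)
# Hypothetical run label for each epoch (3 runs of 20 epochs each).
run_labels = np.repeat(["run_0", "run_1", "run_2"], 20)

cv = LeaveOneGroupOut()
for train_idx, val_idx in cv.split(X, y, groups=run_labels):
    # All validation epochs come from a single held-out run, so none of them
    # overlaps temporally with a training epoch.
    print(len(train_idx), "train epochs /", len(val_idx), "validation epochs")
```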

I don't know whether to create a new WithinSessionEvaluationChronological or change the default behaviour of WithinSessionEvaluation (in my own fork of moabb I do the latter).

jsosulski commented 2 years ago

See also #187

sylvchev commented 2 years ago

This question has also been raised during the MOABB minischool at Cortico's meeting, for all paradigms: in BCI, training is usually done on a calibration period, and the rest of the data (used online) could be used for evaluation.

I think the best course is to include a new evaluation, like CalibrationTestEvaluation, that uses some part of the data for training and the rest for testing. In some datasets there is a specific training session that we could reuse; in the others, we could use one third of the data.
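As a rough illustration of that idea (CalibrationTestEvaluation is only a proposed name, not an existing moabb class), a chronological split could look like this, with the first third of the epochs used for calibration:

```python
import numpy as np

def calibration_test_split(n_epochs, calib_fraction=1 / 3):
    """Return chronological calibration and test indices.

    Assumes epochs are ordered in time; the first `calib_fraction` of them
    plays the role of the calibration session, the rest is the test set.
    """
    n_calib = int(np.ceil(n_epochs * calib_fraction))
    idx = np.arange(n_epochs)
    return idx[:n_calib], idx[n_calib:]

calib_idx, test_idx = calibration_test_split(300)
print(len(calib_idx), "calibration epochs,", len(test_idx), "test epochs")
```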