NeuroTechX / moabb

Mother of All BCI Benchmarks
https://neurotechx.github.io/moabb/
BSD 3-Clause "New" or "Revised" License

WithinSubjectEvaluation() and WithinDatasetEvaluation() #305

Open toncho11 opened 1 year ago

toncho11 commented 1 year ago

I think these evaluation methods are much needed. You do not have them now, right?

  • WithinSubjectEvaluation() - evaluates performance on all sessions of the same subject merged together.
  • WithinDatasetEvaluation() - shuffles the data from all subjects (and sessions), then selects 1/5 for validation and the rest for training. Both training and validation therefore include data from all subjects. Results here will be more variable, so it should be run several times, as in cross-validation (a rough sketch follows below).
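
To make the second idea concrete, here is a minimal hand-rolled sketch of what WithinDatasetEvaluation could do, built around paradigm.get_data() and plain scikit-learn cross-validation. The dataset, the flattened-feature logistic regression pipeline and the AUC scoring are arbitrary choices for the illustration, not part of the proposal:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

# Pool every epoch of the dataset, regardless of subject or session.
dataset = BNCI2014009()
paradigm = P300()
X, y, metadata = paradigm.get_data(dataset=dataset, subjects=dataset.subject_list)

# Flatten (n_epochs, n_channels, n_times) -> (n_epochs, n_features) and run
# shuffled 5-fold CV: both training and validation folds then contain data
# from all subjects and all sessions.
X_flat = X.reshape(len(X), -1)
y_enc = LabelEncoder().fit_transform(y)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_flat, y_enc, cv=cv, scoring="roc_auc")
print(f"within-dataset AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```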

toncho11 commented 1 year ago

Checking here, I see that you have CrossSubjectEvaluation(), which is similar to my proposed WithinDatasetEvaluation() in the sense that it trains on the entire dataset except for one subject, but it is definitely not the same thing.

I also think CrossSessionEvaluation() is not the same as my proposed WithinSubjectEvaluation().
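
For reference, this is roughly how the existing evaluations are driven today; the paradigm, dataset and pipeline below are just placeholders for the comparison:

```python
from pyriemann.estimation import XdawnCovariances
from pyriemann.tangentspace import TangentSpace
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

from moabb.datasets import BNCI2014009
from moabb.evaluations import CrossSessionEvaluation, CrossSubjectEvaluation
from moabb.paradigms import P300

paradigm = P300()
datasets = [BNCI2014009()]
pipelines = {
    "XdawnCov+TS+LR": make_pipeline(
        XdawnCovariances(nfilter=2), TangentSpace(), LogisticRegression(max_iter=1000)
    )
}

# Leave-one-subject-out, performed separately within each dataset.
cross_subject_results = CrossSubjectEvaluation(paradigm=paradigm, datasets=datasets).process(pipelines)

# Train on all but one session of a subject, test on the held-out session.
cross_session_results = CrossSessionEvaluation(paradigm=paradigm, datasets=datasets).process(pipelines)
```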

vinay-jayaram commented 1 year ago

I don't really understand the differences yet; can you clarify? It looks like CrossSubjectEvaluation() is doing what you want for WithinSubject -- train on all except K subjects, evaluate on those K. Your WithinDatasetEvaluation sounds like pooling all the data regardless of session and subject and doing k-fold CV on it. Is that right?

toncho11 commented 1 year ago

Yes, that is right for WithinDatasetEvaluation. I want to train a model on the entire dataset, so that I can use it on new, unseen subjects in the future.

For WithinSubjectEvaluation: the current WithinSessionEvaluation() calculates a score for each session, whereas I would like to merge all sessions of a single user and then calculate a single score on the merged data.
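
Concretely, what I have in mind is something like the hand-rolled sketch below, again with an arbitrary dataset and pipeline purely for illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

dataset = BNCI2014009()
paradigm = P300()
X, y, metadata = paradigm.get_data(dataset=dataset, subjects=dataset.subject_list)
X = X.reshape(len(X), -1)
y = LabelEncoder().fit_transform(y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One score per subject, computed on all of that subject's sessions merged.
for subject in metadata["subject"].unique():
    mask = (metadata["subject"] == subject).to_numpy()
    scores = cross_val_score(clf, X[mask], y[mask], cv=cv, scoring="roc_auc")
    print(f"subject {subject}: AUC {scores.mean():.3f}")
```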

sylvchev commented 1 year ago

Hi, for now we are concentrating on global benchmarking and transfer learning. Those are interesting suggestions for next steps.

sylvchev commented 1 year ago

I have several questions about those evaluations:

  • WithinSubjectEvaluation mixes all sessions of a subject. This is not desirable, as there are important differences between sessions for the same subject. Mixing them in the training dataset will produce very optimistic results (it could be considered leakage).
  • WithinDatasetEvaluation is very close to CrossSubjectEvaluation. Mixing subjects in the training dataset will also produce overly optimistic results, as there are important differences between subjects. It could also be seen as leaking test data when compared to the realistic setting where no data from the target user is available. To evaluate the impact of having only a little data for a subject, we have implemented learning curves for systematic benchmarking.

What is the benefit of gathering all subjects/sessions before splitting them into training/validation sets? As you indicated, this will yield variable results, so you need to run multiple evaluations and average them. I think the variability will come mostly from the subject and the session.
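
For completeness, the learning curves mentioned above are run through the existing WithinSessionEvaluation; if I recall the interface correctly, it looks roughly like this (the data_size policy and n_perms values are only an illustration, see the learning-curve examples in the documentation for the exact interface):

```python
import numpy as np
from pyriemann.estimation import XdawnCovariances
from pyriemann.tangentspace import TangentSpace
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

from moabb.datasets import BNCI2014009
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import P300

paradigm = P300()
pipelines = {
    "XdawnCov+TS+LR": make_pipeline(
        XdawnCovariances(nfilter=2), TangentSpace(), LogisticRegression(max_iter=1000)
    )
}

# Evaluate on growing fractions of the training data, with several random
# permutations per data size, to draw a learning curve.
data_size = dict(policy="ratio", value=np.geomspace(0.02, 1.0, 6))
n_perms = np.floor(np.geomspace(20, 2, len(data_size["value"]))).astype(int)

evaluation = WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=[BNCI2014009()],
    data_size=data_size,
    n_perms=n_perms,
)
results = evaluation.process(pipelines)
```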

dawin2015 commented 7 months ago

> I have several questions about those evaluations:
>
>   • WithinSubjectEvaluation mixes all sessions of a subject. This is not desirable, as there are important differences between sessions for the same subject. Mixing them in the training dataset will produce very optimistic results (it could be considered leakage).
>   • WithinDatasetEvaluation is very close to CrossSubjectEvaluation. Mixing subjects in the training dataset will also produce overly optimistic results, as there are important differences between subjects. It could also be seen as leaking test data when compared to the realistic setting where no data from the target user is available. To evaluate the impact of having only a little data for a subject, we have implemented learning curves for systematic benchmarking.

Thanks for @sylvchev's comments. I agree with you. But do you have any references to support these points? Some papers may not account for this and report good performance as a result.

toncho11 commented 6 months ago

I would like to do CrossSubjectEvaluation but using data from many datasets together.

```python
datasets = [BNCI2014008(), BNCI2014009(), BNCI2015003()]
```

This makes 28 subjects in total (10 + 10 + 8). Each time I would like to test on 1 subject while training on the other 27. Currently I think this is not the case: the cross-subject evaluation will be performed within each of the 3 datasets separately, meaning 1 vs 9, 1 vs 9, and 1 vs 7. Is this correct?
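
In the meantime, a manual way to pool the 28 subjects would be something like the sketch below. The common channel list, epoch window and resampling rate are my own assumptions to make epochs from the three datasets concatenable; they are not something MOABB chooses for you:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

from moabb.datasets import BNCI2014008, BNCI2014009, BNCI2015003
from moabb.paradigms import P300

# Force a common channel set, epoch window and sampling rate so that
# epochs from the three datasets can be concatenated (assumed values).
paradigm = P300(channels=["Cz", "Pz", "Oz"], tmin=0.0, tmax=0.8, resample=128)

X_all, y_all, groups = [], [], []
for dataset in [BNCI2014008(), BNCI2014009(), BNCI2015003()]:
    X, y, meta = paradigm.get_data(dataset=dataset, subjects=dataset.subject_list)
    X_all.append(X.reshape(len(X), -1))
    y_all.append(y)
    # One unique group id per (dataset, subject) pair -> 28 groups in total.
    groups.append(dataset.code + "_" + meta["subject"].astype(str))

X_all = np.concatenate(X_all)
y_all = LabelEncoder().fit_transform(np.concatenate(y_all))
groups = np.concatenate(groups)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# Train on 27 subjects (pooled across datasets), test on the held-out one.
scores = cross_val_score(
    clf, X_all, y_all, groups=groups, cv=LeaveOneGroupOut(), scoring="roc_auc"
)
print(f"pooled cross-subject AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```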