toncho11 opened this issue 2 years ago
By checking here, I see you have CrossSubjectEvaluation(), which is similar to my proposed WithinDatasetEvaluation() in the sense that it trains on the entire dataset except for one subject, but it is definitely not the same.
I think CrossSessionEvaluation() is not the same as my proposed WithinSubjectEvaluation().
I don't really understand the differences yet; can you clarify? It looks like CrossSubjectEvaluation() is doing what you want for WithinSubject: train on all except K subjects, evaluate on K. Your WithinDatasetEvaluation sounds like pooling all the data regardless of session and subject and doing k-fold CV on it, is that right?
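For reference, this is roughly how the existing CrossSubjectEvaluation is used today. The flatten + LDA pipeline and the choice of BNCI2014009 are only examples; any sklearn-compatible pipeline and any P300 dataset would work the same way:

```python
# Example usage of the existing CrossSubjectEvaluation
# (pipeline and dataset are only illustrative choices)
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

from moabb.datasets import BNCI2014009
from moabb.evaluations import CrossSubjectEvaluation
from moabb.paradigms import P300

paradigm = P300()
pipelines = {
    # flatten the (channels x times) epochs and feed them to an LDA
    "Flat+LDA": make_pipeline(
        FunctionTransformer(lambda x: x.reshape(len(x), -1)), LDA()
    )
}

# trains on all subjects but one and tests on the left-out subject,
# separately for each dataset in the list
evaluation = CrossSubjectEvaluation(
    paradigm=paradigm, datasets=[BNCI2014009()], overwrite=True
)
results = evaluation.process(pipelines)
print(results[["dataset", "subject", "score"]].head())
```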
Yes, for the WithinDatasetEvaluation. I want a model trained on the entire dataset, so that I can use it with new, unseen subjects in the future.
As I understand it, WithinSessionEvaluation() calculates a score for each session separately, but I would like to merge all sessions of a single subject and then calculate one score on the merged data.
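Something along these lines is what I have in mind for WithinSubjectEvaluation. This is an untested sketch; the flatten + LDA classifier is just a placeholder and the dataset choice is arbitrary:

```python
# Untested sketch of the proposed WithinSubjectEvaluation:
# merge all sessions of one subject, then run a plain stratified k-fold CV
# on the merged data. The flatten + LDA classifier is only a placeholder.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, LabelEncoder

from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

paradigm = P300()
dataset = BNCI2014009()
clf = make_pipeline(FunctionTransformer(lambda x: x.reshape(len(x), -1)), LDA())

for subject in dataset.subject_list:
    # get_data returns the epochs of every session of this subject, already merged
    X, y, metadata = paradigm.get_data(dataset, subjects=[subject])
    y = LabelEncoder().fit_transform(y)  # "NonTarget"/"Target" -> 0/1
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=cv)
    print(f"subject {subject}: AUC = {scores.mean():.3f}")
```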
Hi, for now we are concentrating on global benchmarking and transfer learning. Those are interesting suggestions for next steps.
I have several questions about those evaluations:
- What is the benefit of pooling all subjects/sessions before splitting them into training/validation sets? As you indicated, this will yield variable results, so you would need to run multiple evaluations and average them. I think the variability will come mostly from the subject and the session.
- WithinSubjectEvaluation mixes all sessions of a subject. This is not desirable, as there are important differences between sessions of the same subject. Mixing them in the training dataset will produce overly optimistic results (it could be considered a form of leakage).
- WithinDatasetEvaluation is very close to CrossSubjectEvaluation. Mixing subjects in the training dataset will also produce overly optimistic results, as there are important differences between subjects. It could also be seen as leaking test data, compared to a realistic setting where no data from the target user is available. To evaluate the impact of having only a little data for a subject, we have implemented learning curves to allow systematic benchmarking (see the sketch below).
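If it helps, the learning-curve benchmark is exposed (if I recall correctly) through the data_size and n_perms arguments of WithinSessionEvaluation; something like the following, with argument names written from memory, so please double-check them against the current documentation:

```python
# Sketch of the learning-curve benchmark (argument names from memory,
# please check against the current MOABB documentation)
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

from moabb.datasets import BNCI2014009
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import P300

paradigm = P300()
pipelines = {
    "Flat+LDA": make_pipeline(
        FunctionTransformer(lambda x: x.reshape(len(x), -1)), LDA()
    )
}

# train on growing fractions of each session, with several random permutations
# per fraction, to see how the score evolves with the amount of training data
data_size = dict(policy="ratio", value=np.geomspace(0.02, 1.0, 6))
n_perms = np.floor(np.geomspace(20, 2, len(data_size["value"]))).astype(int)

evaluation = WithinSessionEvaluation(
    paradigm=paradigm,
    datasets=[BNCI2014009()],
    data_size=data_size,
    n_perms=n_perms,
    overwrite=True,
)
results = evaluation.process(pipelines)
```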
Thanks for @sylvchev's comments. I agree with you, but do you have any references to support these points? Some papers may not take this into account and report good performance as a result.
I would like to do CrossSubjectEvaluation but using data from many datasets together.
from moabb.datasets import BNCI2014008, BNCI2014009, BNCI2015003
datasets = [BNCI2014008(), BNCI2014009(), BNCI2015003()]
This makes 28 subjects (10 + 10 + 8). So each time I would like to test on 1 subject while training on the other 27. Currently I think this is not the case: the cross-subject evaluation is performed within each of the 3 datasets separately, meaning 1 vs 9, 1 vs 9, and 1 vs 7 rather than 1 vs 27. Is this correct?
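What I have in mind is roughly the following untested sketch: pool the three datasets and run leave-one-subject-out over all subjects. The common channel list, resampling rate, and epoch length below are assumptions I have not verified; the datasets have different montages and sampling rates, so they need to be harmonised before their epochs can be concatenated, and the flatten + LDA classifier is only a placeholder:

```python
# Untested sketch: pool the three datasets, give every subject a globally
# unique group id, and run leave-one-subject-out over all pooled subjects.
# Channel list, resampling rate and epoch length are unverified assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, LabelEncoder

from moabb.datasets import BNCI2014008, BNCI2014009, BNCI2015003
from moabb.paradigms import P300

common_channels = ["Fz", "Cz", "Pz", "Oz", "P3", "P4", "PO7", "PO8"]  # assumed shared
paradigm = P300(channels=common_channels, resample=128, tmin=0.0, tmax=0.8)

X_all, y_all, groups = [], [], []
for d_idx, dataset in enumerate([BNCI2014008(), BNCI2014009(), BNCI2015003()]):
    X, y, meta = paradigm.get_data(dataset)  # all subjects of this dataset
    X_all.append(X)
    y_all.append(y)
    # offset subject ids so subjects from different datasets never collide
    groups.append(meta["subject"].to_numpy() + 1000 * d_idx)

X = np.concatenate(X_all)
y = LabelEncoder().fit_transform(np.concatenate(y_all))
groups = np.concatenate(groups)

# placeholder classifier: flatten the epochs and feed them to an LDA
clf = make_pipeline(FunctionTransformer(lambda x: x.reshape(len(x), -1)), LDA())

# leave-one-subject-out across the pooled subjects: train on 27, test on 1
scores = cross_val_score(
    clf, X, y, groups=groups, cv=LeaveOneGroupOut(), scoring="roc_auc"
)
print(scores.mean(), scores.std())
```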
I think these evaluation methods are much needed. You do not have them now, right?
- WithinSubjectEvaluation(): evaluates the performance on all sessions of the same subject.
- WithinDatasetEvaluation(): shuffles the data from all subjects (and sessions), then selects 1/5 for validation and the rest for training. Both training and validation will include data from all subjects. Results here will be more variable, so it should be run several times, as in cross-validation.
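In code, the WithinDatasetEvaluation idea would look roughly like this untested sketch; the dataset and the flatten + LDA classifier are just placeholders:

```python
# Untested sketch of WithinDatasetEvaluation on a single dataset:
# pool every subject and session, shuffle, and run a repeated 5-fold CV.
# The classifier and dataset are only placeholders.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, LabelEncoder

from moabb.datasets import BNCI2014009
from moabb.paradigms import P300

paradigm = P300()
X, y, metadata = paradigm.get_data(BNCI2014009())  # all subjects, all sessions
y = LabelEncoder().fit_transform(y)  # "NonTarget"/"Target" -> 0/1

clf = make_pipeline(FunctionTransformer(lambda x: x.reshape(len(x), -1)), LDA())

# repeat the shuffled 5-fold split several times to average out the variability
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=cv)
print(scores.mean(), scores.std())
```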