Train a linear SVM on the channel power feature at alpha

yacineMahdid commented 4 years ago

We know that in the alpha band there should be some kind of EEG effect picked up by the channels over the temporal lobes. A good first step would be to train a simple linear SVM on each channel's power at alpha and look at the performance (accuracy for now).
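A minimal sketch of what this first step could look like with scikit-learn. Everything about the data here is an assumption for illustration: the feature matrix is synthetic, the shapes (6 participants, 40 windows, 19 channels) are made up, and the label encoding (0 = baseline, 1 = hot) is assumed. The scaling step and the leave-one-participant-out evaluation are also choices of this sketch, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data frame (all sizes are hypothetical):
# one row per window, one column per channel (alpha power).
n_participants, n_windows, n_channels = 6, 40, 19
X = rng.normal(size=(n_participants * n_windows, n_channels))
y = rng.integers(0, 2, size=n_participants * n_windows)   # assumed: 0 = baseline, 1 = hot
groups = np.repeat(np.arange(n_participants), n_windows)  # participant id of each window

# Inject a weak class effect into two channels so there is something to learn.
X[y == 1, :2] += 0.5

# Linear SVM on standardized channel power.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))

# One accuracy per held-out participant.
scores = cross_val_score(clf, X, y, groups=groups,
                         cv=LeaveOneGroupOut(), scoring="accuracy")
print("per-participant accuracy:", np.round(scores, 3))
print("mean accuracy:", round(float(scores.mean()), 3))
```

Keeping one score per participant makes it easy to spot individual participants that fall far below chance.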

yacineMahdid commented 4 years ago

I was able to create my data frame for channel-wise power at alpha. Missing values are encoded as NaN. To complete this task I will use the previous Jupyter notebook as a template and improve it.

yacineMahdid commented 4 years ago

I got a marginal improvement over what we had before just by having more features!

Healthy : accuracy = 57.63 %
MSK : accuracy = 54.784 %
Both : accuracy = 56.18 %

However, there are a lot of things we have to fix here before calling it a day:

[x] What is the number of participants in each class: 13 healthy and 54 MSK
[x] What is the number of windows for baseline and hot: ~1.14% baseline/hot
[x] Would normalizing the data help the classification? (Hint: Yes it does) It helped marginally, but something seems to be off with some participants' data. I should visualize each of them to make sure everything is fine. For instance, participant healthy 7 has 0% accuracy!!
Healthy : accuracy = 48.72 %
MSK : accuracy = 55.20 %
Both : accuracy = 54.88 %

We see a slight boost in accuracy for MSK, but a significant drop for Healthy. However, a few of the participants were way below chance level. We should investigate the features of these participants before and after soft normalization.

I found that there was a weird column at the end of the data, which might have been added during the processing of the data with MATLAB. I'm still not sure why this happens, but I found more information over here on StackOverflow. This might affect the classification because of the scaler.

Update: I think I found the issue. There is no hot recording for this participant: HE007. When the classifier tries to classify this participant it outputs only 1, whereas it should have output 0 instead. This is how we can get to 0% accuracy. I checked the recording and there was absolutely nothing in the HE007 hot1.set file, so this should be removed.

Healthy: accuracy = 52.84 %
MSK: accuracy = 55.33 %
Both: accuracy = 55.15 %

The problem we just encountered tells me we should compute the proportion of baseline vs hot windows per participant and check whether any of them are highly unbalanced.
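A small sketch of such a check, using only the standard library. The participant ids, the toy labels, the label encoding (0 = baseline, 1 = hot) and the 20%/80% thresholds are all assumptions made for illustration. The helper names `window_balance` and `flag_unbalanced` are hypothetical too.

```python
from collections import Counter, defaultdict


def window_balance(participants, labels):
    """Return {participant: (n_baseline, n_hot, fraction_hot)}."""
    counts = defaultdict(Counter)
    for pid, label in zip(participants, labels):
        counts[pid][label] += 1
    report = {}
    for pid, c in counts.items():
        n_base, n_hot = c[0], c[1]  # assumed encoding: 0 = baseline, 1 = hot
        report[pid] = (n_base, n_hot, n_hot / (n_base + n_hot))
    return report


def flag_unbalanced(report, low=0.2, high=0.8):
    """List participants whose fraction of hot windows is outside [low, high]."""
    return sorted(pid for pid, (_, _, frac) in report.items()
                  if frac < low or frac > high)


# Hypothetical toy data: P02 has no hot window at all.
participants = ["P01"] * 4 + ["P02"] * 4 + ["P03"] * 5
labels = [0, 0, 1, 1,   0, 0, 0, 0,   0, 0, 1, 1, 1]

report = window_balance(participants, labels)
for pid, (n_base, n_hot, frac) in sorted(report.items()):
    print(f"{pid}: baseline={n_base} hot={n_hot} fraction_hot={frac:.2f}")
print("unbalanced:", flag_unbalanced(report))
```

A participant with only one class, like `P02` here, is exactly the situation where a per-participant accuracy of 0% becomes possible.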

[x] What is the proportion of baseline and hot windows for each participant? It's highly skewed for some participants and not for others.
[x] Would taking into consideration the proportion of healthy and hot help in the classification? As a group, MSK and Healthy each have an almost equal proportion of Hot and Healthy. It's just that some participants have far fewer or far more of one or the other. There is not much we can do here.
[x] Would adding more fine-grained windows help the classification? (It should, as we have more data.) We increased the windowing up to a sliding window of size 5s and got the following:
Healthy: accuracy = 54.65 % (+2%)
MSK: accuracy = 57.65 % (+2%)
Both: accuracy = 57.87 % (+2%)

Increasing the windowing up to a sliding window of size 1s, we got the following:

Healthy: accuracy = 56.51 % (+3%)
MSK: accuracy = 56.80 % (+1.5%)

WATCH OUT

Doing a k-fold cross-validation without grouping the windows by participant will artificially inflate the performance of the classifier. If we do a 10-fold cross-validation on the healthy data with a sliding window of 0.1 seconds we get 67.73% accuracy, which is about 12% more than what we had before! However, this is a kind of data leakage: windows from the same participant end up in both the training and the test folds, so the classifier memorizes particular participants instead of learning the task. This is what is happening in the Cold paper "Diverse frequency band-based convolutional neural networks for tonic cold pain assessment using EEG", where they do a 10-fold cross-validation without segregating by groups.
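The leakage can be shown without training anything, just by looking at which participants end up on both sides of each split. This is a dependency-free sketch with made-up sizes (5 participants, 20 windows each). The helpers `kfold_indices`, `leave_one_group_out` and `shared_participants` are hypothetical names written for this illustration. In practice scikit-learn's `GroupKFold` or `LeaveOneGroupOut` splitters do the grouped version.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffled k-fold over window indices, ignoring participants."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def leave_one_group_out(groups):
    """Hold out all windows of one participant at a time."""
    for g in sorted(set(groups)):
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield train, test


def shared_participants(groups, train, test):
    """Participants that appear in both the training and the test split."""
    return {groups[i] for i in train} & {groups[i] for i in test}


# Hypothetical data set: 5 participants, 20 windows each.
groups = [p for p in range(5) for _ in range(20)]

leaky = [shared_participants(groups, tr, te)
         for tr, te in kfold_indices(len(groups), 10)]
clean = [shared_participants(groups, tr, te)
         for tr, te in leave_one_group_out(groups)]

print("every plain k-fold split shares participants:", all(leaky))  # True
print("any grouped split shares participants:", any(clean))         # False
```

With 20 windows per participant and test folds of only 10 windows, every participant in a test fold necessarily has windows in the training set as well, which is the situation described above.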

yacineMahdid commented 4 years ago

Here I should have opened multiple issues instead of cramming everything into this notebook. I'll continue with the script/notebook, and for the next issue I'll version it better.

BIAPT / eeg-pain-detection

Train a linear SVM on the channel power feature at alpha #14