Demo on Comparison of performance of S-Rerf against other classifiers on Real EEG data for Grasp detection

sanika1201 commented 4 years ago

Description

Goal: Compare the performance of S-Rerf with different classifiers on grasp detection using real EEG data.

This demo is a Jupyter notebook that analyzes the performance of S-Rerf against classifiers such as K-Nearest Neighbors, Random Forest, and Multi-Layer Perceptron on structured EEG data. To keep the structure of the data, binning (based on the concept of a moving average filter) is done before training. The data is highly unbalanced, so it is balanced before training. The evaluation metrics are precision curves, balanced accuracy, and mean test error.

Output: The precision, balanced accuracy and mean test error plots that compare performance of S-Rerf with different classifiers.

Code and Details of the demo: https://nbviewer.jupyter.org/github/NeuroDataDesign/team-forbidden-forest/blob/master/Sanika/Final_PR_upload.ipynb
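The description does not say how the classes are balanced, so the following is only a minimal sketch under the assumption that the majority class is randomly undersampled. The helper names `undersample_majority` and `balanced_accuracy` and the toy arrays are hypothetical and are not taken from the notebook. It also shows why balanced accuracy is a sensible metric here: a classifier that always predicts the majority class scores 0.5.

```python
import numpy as np


def undersample_majority(X, y, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate(
        [
            rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
            for c in classes
        ]
    )
    keep.sort()  # keep the surviving rows in their original order
    return X[keep], y[keep]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall values."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))


# Toy labels with an 8:2 imbalance (hypothetical, not the EEG data).
y = np.array([0] * 8 + [1] * 2)
X = np.arange(10).reshape(-1, 1)

X_bal, y_bal = undersample_majority(X, y)
print(np.bincount(y_bal))  # [2 2]

# Always predicting the majority class gives recall 1.0 and 0.0 per class.
always_zero = np.zeros_like(y)
print(balanced_accuracy(y, always_zero))  # 0.5
```

In a setup like this, balancing would normally be applied to the training split only, so that the test split still reflects the true class ratio.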

bdpedigo commented 4 years ago

@sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.

bdpedigo commented 4 years ago

Some of your line lengths are way too long, my rule of thumb is <88 chars

bdpedigo commented 4 years ago

remove old code that is commented out
when you index by [:, 32] in cell 4, what is that doing?
I don't understand how you are doing the downsampling/resampling whatever. are you just grabbing time points at random?
This line Y_train_downsampled = X_train_downsampled.iloc[:,32] looks suspect to me, can you explain?
Again, can you explain X_train_downsampled.drop(X_train_downsampled.columns[[32]],axis=1,inplace = True) to me?
I'd just do all imports at the beginning of the notebook
raw,y_raw,raw_t,y_rar_t = None,None,None,None print (raw)?
can you plot some of the data? Maybe a few each positive and negative examples? It is hard for me to understand what is going on without it, and that might help understand what is going on for you too. May also want to consider doing so before and after your train test splitting as well as resampling so that you can make sure you are not messing anything up in that process
looks like the precision plot is still not making sense if I am understanding correctly
can you remind me what is the true class imbalance?

I think my main feedback is I want to better understand how you are splitting your data before debugging the downstream stuff too much. I am worried that may be part of the issue. I think to do that I would like to see some sample time series from each class, before and after all of your preprocessing. Let me know if that does not make sense or you don't agree

sanika1201 commented 4 years ago

@sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.

@bdpedigo, I meant to PR to NeuroDataDesign/SPORF; I don't know how the commit got included. Should I make a different PR?

sanika1201 commented 4 years ago

@bdpedigo I have made the changes we discussed and uploaded the latest code and plots to this PR.

bdpedigo commented 4 years ago

@sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.

@bdpedigo, I meant to PR to NeuroDataDesign/SPORF; I don't know how the commit got included. Should I make a different PR?

would rather you remove just that one commit, i don't like remaking PRs because you lose all of the comments

bdpedigo commented 4 years ago

the notebook itself should be part of this PR, just FYI

bdpedigo commented 4 years ago

I think we have talked about this already, but moving average filter is not what I meant by binning at all.

Binning for a single channel:

divide a single time series into n bins, each of width m.
stack those individual bins into an n by m matrix, X. Input X as the training data.

Binning for multichannel:

For each channel 1...C, form the data matrices X_1 ... X_C as described above.
concatenate the columns of X_1 ... X_C to make X_big, an n by (C x m) matrix.
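The binning steps above can be sketched in NumPy as follows. This is an illustrative sketch, not code from the notebook: the function names `bin_single_channel` and `bin_multichannel`, the (time, channel) array layout, and the choice to drop trailing samples that do not fill a whole bin are all assumptions.

```python
import numpy as np


def bin_single_channel(series, m):
    """Split a 1-D time series into n non-overlapping bins of width m.

    Returns an (n, m) matrix. Trailing samples that do not fill a whole
    bin are dropped (an assumption; the thread does not specify this).
    """
    series = np.asarray(series)
    n = series.shape[0] // m
    return series[: n * m].reshape(n, m)


def bin_multichannel(data, m):
    """Bin each channel of a (T, C) array and concatenate the columns.

    Returns X_big with shape (n, C * m): the first m columns come from
    channel 1, the next m columns from channel 2, and so on.
    """
    data = np.asarray(data)
    per_channel = [
        bin_single_channel(data[:, c], m) for c in range(data.shape[1])
    ]
    return np.hstack(per_channel)


# Toy data: 10 time points, 2 channels (hypothetical, not the EEG recording).
toy = np.arange(20).reshape(10, 2)
X_big = bin_multichannel(toy, m=5)
print(X_big.shape)  # (2, 10)
print(X_big[0])     # [0 2 4 6 8 1 3 5 7 9]
```

Each row of `X_big` is one training sample that still contains every time point of its bin, which is what distinguishes this from smoothing with a moving average.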

bdpedigo commented 4 years ago

does that make sense? I want to make sure I am being clear. Though I think we may be out of time to actually do this right now, but I still want to make sure it is clear for the future.

bdpedigo commented 4 years ago

Plots look good though, and I think make more sense than what you have shown in the past

sanika1201 commented 4 years ago

Yes, I understand this, and it makes more sense. Due to memory limitations, I decided to down-sample each bin to a single representative value, its mean. I went through a few recommendations on Kaggle, and this was one of the suggestions that gave decent results with a neural network, so I went ahead with it.
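For comparison, the mean-per-bin down-sampling described here can be sketched as below. This is an assumed reconstruction, not the notebook's code: `bin_means` is a hypothetical name, the input is assumed to be a (time, channel) array, and leftover samples that do not fill a bin are dropped.

```python
import numpy as np


def bin_means(data, m):
    """Replace each width-m bin of a (T, C) array with its per-channel mean.

    Returns an (n, C) array with n = T // m.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0] // m
    # Group consecutive time points into bins, then average within each bin.
    return data[: n * m].reshape(n, m, data.shape[1]).mean(axis=1)


# Toy data: 10 time points, 2 channels (hypothetical, not the EEG recording).
toy = np.arange(20).reshape(10, 2)
reduced = bin_means(toy, m=5)
print(reduced.shape)  # (2, 2)
print(reduced)        # [[ 4.  5.] [14. 15.]]
```

The result has n by C entries instead of the n by (C x m) entries of the stacked-bin matrix, so it needs m times less memory, at the cost of discarding the within-bin time structure.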

bdpedigo commented 4 years ago

I see. in that case feels like we are mostly limited by compute power at this point?

sanika1201 commented 4 years ago

Yes. If we can get a little more compute power next semester, I will try to get better results on this with the improvements you mentioned above.

bdpedigo commented 4 years ago

plots are clear, and this should scale up nicely once we get you some actual compute resources, and at that point i think we will be able to actually compare results. I don't have much more to recommend right now so I think you are done. Nice work!

sanika1201 commented 4 years ago

Thanks!

sanika1201 commented 4 years ago

the notebook itself should be part of this PR, just FYI

@bdpedigo, I think the other commit got added to this pull request instead of my notebook. Should I just make another PR and link this PR there so that the comments are not lost?

NeuroDataDesign / SPORF

Demo on Comparison of performance of S-Rerf against other classifiers on Real EEG data for Grasp detection #5