Open sanika1201 opened 4 years ago
@sanika1201 I don't understand why the one commit here is Jesse's, did you mean to PR the notebook somewhere? I know your situation is a bit special, however.
Some of your line lengths are way too long; my rule of thumb is <88 chars.
@bdpedigo, I meant to PR to NeuroDataDesign/SPORF; I don't know how the commit got included. Should I make a different PR?
- remove old code that is commented out
- when you index by `[:, 32]` in cell 4, what is that doing?
- I don't understand how you are doing the downsampling/resampling whatever. are you just grabbing time points at random?
- This line, `Y_train_downsampled = X_train_downsampled.iloc[:,32]`, looks suspect to me, can you explain?
- Again, can you explain `X_train_downsampled.drop(X_train_downsampled.columns[[32]],axis=1,inplace = True)` to me?
- I'd just do all imports at the beginning of the notebook
- `raw,y_raw,raw_t,y_rar_t = None,None,None,None` followed by `print (raw)`?
- can you plot some of the data? Maybe a few each positive and negative examples? It is hard for me to understand what is going on without it, and that might help understand what is going on for you too. May also want to consider doing so before and after your train test splitting as well as resampling so that you can make sure you are not messing anything up in that process
- looks like the precision plot is still not making sense if I am understanding correctly
- can you remind me what is the true class imbalance?

I think my main feedback is I want to better understand how you are splitting your data before debugging the downstream stuff too much. I am worried that may be part of the issue. I think to do that I would like to see some sample time series from each class, before and after all of your preprocessing. Let me know if that does not make sense or you don't agree
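For reference, here is a minimal sketch of what the two questioned lines do mechanically, on a toy frame. It assumes that column 32 holds the class label, which the thread does not confirm; the names `frame`, `labels`, and `features` are hypothetical and not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with 33 columns. Treating column 32 as the label is an
# assumption made for illustration only.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.normal(size=(5, 33)))
frame[32] = [0, 1, 0, 0, 1]

# .iloc[:, 32] selects every row of the column at integer position 32.
labels = frame.iloc[:, 32]

# drop(columns=...) returns a copy without that column; the original
# frame is left untouched, unlike the inplace=True form.
features = frame.drop(columns=frame.columns[32])
```

Returning a new frame instead of using `inplace=True` avoids the situation where a variable named like a feature matrix silently contains the label column until the drop runs.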
@bdpedigo I have made the changes we discussed and uploaded the latest code and plots to this PR.
I would rather you remove just that one commit; I don't like remaking PRs because you lose all of the comments
the notebook itself should be part of this PR, just FYI
I think we have talked about this already, but moving average filter is not what I meant by binning at all.

Binning for a single channel:
- divide single timeseries into `n` bins, each of width `m`.
- stack those individual bins into an `n` by `m` matrix, `X`. Input `X` as the training data

Binning for multichannel:
- For each channel 1...C, form `X_1` ... `X_C` data matrices described above
- concatenate columns of `X_1` ... `X_C` to make `X_big`, an `n` by `C x m` matrix

does that make sense? I want to make sure I am being clear. Though I think we may be out of time to actually do this right now, but I still want to make sure it is clear for the future.

Plots look good though, and I think make more sense than what you have shown in the past
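The binning steps described above can be sketched in NumPy roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the function names are hypothetical, the input is assumed to be shaped (time points, channels), and leftover samples that do not fill a complete bin are simply dropped.

```python
import numpy as np


def bin_channel(series, m):
    """Split a 1-D series into an (n, m) matrix of consecutive bins.

    Assumption: trailing samples that do not fill a whole bin are dropped.
    """
    series = np.asarray(series)
    n = len(series) // m  # number of complete bins
    return series[: n * m].reshape(n, m)


def bin_multichannel(data, m):
    """Bin each channel of a (T, C) array and concatenate the columns.

    Returns X_big with shape (n, C * m).
    """
    data = np.asarray(data)
    per_channel = [bin_channel(data[:, c], m) for c in range(data.shape[1])]
    return np.hstack(per_channel)  # columns of X_1 ... X_C side by side


# Toy example: 6 time points, 2 channels, bins of width 3.
toy = np.arange(12).reshape(6, 2)
X_big = bin_multichannel(toy, m=3)
```

Each row of `X_big` then holds one bin from every channel, so the within-bin ordering of the samples is preserved.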
Yes, I understand this, and it makes more sense. Due to memory limitations, I decided to down-sample each bin to a single value, its mean. I went through a few recommendations on Kaggle, and this was one of the suggestions that gave decent results with a neural network, so I went ahead with it.
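A minimal sketch of that down-sampling, assuming non-overlapping bins of width `m` and dropping any incomplete trailing bin (both are assumptions; `bin_means` is a hypothetical name, not a function from the notebook):

```python
import numpy as np


def bin_means(series, m):
    """Replace each non-overlapping bin of width m with its mean."""
    series = np.asarray(series, dtype=float)
    n = len(series) // m  # number of complete bins; the remainder is dropped
    return series[: n * m].reshape(n, m).mean(axis=1)


# Seven samples with m=3 give two complete bins: (1, 2, 3) and (4, 5, 6).
downsampled = bin_means([1, 2, 3, 4, 5, 6, 7], m=3)
```

This cuts the memory footprint by a factor of `m`, at the cost of discarding the within-bin structure that the stacked `n` by `m` matrix would keep.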
I see. In that case, it feels like we are mostly limited by compute power at this point?
Yes. If we can get a little more compute power next semester, I will try to get better results on this with the improvements you mentioned above.
plots are clear, and this should scale up nicely once we get you some actual compute resources, and at that point i think we will be able to actually compare results. I don't have much more to recommend right now so I think you are done. Nice work!
Thanks!
@bdpedigo, I think the other commit got added to this pull request instead of my notebook. Should I just make another PR and link this PR there so that the comments are not lost?
Description
Goal: Compare the performance of S-Rerf with different classifiers on grasp detection using real EEG data.
This demo is a Jupyter notebook analyzing the performance of S-Rerf against classifiers such as K-Nearest Neighbors, Random Forest and Multi-Layer Perceptron on structured EEG data. To keep the structure of the data, binning (based on the concept of a moving average filter) is done before training. The main challenge is that the data is highly unbalanced, so it is balanced before training. The metrics used for evaluation are precision curves, balanced accuracy and mean test error.
Output: The precision, balanced accuracy and mean test error plots that compare performance of S-Rerf with different classifiers.
Code and Details of the demo: https://nbviewer.jupyter.org/github/NeuroDataDesign/team-forbidden-forest/blob/master/Sanika/Final_PR_upload.ipynb
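To illustrate why balanced accuracy is reported for unbalanced data, here is a small sketch that computes it as the mean of per-class recall. The labels below are made up for illustration and are not results from the notebook; `balanced_accuracy` is a hypothetical helper.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Made-up labels: 8 negatives, 2 positives, and a classifier that
# always predicts the majority class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_negative = [0] * 10

plain = float(np.mean(np.asarray(y_true) == np.asarray(always_negative)))
score = balanced_accuracy(y_true, always_negative)
```

On these toy labels plain accuracy is 0.8 while balanced accuracy is 0.5, which is why the latter is the more informative number when one class dominates.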