daxiongshu closed this issue 9 years ago
Or, I guess that when we use multiple channels + past frames at the same time, logistic regression is not a good fit. So maybe we should just use them separately, as they were.
Interesting observations Carl. We shall train one model with each channel and then ensemble as you suggested. It looks like a very good idea :)
Yes. We need to concentrate more on the "HandStart" model then as you suggested. I will try to run models specifically for this one and let you know.
Hi, I've done more experiments. It seems that averaging different 'single channel' models cannot improve my validation AUC beyond 0.922. I have averaged 8 channels, which only improves 0.001 over averaging just two channels. All of them use my past-frames features.
I'm currently fully reproducing your sub11 for validation to see how good our best normal base model so far can be. Yeah, I think we still need to implement the reinforced base model and stacking to win this contest.
But it is fun to do so : )
Hi, I have updated another model for today's submission, stack3sub.csv, which is also legit. After I finish reproducing your sub11.csv, we could find out what is the best score we can get with the current best normal base model. And then I'll work on the reinforced base model.
Thanks Carl. Yes. We will be needing the reinforced base model and stacking to win.
I will run some experiments from my side and see if I can increase the AUC of 'HandStart'. Or should I work on something else? Please let me know.
Hi, one thing to try is to look at this code: https://github.com/MichaelHills/seizure-prediction. He said that he implemented a 'linear regression model' that works best. His code is really well structured but also hard for me to adapt to our problem. There is no stacking in his model. So if we can find out what he used for feature engineering and how exactly that "linear regression" works, it will help our base model, and "HandStart" of course.
I'll try to implement a simple and legit "reinforced stacking model" now : )
Thank you so much.
Sure Jiwei. I will have a look at this and try to improve ours based on that. Thank you.
Hi, I have a question: in your sub11.csv, are the predictions of the 5 channel groups averaged using equal weights?
Hi Carl, yes they are averaged using equal weights. I will send you the exact code which I used in a couple of hours. Thank you.
Carl, I confirm that in sub 11, I have averaged all 10 submissions (5 from LDA and 5 from LR - one for each set of electrodes) with equal weightage.
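The equal-weight averaging described above can be sketched as follows. This is a minimal reconstruction, not the actual sub11 code: the file names and the exact submission-CSV layout (an id column followed by one probability column per event) are assumptions.

```python
import pandas as pd

def average_submissions(paths):
    """Equal-weight average of several submission CSVs.

    Assumes each file has the same id column first, followed by one
    probability column per event.
    """
    subs = [pd.read_csv(p) for p in paths]
    out = subs[0].copy()
    event_cols = out.columns[1:]  # every column except the id
    for sub in subs[1:]:
        out[event_cols] += sub[event_cols]
    out[event_cols] /= len(subs)
    return out

# Hypothetical usage: 10 files, 5 from LDA + 5 from LR,
# one per electrode set, as in sub11:
# avg = average_submissions(
#     ["lda_set%d.csv" % i for i in range(5)]
#     + ["lr_set%d.csv" % i for i in range(5)])
# avg.to_csv("sub11.csv", index=False)
```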
Hi, here are some of my observations:
a) Feeding logistic regression with features from multiple channels doesn't necessarily outperform using only one channel, at least with my past-frames features. I think logistic regression may not capture the feature interactions in this case.
I tested two models. Model 1 uses channels [1,3,5,6,9] - 1 (due to memory constraints) and model 2 uses only channel [3] - 1. Both use my get_fea() to generate past-frames features. Validation AUC: model 1: 0.900, model 2: 0.909.
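The actual get_fea() is not shown in this thread, so here is only a guess at what a past-frames feature builder and the two-model comparison might look like: each time step is represented by the current frame concatenated with the previous few frames, and random data stands in for the real EEG. The channel lists and n_past are taken from or inferred from the messages above; everything else is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_fea(signal, n_past=3):
    """Hypothetical past-frames feature builder: concatenate each
    frame with its n_past predecessors.

    signal: (n_samples, n_channels) matrix.
    Returns (n_samples, n_channels * (n_past + 1)).
    """
    frames = [signal]
    for lag in range(1, n_past + 1):
        shifted = np.roll(signal, lag, axis=0)
        shifted[:lag] = signal[0]  # pad the start with the first frame
        frames.append(shifted)
    return np.concatenate(frames, axis=1)

# Sketch of the two models compared above, on random stand-in data:
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((1000, 32))      # 32-channel recording
y = (rng.random(1000) > 0.5).astype(int)     # one binary event target

model1_channels = [c - 1 for c in [1, 3, 5, 6, 9]]  # the "- 1" = 0-indexing
model2_channels = [3 - 1]

m1 = LogisticRegression(max_iter=1000).fit(get_fea(X_raw[:, model1_channels]), y)
m2 = LogisticRegression(max_iter=1000).fit(get_fea(X_raw[:, model2_channels]), y)
```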
b) A linear combination of predictions from logistic regression models trained on different channels is more effective.
I grid-searched the best weight to ensemble model 1 and model 2 with my vali1cv (using channel 0), for each target respectively. After ensembling: model 1: 0.919, model 2: 0.921.
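The per-target weight search described above could look like the sketch below, run once per event. The grid granularity and function name are assumptions; only the idea (maximize validation AUC of a convex blend of two models' probabilities) comes from the thread.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def best_blend_weight(p1, p2, y_true, grid=np.linspace(0, 1, 101)):
    """Grid-search the weight w that maximizes the validation AUC of
    the blend w * p1 + (1 - w) * p2.

    p1, p2: validation probabilities from two models for ONE target;
    y_true: binary labels for that target. Call once per event to get
    per-target weights.
    """
    best_w, best_auc = 0.0, -1.0
    for w in grid:
        auc = roc_auc_score(y_true, w * p1 + (1 - w) * p2)
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc
```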
So I think I will just train each model with one channel, which is also faster, and average them after prediction.
c) For the stack1cv, the average validation AUC is 0.931. Per event:

HandStart 0.894293600419
FirstDigitTouch 0.95531343746
BothStartLoadPhase 0.958854808557
LiftOff 0.933138798473
Replace 0.923223203131
BothReleased 0.923313623182
average 0.931356245203
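For reference, per-event scores like the ones above can be computed column-wise with scikit-learn's roc_auc_score; this is a generic sketch, not the actual evaluation script. The event names and order come from the list above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

EVENTS = ["HandStart", "FirstDigitTouch", "BothStartLoadPhase",
          "LiftOff", "Replace", "BothReleased"]

def per_event_auc(y_true, y_pred):
    """Column-wise validation AUC plus the average.

    y_true, y_pred: (n_samples, 6) arrays, one column per event in
    the order of EVENTS.
    """
    aucs = {ev: roc_auc_score(y_true[:, i], y_pred[:, i])
            for i, ev in enumerate(EVENTS)}
    aucs["average"] = float(np.mean(list(aucs.values())))
    return aucs
```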
We can see that HandStart is much lower than the following events. This is also intuitive: HandStart is the first action the subject performs, so there is no meaningful previous action for the stacking model to take advantage of. This also indicates that a good base model is the key to improving HandStart.
Please let me know your thoughts.