DavidArenburg opened this issue 7 years ago
Hi. I implemented this a couple of days ago (see #2), so the partial_fit method now takes an additional argument for weights. I've tried it myself and it seems to work pretty well.
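Here is a minimal Python sketch of how a per-example weight can enter an online logistic-regression update. It assumes FTRL-Proximal per-coordinate updates, as in the paper discussed in this thread, and assumes the weight simply scales the log-loss gradient. The class name WeightedFTRL and its parameters (alpha, beta, l1, l2) are hypothetical. This is not the package's actual partial_fit code. Rows are sparse dicts mapping feature index to value.

```python
import math


class WeightedFTRL:
    """Hypothetical FTRL-Proximal logistic regression with per-example weights."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate accumulated (adjusted) gradients
        self.n = {}  # per-coordinate accumulated squared gradients

    def _coef(self, i):
        # Closed-form coefficient for feature i from its z and n accumulators.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2
        )

    @staticmethod
    def _sigmoid(margin):
        margin = max(min(margin, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-margin))

    def predict(self, rows):
        return [self._sigmoid(sum(self._coef(i) * v for i, v in x.items())) for x in rows]

    def partial_fit(self, rows, y, weights=None):
        if weights is None:
            weights = [1.0] * len(rows)
        for x, label, s in zip(rows, y, weights):
            coefs = {i: self._coef(i) for i in x}
            p = self._sigmoid(sum(coefs[i] * v for i, v in x.items()))
            for i, v in x.items():
                # The example weight s scales the log-loss gradient (p - y) * x_i.
                g = s * (p - label) * v
                n_old = self.n.get(i, 0.0)
                sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
                self.z[i] = self.z.get(i, 0.0) + g - sigma * coefs[i]
                self.n[i] = n_old + g * g
        return self


model = WeightedFTRL(alpha=0.1)
rows = [{0: 1.0}, {1: 1.0}]
model.partial_fit(rows, [1, 0], weights=[1.0, 0.01])
print(model.predict(rows))
```

With this design, a weight of 0 leaves the model untouched, and a weight of 2 behaves roughly like seeing the example twice.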
Great! Can you also update the docs and add an example of how to generate and use the weights? Thanks.
The idea is to down-weight the majority class by the ratio of minority to majority examples. For example, suppose you have a dataset with 1000 examples, 10 of which are positive and 990 negative. In this case a generally good choice is to set weight 1 for positive examples and ~0.01 (10/990) for negative examples.
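A minimal Python sketch of the weighting rule just described follows. Positives get weight 1 and negatives get n_pos / n_neg, so both classes carry the same total weight. The helper name balancing_weights is hypothetical. The resulting vector would then be supplied through the new weights argument of partial_fit.

```python
def balancing_weights(labels):
    """Hypothetical helper: weight 1 for positives, n_pos / n_neg for negatives."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    neg_weight = n_pos / n_neg if n_neg else 1.0
    return [1.0 if y == 1 else neg_weight for y in labels]


labels = [1] * 10 + [0] * 990  # 10 positives, 990 negatives
w = balancing_weights(labels)
print(w[0], round(w[-1], 4))  # 1.0 0.0101
```

The 990 negatives together sum to a weight of about 10, matching the 10 positives.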
Yeah, I get that; I just wanted to see an actual code example in the docs.
Let's keep this open as a reminder to update the docs.
First of all, thanks for the great effort; it looks great. The combination of sparseMatrix with Rcpp (instead of R's memory-expensive model.matrix) looks very promising! However, as the paper mentions many times, in the real world we face very sparse data with a very small number of successes, so the data is very unbalanced. A plain, unweighted logistic regression can't handle this: it can reach very high accuracy while finding no true positives (TPs). It is therefore crucial to rebalance the data using some kind of weights.
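The following small Python check makes the accuracy point concrete. It uses a hypothetical 10-positive / 990-negative split. A degenerate model that always predicts "no success" scores 99% accuracy yet finds zero true positives.

```python
labels = [1] * 10 + [0] * 990          # very few successes
preds = [0] * len(labels)              # degenerate model: always predicts the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
print(accuracy, true_positives)  # 0.99 0
```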
Section 4.6 of the paper introduces a pretty straightforward subsampling correction.
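As I understand that section, the idea is to keep every positive and keep only a fraction r of the negatives. Each kept negative then gets an importance weight of 1/r, so the weighted loss matches the full-data loss in expectation. The minimal Python sketch below applies this per example. The paper samples whole queries, so that difference is a simplifying assumption. The function name and the (row, label) input format are hypothetical. The returned weights are what would be passed to a weighted partial_fit.

```python
import random


def subsample_with_correction(examples, r, seed=0):
    """Hypothetical helper: keep all positives (weight 1), keep each negative
    with probability r and give it importance weight 1 / r."""
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    rng = random.Random(seed)
    rows, labels, weights = [], [], []
    for x, y in examples:
        if y == 1:
            rows.append(x); labels.append(y); weights.append(1.0)
        elif rng.random() < r:
            rows.append(x); labels.append(y); weights.append(1.0 / r)
    return rows, labels, weights


data = [({0: 1.0}, 1)] * 10 + [({1: 1.0}, 0)] * 990
rows, labels, weights = subsample_with_correction(data, r=0.1)
print(len(rows), sum(labels))
```

Unlike the rebalancing weights discussed earlier in the thread, these weights undo the sampling bias rather than equalize the classes. The kept negatives are therefore up-weighted, not down-weighted.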