imbalanced class labels in knn?

RfastOfficial / Rfast

A collection of Rfast functions for data analysis. Note 1: The vast majority of the functions accept matrices only, not data.frames. Note 2: Do not pass matrices or vectors that have missing data (i.e. NAs). We do not check for them, and C++ internally transforms them into zeros (0), so you may get wrong results. Note 3: In general, make sure you give the correct input in order to get the correct output. We do no checks, and this is one of the many reasons we are fast.


imbalanced class labels in knn? #55

Closed (prRZ5F4LXZ closed this 1 year ago)

prRZ5F4LXZ commented 2 years ago

Is your feature request related to a problem? Please describe.

Regarding knn: when class labels are imbalanced, some mitigation may be needed, such as weighting each neighbor's vote by the inverse of its class size. But I don't see any option in Rfast's knn() for imbalanced-class problems.
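To make the idea concrete, here is a minimal sketch of inverse-class-size weighting, written in plain Python rather than R. It is not Rfast's knn() and does not reflect its API; the function name `knn_predict`, the parameter `use_class_weights`, and the toy data are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
import math


def knn_predict(train_x, train_y, query, k, use_class_weights=True):
    """Predict a label for `query` from its k nearest training points.

    With use_class_weights=True, each neighbour's vote counts 1 / (size of
    its class in the training set), so a large class cannot win on
    headcount alone. With False, this is a plain majority vote.
    """
    class_size = Counter(train_y)

    # Rank training points by Euclidean distance to the query, keep the k nearest.
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]

    votes = defaultdict(float)
    for _, label in nearest:
        weight = 1.0 / class_size[label] if use_class_weights else 1.0
        votes[label] += weight

    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(votes), key=votes.get)


# Toy data (hypothetical): six points of class "a", two of class "b".
train_x = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [6.5]]
train_y = ["a", "a", "a", "a", "a", "a", "b", "b"]
query = [5.5]

plain = knn_predict(train_x, train_y, query, k=5, use_class_weights=False)
weighted = knn_predict(train_x, train_y, query, k=5, use_class_weights=True)
print(plain, weighted)
```

In this toy case the five nearest neighbours are three "a" points and two "b" points, so the plain vote picks "a", while the weighted vote (3 × 1/6 against 2 × 1/2) picks "b". Whether this helps on real data is an empirical question.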

https://stats.stackexchange.com/questions/341/knn-and-unbalanced-classes

How can the imbalance problem be addressed with Rfast's knn()?

Describe the solution you'd like

See above.

Describe alternatives you've considered

NA.

Additional context

NA.

statlink commented 2 years ago

Hello,

We do not do that because, in some examples I had seen, it did not improve the predictive accuracy. You might be right, though. I think we have this option in the dirknn() function; however, it is not exported to R. Would you be interested in this? The difference is that the data are first normalised to have unit norm: each vector is normalised, not each feature. We could export it to R. Please let me know.

Michail
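To illustrate the distinction between normalising each vector and normalising each feature, here is a small plain-Python sketch. It assumes "vector" means one observation (one row of the data matrix) and that the norm is Euclidean; it does not show how dirknn() is implemented, and the helper names `normalise_rows` and `normalise_columns` are hypothetical.

```python
import math


def normalise_rows(x):
    """Scale each observation (row) to unit Euclidean norm.

    Assumes no row is all zeros, since such a row has no direction.
    """
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def normalise_columns(x):
    """Scale each feature (column) to unit Euclidean norm, for contrast.

    Assumes no column is all zeros.
    """
    n_cols = len(x[0])
    col_norms = [
        math.sqrt(sum(row[j] ** 2 for row in x)) for j in range(n_cols)
    ]
    return [[row[j] / col_norms[j] for j in range(n_cols)] for row in x]


x = [[3.0, 4.0],
     [0.0, 2.0]]

by_row = normalise_rows(x)        # every row now has length 1
by_column = normalise_columns(x)  # every column now has length 1
print(by_row)
print(by_column)
```

After row normalisation every observation lies on the unit sphere, so only its direction matters when distances are computed. Column normalisation instead rescales each feature and leaves the observations with different lengths.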