bcbi / ClassImbalance.jl

Sampling-based methods for correcting for class imbalance in two-category classification problems

Question - Are there any benchmarks of `ClassImbalance.jl`'s ability to handle datasets of different sizes? #69

Open 00krishna opened 4 years ago

00krishna commented 4 years ago


This is just a question about ClassImbalance.jl's ability to handle datasets of different sizes. I was working with the Python imbalanced-learn package, and it keeps crashing when I give it a dataset of more than 2-3 million rows. With imbalanced data, datasets this large are to be expected, since it takes so many negative examples to get a single positive one. I can find creative ways to "thin" the dataset, but I was just wondering if there are any tests of how the Julia package handles larger datasets.

Thanks.
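For readers wondering what "thinning" could look like in practice, here is a minimal sketch of random undersampling of the majority class, using only the Python standard library. The function name `thin_majority` and its `ratio` and `seed` parameters are hypothetical, chosen for illustration; they are not part of ClassImbalance.jl or imbalanced-learn, and this is not necessarily the approach the poster had in mind.

```python
import random
from collections import Counter


def thin_majority(labels, ratio=1.0, seed=0):
    """Return sorted row indices that keep every minority-class row and a
    random subset of majority-class rows (hypothetical helper).

    At most `ratio` majority rows are kept per minority row.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")

    # most_common(2) orders the two classes by descending count.
    (majority, _), (minority, n_minority) = counts.most_common(2)

    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]

    # Never ask for more majority rows than actually exist.
    n_keep = min(len(majority_idx), int(ratio * n_minority))

    rng = random.Random(seed)  # local generator, so results are reproducible
    kept_majority = rng.sample(majority_idx, n_keep)

    return sorted(kept_majority + minority_idx)


# Toy example: 990 negatives, 10 positives, keep 5 negatives per positive.
labels = [0] * 990 + [1] * 10
keep = thin_majority(labels, ratio=5.0)
thinned = [labels[i] for i in keep]
print(Counter(thinned))
```

Because the function returns row indices rather than copying data, the same indices can be used to subset a feature matrix before handing the smaller dataset to an oversampling routine. Whether the result is small enough to avoid the crash described above depends on the data and the machine, so treat the `ratio` value as something to tune.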


DilumAluthge commented 4 years ago

We currently don't have any benchmarks, but a pull request to add some benchmarks would be welcome!

I think there is room to improve the performance of this package.