dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License
2.55k stars, 470 forks

Augmentation #475

Closed by NormanTrinh 1 year ago

NormanTrinh commented 1 year ago

Can you explain what the ClassificationSMOTE() class does?

Optimox commented 1 year ago

First result on Google: https://towardsdatascience.com/smote-synthetic-data-augmentation-for-tabular-data-1ce28090debc

Answer from GPT4:

SMOTE (Synthetic Minority Over-sampling Technique) is a popular data augmentation method for handling class imbalance in tabular datasets, particularly in classification problems. It generates synthetic samples for the minority class to balance the class distribution, which often improves a classifier's performance on that class.

In a nutshell, SMOTE works as follows:

1. For each minority class sample, SMOTE finds its k nearest neighbors from the same class (usually k=5) and picks one of them at random.
2. A random number, alpha, is generated between 0 and 1.
3. A new synthetic sample is created by interpolating between the selected sample and the chosen neighbor, using alpha as the weight: new = sample + alpha * (neighbor - sample).
4. This process is repeated until the desired level of class balance is achieved.
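
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration of classic SMOTE as described here, not the code behind ClassificationSMOTE() in this repo. The function name `smote_oversample` and its parameters are made up for the example, and it draws the starting sample at random instead of looping over every minority sample.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated from minority-class rows.

    Hypothetical helper for illustration only; `minority` is a list of
    equal-length lists of floats, all belonging to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Step 1: rank the other minority rows by Euclidean distance to x,
        # keep the k nearest, and choose one of them at random.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Step 2: draw the interpolation weight alpha in [0, 1).
        alpha = rng.random()
        # Step 3: new = sample + alpha * (neighbor - sample), feature by feature.
        synthetic.append([a + alpha * (b - a) for a, b in zip(x, neighbor)])
    # Step 4: the loop repeats until n_new rows have been generated.
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Each synthetic row lies on the line segment between two real minority rows, so it never leaves the range already covered by the minority class. That is also why the synthetic samples can fail to represent real-world data when the minority class is sparse or noisy.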

SMOTE augmentation helps improve classifier performance by reducing the impact of class imbalance, which can cause models to be biased towards the majority class. However, it's important to be cautious with SMOTE, as over-sampling might lead to overfitting, and the synthetic samples generated might not always represent real-world data accurately.