Codecademy / docs

Codecademy Docs is a collection of information for all things code. 📕
https://www.codecademy.com/resources/docs

ML: Imbalanced dataset #5140

Open SudoQui opened 2 weeks ago

SudoQui commented 2 weeks ago

Type of Edit (select all that apply)

Add new content (definitions, codeblocks, etc.)

Description (optional)

Imbalanced datasets are relevant primarily in supervised machine learning involving two or more classes. A dataset is imbalanced when the classes have different numbers of data points. For example, with two classes, a balanced dataset would have 50% of its points in each class.
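
To make the definition concrete, here is a minimal sketch that measures how balanced a set of labels is. The function names `class_proportions` and `imbalance_ratio` and the toy labels are hypothetical, chosen only for illustration; they are not part of any library.

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

def imbalance_ratio(labels):
    """Largest class count divided by the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical toy labels: 90 samples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
proportions = class_proportions(labels)
ratio = imbalance_ratio(labels)
print(proportions)  # {0: 0.9, 1: 0.1}
print(ratio)        # 9.0
```

A two-class dataset with a 50/50 split would give a ratio of 1.0; the further the ratio is from 1.0, the stronger the imbalance.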

This document could help readers fine-tune their models, starting from the dataset preparation stage. I will cover the computer science and math behind the imbalanced data problem, along with solutions such as data equalization, increasing the minority class, algorithmic approaches, N Sample methods, generative adversarial networks, and data augmentation.
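
As a rough illustration of one of these ideas, the sketch below assumes that "increasing the minority class" means random oversampling: duplicating randomly chosen minority samples until every class matches the largest one. The helper `random_oversample` and the toy data are hypothetical and use only the Python standard library; the eventual doc entry may well use a different approach or library.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw with replacement from the existing samples of this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels

# Hypothetical toy data: 90 samples of class 0 and 10 of class 1.
samples = list(range(100))
labels = [0] * 90 + [1] * 10
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))  # both classes now have 90 samples
```

Oversampling is normally applied to the training split only, so that duplicated points do not leak into the evaluation data.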

SudoQui commented 2 weeks ago

Can I be assigned to this task?

SaviDahegaonkar commented 2 weeks ago

Hey @SudoQui, You’re assigned 🎉 In addition to the documents linked in the description, please also look at the Contribution Guide. After creating a PR, the maintainer(s) (with the collaborator label) will add comments/suggestions to address any revisions before approval.

Is this your first contribution to Codecademy Docs? If so, we’re curious to know how you found out about contributing to Docs.

Thanks & Regards, Savi

SudoQui commented 2 weeks ago

Love you @SaviDahegaonkar