gimseng / 99-ML-Learning-Projects

A list of 99 machine learning projects for anyone interested to learn from coding and building projects
MIT License
576 stars, 174 forks

[EXE] Learning KNN supervised classification #94

Open gimseng opened 3 years ago

gimseng commented 3 years ago

Learning Goals

Learn the kNN algorithm for supervised classification, preferably using scikit-learn's kNN implementation.

Prerequisites

Some basic knowledge of kNN is assumed. If scikit-learn is used, familiarity with installing the scikit-learn library is also assumed.

Data source/summary:

I'm agnostic about which dataset to use, so anything suggested from a textbook exercise/blog is good.

Sayoni26 commented 3 years ago

Hey! I would like to work on this. I was thinking of a beginner-friendly iris plant classification project using KNN and the iris dataset. Let me know your thoughts!
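For readers following the thread, here is a minimal sketch of what such an iris project could look like with scikit-learn. It is not the contributor's actual submission: the helper name `train_knn_on_iris`, the choice of `n_neighbors=5`, the 75/25 split, and the fixed random seed are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_knn_on_iris(n_neighbors=5, test_size=0.25, random_state=0):
    """Fit a kNN classifier on iris and return (model, test accuracy).

    Hypothetical helper for illustration; the parameter defaults are
    assumptions, not values taken from the issue.
    """
    # Load the 150-sample iris dataset as feature matrix X and labels y.
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data; stratify so each class keeps its share.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # kNN stores the training set and predicts by majority vote
    # among the n_neighbors closest training points.
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


model, accuracy = train_knn_on_iris()
print(f"Test accuracy: {accuracy:.2f}")
```

Because kNN is distance-based, a fuller project would probably also try feature scaling and several values of `n_neighbors`; both are left out here to keep the sketch short.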

gimseng commented 3 years ago

Hi @Sayoni26, please go ahead and implement the code. Thanks for contributing! Please read the contributing guidelines and the previous projects in this repo to understand the format and organization. Looking forward to your PR.

namankhurpia commented 3 years ago

Hey @gimseng, can I try this? I'm new to machine learning, but I can definitely do decision trees.

namankhurpia commented 3 years ago

Since comments and replies take a lot of time, I'm making a PR. Please check it and approve it. I am a first-timer here.

Tam-Mari commented 1 year ago

Hello! I would love to make a contribution. Since I'm also still learning, I'd love to help fellow learners to understand KNN using simple explanations. Thanks!

AnuravModak commented 10 months ago

Hello @gimseng,

I'm enthusiastic about contributing to this task and assisting learners in comprehending the strategies for handling imbalanced datasets effectively. I am interested in creating an informative guide that covers various techniques to address class imbalance in datasets, spanning from simple approaches like resampling to more advanced methods like ensemble techniques and using specialized algorithms.

My plan is to develop a comprehensive tutorial that encompasses the following key aspects:

Introduction to Handling Imbalanced Datasets: Providing an overview of why dealing with class imbalance is crucial in machine learning and the potential challenges it poses.

Resampling Techniques: Explaining the concept of resampling, including both oversampling (e.g., SMOTE) and undersampling approaches, and when and how to use them. I'll provide practical code examples to demonstrate how to implement these techniques using popular libraries.

Cost-Sensitive Learning: Discussing the concept of cost-sensitive learning and how it can be used to assign different misclassification costs to different classes. I'll include code examples to illustrate its implementation.

Ensemble Techniques: Introducing ensemble methods as a way to improve classification performance on imbalanced datasets. I'll explain how techniques like Balanced Random Forest and EasyEnsemble work and provide code examples.

Using Specialized Algorithms: Highlighting algorithms specifically designed to handle imbalanced data, such as the Adaptive Synthetic Sampling (ADASYN) algorithm. I'll walk through how to use these algorithms and showcase their impact.

Comparative Analysis: Comparing the effects of different techniques on an imbalanced dataset, including their impact on model performance, precision, recall, and F1-score. Visualizations will be included to help learners understand these differences.

Discussion: Engaging in a discussion about the scenarios in which each technique is most suitable, considering the nature of the dataset, the algorithm, and the problem at hand.
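As a rough illustration of the resampling, cost-sensitive, and comparison steps in the outline above, the following sketch uses only scikit-learn and NumPy. It is an assumption-laden example, not the proposed tutorial: the data is synthetic (from `make_classification`), the oversampling is plain random duplication rather than SMOTE or ADASYN, the classifier is logistic regression, and the helper names `random_oversample` and `minority_scores` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def random_oversample(X, y, minority_label=1, random_state=0):
    """Duplicate minority rows (with replacement) until classes are equal.

    Hypothetical helper: plain random oversampling, not SMOTE.
    """
    minority = y == minority_label
    n_majority = int((~minority).sum())
    X_up, y_up = resample(
        X[minority], y[minority],
        replace=True, n_samples=n_majority, random_state=random_state,
    )
    return np.vstack([X[~minority], X_up]), np.concatenate([y[~minority], y_up])


def minority_scores(model, X_train, y_train, X_test, y_test):
    """Fit the model and report precision/recall/F1 for the minority class."""
    model.fit(X_train, y_train)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, model.predict(X_test),
        average="binary", pos_label=1, zero_division=0,
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Synthetic data with roughly 5% positives (an assumption for illustration).
X, y = make_classification(
    n_samples=4000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Resample the training split only, so the test set keeps the real imbalance.
X_over, y_over = random_oversample(X_train, y_train)

results = {
    "baseline": minority_scores(
        LogisticRegression(max_iter=1000), X_train, y_train, X_test, y_test
    ),
    # Cost-sensitive variant: errors are weighted inversely to class frequency.
    "class_weight": minority_scores(
        LogisticRegression(max_iter=1000, class_weight="balanced"),
        X_train, y_train, X_test, y_test,
    ),
    "oversampled": minority_scores(
        LogisticRegression(max_iter=1000), X_over, y_over, X_test, y_test
    ),
}

for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

The point of the comparison is that all three models are scored on the same untouched test split, so differences in minority-class precision and recall come from the training strategy rather than from the evaluation data.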

For datasets, I'm considering:

Credit Card Fraud Detection Dataset: A widely-used imbalanced dataset, suitable for illustrating the application of various techniques.

Diabetes Classification Dataset: To showcase the handling of class imbalance in a medical context.

Online Retail Dataset: For demonstrating the impact of imbalanced datasets on a real-world e-commerce scenario.

I'm open to feedback and suggestions regarding this plan. My aim is to create a user-friendly and informative resource that equips learners with the knowledge and tools to tackle class imbalance effectively in their machine learning projects.

Best regards, Anurav Modak