gimseng / 99-ML-Learning-Projects

A list of 99 machine learning projects for anyone interested to learn from coding and building projects
MIT License

[IMP] Implement Basic ML Algorithms on an Employee Attrition Dataset #98

AjayKhalsa closed this issue 3 years ago

AjayKhalsa commented 3 years ago

This issue is especially for Hacktoberfest participants

Learning Goals

See how different algorithms give different results when applied to the same dataset

Exercise Statement

Implement different ML algorithms, such as Logistic Regression, Random Forest, and XGBoost, on the Employee Attrition dataset

Prerequisites

Random forest models, feature extraction, SVM, logistic regression, etc.

Data source/summary:

The goal is to predict employee attrition from the given data about each employee's past history. This dataset is a modified version of the IBM Employee Analytics Dataset

(Optional) Suggest/Propose Solutions

Implement different data preprocessing techniques and algorithms. Feel free to use your creativity. Add your solution with an explanation in comments, name the file after the models used, and include a short description in the solution readme covering the techniques you used and the model accuracy.

gimseng commented 3 years ago

Great idea, @AjayKhalsa! As an organizational tip, I'd suggest having every contributor fill in the overall readme.md in this project folder (either in the root or in the exercise folder), using an agreed-upon format or table that summarizes each approach/model, its performance, etc. It's also important to standardize how everyone splits train-test (say 80/20). Optionally, everyone should feel free to further split the 80% training data into train-validation as needed.
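
To make the 80/20 suggestion concrete, here is a minimal sketch of a reproducible split that every contributor could share. The function name `split_indices` and the seed value 42 are assumptions for illustration, not something agreed in this thread; it uses only the standard library, and scikit-learn's `train_test_split` with a fixed `random_state` would serve the same purpose.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) row indices for a reproducible shuffle split.

    The seed (42 here) is a hypothetical shared value; any fixed seed works
    as long as every contributor uses the same one.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 800 200
```

Because the indices depend only on the row count and the seed, two contributors who load the same dataset file in the same row order get identical train and test sets. A further train-validation split can reuse the same function on `train_idx`.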

It'd be good for the first contributor to start doing so, and we can have a few iterations of this to make it not just a copy-and-pasted Jupyter notebooks area, but rather a carefully collated model comparison project folder.
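
As one possible starting point for the shared summary table, the sketch below renders a Markdown table that could be pasted into readme.md. The column names, the `results_table` helper, and the example rows are all hypothetical; the accuracy values are placeholders, not real results.

```python
def results_table(rows):
    """Render (contributor, model, preprocessing, test_accuracy) rows as a Markdown table."""
    lines = [
        "| Contributor | Model | Preprocessing | Test accuracy |",
        "| --- | --- | --- | --- |",
    ]
    for contributor, model, preprocessing, accuracy in rows:
        lines.append(f"| {contributor} | {model} | {preprocessing} | {accuracy:.3f} |")
    return "\n".join(lines)


# Placeholder rows only: names and accuracies are illustrative, not measured.
example_rows = [
    ("contributor-a", "Logistic Regression", "one-hot encoding, scaling", 0.0),
    ("contributor-b", "Random Forest", "label encoding", 0.0),
]
print(results_table(example_rows))
```

Each contributor would add one row per model, which keeps the comparison in one place instead of scattered across notebooks.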

gimseng commented 3 years ago

Also, for the benefit of collaboration, perhaps we should have each contributor submit 1-2 models, so it's not dominated by a few and more people are encouraged to participate in this model comparison exercise.

vishxm commented 3 years ago

I like the idea. I would like to contribute.

gimseng commented 3 years ago

@vishxm Sure, definitely welcome! @AjayKhalsa and @vishxm, do you know where to get the dataset? It might be useful for the first PR to create the dataset folder and copy in the dataset, together with a readme.md describing the dataset and its sources/credit.

AjayKhalsa commented 3 years ago

gimme 10 mins i'll add the dataset and an initial structure