AjayKhalsa closed this issue 3 years ago.
Great idea @AjayKhalsa! As an organizational tip, I'd suggest having every contributor fill in the overall readme.md in this project folder (either in the root or in the exercise folder). There should also be an agreed-upon format/table where we summarize each approach/model, its performance, etc. It's also important to standardize how everyone splits the data into train and test sets (say, 80/20). Optionally, everyone should feel free to further split the 80% training portion into training and validation sets as needed.
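A minimal sketch of such a standardized split, assuming the labels are available as a plain Python sequence: the indices are shuffled with a fixed seed, so every contributor who uses the same seed gets the same 80/20 split. Stratifying by label (keeping the attrition/non-attrition ratio roughly equal in both parts) is an addition not stated above, and `stratified_split` and the toy labels are hypothetical. With scikit-learn, `train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)` from `sklearn.model_selection` serves the same purpose.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_frac: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test), keeping each label's share roughly equal."""
    if not 0.0 < test_frac < 1.0:
        raise ValueError("test_frac must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so every contributor gets the same split
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label, key=str):  # deterministic label order
        idx = by_label[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-label test count
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical): 16 "Yes" (attrition) and 84 "No"
labels = ["Yes"] * 16 + ["No"] * 84
train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 80 20

# Optional validation split, carved out of the training portion only
train_labels = [labels[i] for i in train_idx]
inner_train, inner_val = stratified_split(train_labels, test_frac=0.2, seed=42)
fit_idx = [train_idx[j] for j in inner_train]
val_idx = [train_idx[j] for j in inner_val]
```

Only the training portion is split again for validation, so the 20% test set stays untouched until the final evaluation.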
It'd be good for the first contributor to start doing so, and we can iterate on this a few times so that this becomes a carefully collated model-comparison project folder rather than a dump of copy-and-pasted Jupyter notebooks.
Also, for the benefit of collaboration, perhaps each contributor should submit only 1-2 models, so the exercise isn't dominated by a few people and more people are encouraged to take part in this model comparison.
I like the idea. I would like to contribute.
@vishxm Sure, you're definitely welcome! @AjayKhalsa and @vishxm, do you know where to get the dataset?
It might be useful for the first PR to create the dataset folder and copy the dataset into it, together with a readme.md describing the dataset and its sources/credits.
Gimme 10 mins, I'll add the dataset and an initial structure.
This issue is especially for Hacktoberfest participants
Learning Goals
Understand how different algorithms produce different results when applied to the same dataset.
Exercise Statement
Implement different ML algorithms, such as Logistic Regression, Random Forest, and XGBoost, on the Employee Attrition dataset.
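A minimal sketch of such a comparison with scikit-learn, assuming the features are already numeric: every model is fit on the same stratified 80/20 split and scored on the same test set. Synthetic data from `make_classification` stands in for the attrition dataset, whose file name and columns are not given here, and `build_models` and `compare_models` are hypothetical helpers. XGBoost is added only if the `xgboost` package is installed. F1 is reported next to accuracy because attrition data is often imbalanced, and on imbalanced data a model that always predicts the majority class can still reach a high accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed: int = 42) -> dict:
    """Candidate classifiers; scale-sensitive models get a StandardScaler first."""
    models = {
        "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    try:  # optional: only used when the xgboost package is installed
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=seed)
    except ImportError:
        pass
    return models


def compare_models(X, y, seed: int = 42) -> dict:
    """Fit every model on the same stratified 80/20 split and return test-set scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred),
        }
    return results


# Synthetic stand-in data: imbalanced binary labels (roughly 16% positives)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.84], random_state=42)
results = compare_models(X, y)
for name, scores in results.items():
    print(f"{name:20s} accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```

Because the split and the seed are shared, differences in the printed scores come from the models rather than from different train/test partitions.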
Prerequisites
Familiarity with random forests, feature extraction, SVMs, logistic regression, etc.
Data source/summary:
The goal is to predict employee attrition from data about each employee's history. This dataset is a modified version of the IBM Employee Analytics Dataset.
(Optional) Suggest/Propose Solutions
Implement different data preprocessing techniques and algorithms; feel free to be creative. Add your solution with explanatory comments, name the file after the model(s) used, and add a short description to the solution readme covering the techniques you used and the model's accuracy.
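One possible preprocessing setup, sketched with pandas and scikit-learn: numeric columns are standardized and categorical columns one-hot encoded inside a single `Pipeline`, so the transformations are learned on the training split only and then reused on the test split. The column names (`Age`, `MonthlyIncome`, `YearsAtCompany`, `Department`, `OverTime`, and an `Attrition` target holding "Yes"/"No") and the toy rows are assumptions for illustration; check them against the actual dataset. `readme_row` is a hypothetical helper that prints one line of a shared results table for the solution readme.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names: adjust to the columns actually present in the dataset.
NUMERIC = ["Age", "MonthlyIncome", "YearsAtCompany"]
CATEGORICAL = ["Department", "OverTime"]
TARGET = "Attrition"  # assumed to hold "Yes"/"No"


def build_pipeline() -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then classify."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))])


def readme_row(model_name: str, techniques: str, accuracy: float) -> str:
    """Format one row of a markdown results table for the solution readme."""
    return f"| {model_name} | {techniques} | {accuracy:.3f} |"


# Toy rows for illustration only (not real attrition data)
rows = []
for i in range(40):
    rows.append({
        "Age": 22 + i % 30,
        "MonthlyIncome": 2000 + 150 * i,
        "YearsAtCompany": i % 10,
        "Department": ["Sales", "R&D", "HR"][i % 3],
        "OverTime": "Yes" if i % 2 == 0 else "No",
        TARGET: "Yes" if i % 4 == 0 else "No",
    })
df = pd.DataFrame(rows)

X = df[NUMERIC + CATEGORICAL]
y = (df[TARGET] == "Yes").astype(int)  # encode the target as 1 = attrition
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
pipe = build_pipeline().fit(X_train, y_train)
acc = accuracy_score(y_test, pipe.predict(X_test))

print("| Model | Techniques | Test accuracy |")
print("|---|---|---|")
print(readme_row("LogisticRegression", "StandardScaler + OneHotEncoder", acc))
```

Keeping preprocessing inside the pipeline also means a contributor can swap `LogisticRegression` for another classifier without touching the encoding steps.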