Employee Attrition Prediction Project
Overview
The Employee Attrition Prediction Project analyzes and predicts employee attrition within an organization. Employee attrition, the voluntary departure of employees from a company, has substantial implications for workforce management and organizational stability.
Dataset
Description
The dataset used in this project contains information about employees in a hypothetical organization. It covers a range of attributes, including demographic details, job satisfaction indicators, and job roles. The target variable, "Attrition", indicates whether an employee left the organization (Yes) or remained (No).
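Because "Attrition" is stored as Yes/No text, it is typically converted to a numeric label before modeling. The snippet below is a minimal sketch, assuming the data has been loaded into a pandas DataFrame; the rows are hypothetical stand-ins, since the data file name is not given here.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset (file name not given).
df = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "OverTime": ["Yes", "No", "Yes", "Yes", "No"],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
})

# Map the target labels to integers: 1 = left the organization, 0 = remained.
y = df["Attrition"].map({"Yes": 1, "No": 0})

# Share of each class; attrition data can be imbalanced, which affects
# the choice of evaluation metric.
print(y.value_counts(normalize=True))
```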
Metadata
The dataset contains the following attributes:
- Age: Denotes the age of the employee.
- BusinessTravel: Represents the frequency of business travel.
- DailyRate: Reflects the daily rate of pay.
- Department: Identifies the department within the company where the employee is stationed.
- DistanceFromHome: Signifies the distance between the employee's place of residence and the workplace.
- Education: Indicates the level of education attained by the employee.
- EducationField: Specifies the field of education pursued by the employee.
- EmployeeCount: Records the count of employees, which remains largely constant.
- EnvironmentSatisfaction: Quantifies the degree of satisfaction with the work environment.
- Gender: Defines the gender of the employee.
- HourlyRate: Quantifies the hourly rate of pay.
- JobInvolvement: Measures the level of job involvement.
- JobLevel: Classifies the employee's job into hierarchical levels.
- JobRole: Designates the specific role the employee plays within the organization.
- JobSatisfaction: Gauges the employee's satisfaction with their job.
- MaritalStatus: Captures the marital status of the employee.
- MonthlyIncome: Represents the monthly income of the employee.
- MonthlyRate: Reflects the monthly rate of pay.
- NumCompaniesWorked: Counts the number of previous companies the employee has worked for.
- Over18: Indicates whether the employee is 18 or older; this value remains largely constant.
- OverTime: Indicates whether the employee works overtime.
- PercentSalaryHike: Quantifies the percentage increase in salary.
- PerformanceRating: Assigns a performance rating to the employee.
- RelationshipSatisfaction: Measures the level of satisfaction with work relationships.
- StandardHours: Specifies the standard working hours, which remain constant.
- StockOptionLevel: Identifies the level of stock options held by the employee.
- TotalWorkingYears: Represents the total number of years of work experience.
- TrainingTimesLastYear: Counts the number of times the employee received training in the previous year.
- WorkLifeBalance: Evaluates the employee's satisfaction with their work-life balance.
- YearsAtCompany: Represents the number of years the employee has spent at the current company.
- YearsInCurrentRole: Reflects the number of years spent by the employee in their current role.
- YearsSinceLastPromotion: Signifies the number of years elapsed since the employee's last promotion.
- YearsWithCurrManager: Denotes the number of years the employee has worked under their current manager.
- Attrition: The target variable categorizes employee attrition as either "Yes" (departure) or "No" (retention).
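Since EmployeeCount, Over18, and StandardHours are described above as constant or largely constant, they carry little information for a classifier, and text attributes such as BusinessTravel need a numeric encoding. The sketch below shows one common approach (dropping single-valued columns, then one-hot encoding with pandas); the rows and category values are hypothetical, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical toy rows; category values are illustrative only.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "BusinessTravel": ["Rarely", "Frequently", "Rarely", "Never"],
    "EmployeeCount": [1, 1, 1, 1],
    "Over18": ["Y", "Y", "Y", "Y"],
    "StandardHours": [80, 80, 80, 80],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Columns with a single distinct value cannot help separate the classes.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=constant_cols)

# Split off the target as 1/0, then one-hot encode the remaining non-numeric features.
y = (df.pop("Attrition") == "Yes").astype(int)
categorical = list(df.select_dtypes(exclude="number").columns)
X = pd.get_dummies(df, columns=categorical)

print(constant_cols)
print(X.columns.tolist())
```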
Project Details
Exploratory Data Analysis (EDA)
In the first part of this project, we explore the dataset in depth. We examine the distribution of each attribute, the relationships between attributes, and any recurring patterns. The main goal is to identify the factors associated with attrition in this company.
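Two typical EDA calculations are the attrition rate within each group of a categorical attribute and the comparison of a numeric attribute between leavers and stayers. The following is a minimal pandas sketch on hypothetical rows; OverTime and MonthlyIncome are used only as examples, and the numbers say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy rows used only to illustrate the calculations.
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "MonthlyIncome": [2000, 2500, 6000, 5200, 4800, 3000],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "Yes"],
})
df["left"] = (df["Attrition"] == "Yes").astype(int)

# Attrition rate per OverTime group: the mean of a 0/1 flag is a proportion.
rate_by_overtime = df.groupby("OverTime")["left"].mean()

# Median monthly income of employees who left vs. those who stayed.
income_by_class = df.groupby("Attrition")["MonthlyIncome"].median()

# Pearson correlation between income and the 0/1 attrition flag.
income_corr = df["MonthlyIncome"].corr(df["left"])

print(rate_by_overtime)
print(income_by_class)
print(round(income_corr, 3))
```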
Machine Learning Models
In this project, we apply a range of machine learning classifiers: Support Vector Machine (SVM), Gradient Boosting, Random Forest, AdaBoost, Gaussian Naive Bayes, Logistic Regression, K-Nearest Neighbors, Extra Trees Classifier, and Decision Tree.
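One way to compare these classifiers on equal terms is stratified cross-validation with scikit-learn. This is a sketch under stated assumptions, not the project's actual evaluation code: the data comes from make_classification as a stand-in for the encoded employee attributes, the imbalance (weights=[0.84], roughly 16% positives) is assumed, and ROC AUC is one reasonable metric choice among several.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed feature matrix (assumption).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.84], random_state=0)

# Scale-sensitive models (SVM, Logistic Regression, KNN) get a StandardScaler;
# tree-based models do not need one.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # ROC AUC is less sensitive to class imbalance than plain accuracy.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} ROC AUC = {score:.3f}")
```

Wrapping the scaler in a pipeline means it is refit inside each training fold, so no statistics from the validation fold leak into training.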
Requirements
To effectively work on and contribute to this project, ensure that you meet the following requirements:
Software and Tools
- Python: This project requires Python 3.6 or higher. You can download and install Python from the official website (python.org).
- Jupyter Notebook: Jupyter Notebook is used for interactive data analysis and visualization. You can install it using pip.
- Libraries: To run the code and notebooks in this project, you'll need to install the following Python libraries. You can install them using pip:
  - pandas: Data manipulation and analysis library.
    pip install pandas
  - numpy: Scientific computing library for numerical operations.
    pip install numpy
  - scikit-learn: Machine learning library for model development and evaluation.
    pip install scikit-learn
  - matplotlib: Data visualization library for creating charts and plots.
    pip install matplotlib
  - seaborn: Data visualization library for enhanced aesthetics and statistical graphics.
    pip install seaborn
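After installing, a quick check can confirm the environment meets these requirements. The script below is a hypothetical helper, not part of the project; it only checks the Python version and whether each library can be found, using the import names (scikit-learn imports as "sklearn").

```python
import importlib.util
import sys

# Import names of the libraries listed above; scikit-learn imports as "sklearn".
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]


def check_environment(min_python=(3, 6)):
    """Return a list of problems found; an empty list means the setup looks complete."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    for name in REQUIRED:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("missing library: " + name)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All requirements found.")
```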