Machine Learning Project Review - Travel insurance prediction

JanKosgei / Travel-Insurance-Prediction

This project aims to predict the propensity of travellers to procure insurance based on different factors such as age, annual income, family size, etc.

MIT License

Machine Learning Project Review - Travel insurance prediction #1

Open okothchristopher opened 1 year ago

okothchristopher commented 1 year ago

You need to create 3 repositories, i.e.

- Machine Learning Projects (Learning)
- Machine Learning Projects (Capstones)
- Machine Learning Projects (Personal Practice)

Within each repository, you will need a folder for each of the projects. Have a standard way of naming them, i.e.
1. Week 1 Boston Regression
2. Week 2 Credit Risk Classification, etc.

Within each week, have a readme for the project that details what you did, what the outcome was, and what the key concepts/lessons learnt were.

okothchristopher commented 1 year ago

The readme here is incomplete. Your readme needs to have the following sections:
- Title and Description (What the project is about and what it seeks to solve, i.e. the objectives)
- Data Source
- Insights about the data (Key ones only, accompanied with some charts)
- The product: what were the ML development steps and their outcomes (output can be a graph, e.g. a confusion matrix plot, with explanations below it)
- Acknowledgments

okothchristopher commented 1 year ago

The notebook structure is not okay. You need a structured way of modelling, i.e.
- Give background (Done)
- Data Context (When was the data collected, and where did you obtain it?)
- Problem Statement (What are you trying to predict?)

okothchristopher commented 1 year ago

EDA Steps
- When doing EDA, I would advise against splitting the data before exploring it, as some features need to be explored in the context of the target (dependent) variable y
- What is the full meaning of "OOPs for plots"?
- For each plot, you have to note what you observed
- The step 'Summary statistics for each numerical feature' can easily be done with the .describe() method (.info() only reports dtypes and non-null counts)
- Are your features linearly separable? That should be the intention of this plot: sns.pairplot(Clients, vars=['Age', 'AnnualIncome', 'FamilyMembers'], hue='TravelInsurance')
- The step 'Converting the categorical features into binary' can easily be done with an encoder; use that.
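As a rough sketch of the two points above (exploring features against the target, and using an encoder for the Yes/No columns), something like the following could work. The toy values are made up, and the column names `FrequentFlyer` and `EverTravelledAbroad` are assumptions about the dataset; only `Age`, `AnnualIncome`, `FamilyMembers` and `TravelInsurance` appear in the notebook snippet quoted here.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy stand-in for the Clients DataFrame; all values are invented for illustration.
clients = pd.DataFrame({
    "Age": [25, 31, 34, 28],
    "AnnualIncome": [400000, 1250000, 500000, 700000],
    "FamilyMembers": [6, 7, 4, 3],
    "FrequentFlyer": ["No", "Yes", "No", "Yes"],        # assumed column name
    "EverTravelledAbroad": ["No", "Yes", "No", "No"],   # assumed column name
    "TravelInsurance": [0, 1, 0, 1],
})

# Explore numerical features in the context of the target y (TravelInsurance).
mean_by_target = clients.groupby("TravelInsurance")[["Age", "AnnualIncome"]].mean()

# Encode the Yes/No columns with an encoder instead of a hand-written mapping.
# Passing explicit categories pins "No" -> 0 and "Yes" -> 1 for every column.
binary_cols = ["FrequentFlyer", "EverTravelledAbroad"]
encoder = OrdinalEncoder(categories=[["No", "Yes"]] * len(binary_cols))
encoded = clients.copy()
encoded[binary_cols] = encoder.fit_transform(clients[binary_cols]).astype(int)

print(mean_by_target)
print(encoded[binary_cols])
```

For categorical columns with more than two levels, `OneHotEncoder` would likely be the more appropriate choice than `OrdinalEncoder`.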

okothchristopher commented 1 year ago

Feature Engineering
- There are no derived features in your dataset; adding some would help improve the prediction.
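To illustrate what a derived feature could look like, here is a minimal sketch. The helper `add_derived_features`, the new columns `IncomePerFamilyMember` and `AgeBand`, and the bin edges are all hypothetical choices for illustration, not something taken from the notebook, and whether they actually improve the prediction would need to be checked.

```python
import pandas as pd

# Toy rows using the column names from the pairplot call; values are invented.
clients = pd.DataFrame({
    "Age": [25, 31, 34, 35],
    "AnnualIncome": [400000, 1250000, 500000, 700000],
    "FamilyMembers": [4, 5, 4, 2],
})


def add_derived_features(df):
    """Return a copy of df with two illustrative derived columns added."""
    out = df.copy()
    # Income available per household member (hypothetical feature).
    out["IncomePerFamilyMember"] = out["AnnualIncome"] / out["FamilyMembers"]
    # Coarse age bands; the bin edges are an arbitrary assumption.
    out["AgeBand"] = pd.cut(
        out["Age"], bins=[0, 29, 34, 120], labels=["<30", "30-34", "35+"]
    )
    return out


features = add_derived_features(clients)
print(features)
```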

okothchristopher commented 1 year ago

Data pre-processing
- You need to have a section title for this.
- There is no need for the step "To check if the spread of splits"

okothchristopher commented 1 year ago

Modelling
- Your data is relatively small, so instead of going with hold-out sets, I would advise that you use cross-validation scores (e.g. cross_val_score), which give a more realistic estimate of the model performance
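A minimal sketch of the cross-validation approach, assuming scikit-learn. The data here is synthetic (`make_classification`) as a stand-in for the real features and target, and the random forest is just an example model, not necessarily the one used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for X (features) and y (TravelInsurance); not the real data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.65, 0.35], random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold; report the mean and the spread rather than a single number.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```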

okothchristopher commented 1 year ago

Model evaluation
- You have not tuned any parameters here, so the section title "Defining models while tuning their hyperparameters" is misleading
- The model evaluation metric selected is not the best; you need to use a more balanced metric, e.g. the F1 score
- Your model's performance on the train data is wanting. Remember, I mentioned you need to aim for 90% and above, given that this work has been done on many platforms.
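For reference, actual hyperparameter tuning scored on F1 could look roughly like the sketch below, assuming scikit-learn. The data is synthetic, and the model and parameter grid are illustrative assumptions rather than recommended values for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, mildly imbalanced stand-in for the real features and target.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.65, 0.35], random_state=0
)

# Small illustrative grid; the values are assumptions, not tuned recommendations.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid=param_grid,
    scoring="f1",  # F1 balances precision and recall on the positive class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_params = search.best_params_
best_f1 = search.best_score_  # mean cross-validated F1 of the best setting
print(best_params, round(best_f1, 3))
```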

okothchristopher commented 1 year ago

You need to post the score you got on Kaggle.