JanKosgei / Travel-Insurance-Prediction

This project aims to predict the propensity of travellers to procure insurance based on factors such as age, annual income and family size.
MIT License

Machine Learning Project Review - Travel insurance prediction #1

Open okothchristopher opened 1 year ago

okothchristopher commented 1 year ago
  1. You need to create 3 repositories i.e
okothchristopher commented 1 year ago
  1. The README here is incomplete. It needs the following sections:
    • Title and Description (what the project is about and what it seeks to solve, i.e. the objectives)
    • Data Source
    • Insights about the data (key ones only, accompanied by some charts)
    • The product: the ML development steps and their outcomes (the output can be a graph, e.g. a confusion matrix plot, with an explanation below it)
    • Acknowledgments
okothchristopher commented 1 year ago
  1. The notebook structure is not okay. You need a structured approach to modelling, i.e.:
    • Give background (Done)
    • Data Context (when was the data collected, and where did you obtain it?)
    • Problem Statement (what are you trying to predict?)
okothchristopher commented 1 year ago
  1. EDA Steps
    • When doing EDA, I would advise against splitting the data before exploring it, as some features need to be explored in the context of the target (dependent) variable y.
    • What does "OOPs for plots" mean?
    • For each plot, note what you have observed.
    • The step 'Summary statistics for each numerical feature' can easily be done with the .describe() method (.info() only lists column dtypes and non-null counts).
    • Are your features linearly separable? That should be the intention of this plot: sns.pairplot(Clients,vars=['Age', 'AnnualIncome','FamilyMembers'], hue='TravelInsurance')
    • The step 'Converting the categorical features into binary' can easily be done with an encoder; use one.
okothchristopher commented 1 year ago
  1. Feature Engineering
    • There are no derived features in your dataset; adding some could help improve the predictions.
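As an illustration of what derived features could look like, here is a small sketch on a few synthetic rows. Both IncomePerMember and AgeBand are hypothetical features, and the age bin edges are arbitrary; whether either one actually improves the model would have to be checked with cross-validation.

```python
import pandas as pd

# A few synthetic rows standing in for Clients.
Clients = pd.DataFrame({
    "Age": [25, 28, 31, 34],
    "AnnualIncome": [400_000, 900_000, 1_200_000, 1_500_000],
    "FamilyMembers": [2, 3, 4, 6],
})

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two hypothetical derived features."""
    out = df.copy()
    # Income available per household member (hypothetical feature).
    out["IncomePerMember"] = out["AnnualIncome"] / out["FamilyMembers"]
    # Coarse age band; bins are right-inclusive and the edges are illustrative only.
    out["AgeBand"] = pd.cut(
        out["Age"], bins=[0, 27, 31, 100], labels=["<=27", "28-31", "32+"]
    )
    return out

features = add_derived_features(Clients)
print(features)
```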
okothchristopher commented 1 year ago
  1. Data pre-processing
    • You need to have a section title for this.
    • There is no need for the step "To check if the spread of splits".
okothchristopher commented 1 year ago
  1. Modelling
    • Your dataset is relatively small, so instead of relying on a single hold-out set, I would advise using cross-validation scores, which give a more realistic estimate of model performance.
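A minimal sketch of the cross-validation suggestion using scikit-learn's cross_val_score with stratified folds. The data comes from make_classification as a stand-in for the encoded feature matrix and TravelInsurance target, and logistic regression is just an example model. Scaling sits inside the pipeline so each fold is scaled using only its own training portion.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features (X) and target (y).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Example model; the scaler is refit on the training part of every fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")

print(f"F1 per fold: {scores.round(3)}")
print(f"Mean F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the spread across folds shows how stable the estimate is, which a single hold-out score cannot.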
okothchristopher commented 1 year ago
  1. Model evaluation
    • You have not tuned any hyperparameters here, so the section "Defining models while tuning their hyperparameters" is misleading.
    • The evaluation metric you selected is not the best choice; use a more balanced metric, e.g. the F1 score.
    • Your model's performance on the training data is lacking. Remember, I mentioned that you need to aim for 90% and above, given that this problem has been tackled on many platforms.
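One way to address these evaluation points together is sketched below: actual hyperparameter tuning with GridSearchCV scored on F1, a comparison of training F1 against cross-validated F1, and a confusion matrix built from out-of-fold predictions. The random forest and its small grid are illustrative choices, not the project's models, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

# Synthetic stand-in for the encoded features (X) and target (y).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Small illustrative grid; real ranges should come from experimentation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # select hyperparameters by F1 rather than accuracy
    cv=cv,
)
search.fit(X, y)
print("Best params:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")

# best_estimator_ is refit on all of X, so this is the training F1.
best = search.best_estimator_
train_f1 = f1_score(y, best.predict(X))
print(f"Training F1: {train_f1:.3f}")

# Out-of-fold predictions give a confusion matrix not inflated by training fit.
oof_pred = cross_val_predict(best, X, y, cv=cv)
cm = confusion_matrix(y, oof_pred)
print(cm)
```

A large gap between training F1 and cross-validated F1 points to overfitting; low values on both point to underfitting, which is the concern raised about the training performance.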
okothchristopher commented 1 year ago
  1. You need to post the score you got on Kaggle.