Improve your README: include a data description, links (article, contact, etc.), and so on;
Add an interpretation/observation to each of your plots;
Use an sklearn encoder to encode the features;
Please remove the unneeded visualization in cell #44;
Apply your numerical imputer to all the numerical features at once instead of using a loop (watch the Thinkific tutorial); split your dataset before imputing it; during EDA, do not impute missing values, just ignore them;
Improve your hypothesis testing;
Balance the dataset only after splitting it, and balance just the train subset obtained from the split;
Cell #103: check the variables that you append; the LightGBM row reuses the Random Forest metrics:
# Add the results to the dataframe
results_tuned_df = results_tuned_df.append({'Model': 'Random Forest (Tuned)',
                                            'Accuracy': accuracy_rf,
                                            'Precision': precision_rf,
                                            'F1 Score': f1_rf}, ignore_index=True)
results_tuned_df = results_tuned_df.append({'Model': 'LightGBM (Tuned)',
                                            'Accuracy': accuracy_rf,  # have a look
                                            'Precision': precision_rf,  # have a look
                                            'F1 Score': f1_rf}, ignore_index=True)
...
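One possible way to fix the cell, as a minimal sketch: give each model its own metric variables and build the table from a list of rows. The names accuracy_lgbm, precision_lgbm and f1_lgbm, and the helper metrics_row, are hypothetical (they do not appear in the notebook), and the numbers are dummy placeholders, not real results. The sketch also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd


def metrics_row(model_name, accuracy, precision, f1):
    """Build one results row (hypothetical helper, for illustration)."""
    return {'Model': model_name,
            'Accuracy': accuracy,
            'Precision': precision,
            'F1 Score': f1}


# Dummy placeholder values, NOT real results. In the notebook these would
# come from evaluating each tuned model on the test set.
accuracy_rf, precision_rf, f1_rf = 0.1, 0.2, 0.3
accuracy_lgbm, precision_lgbm, f1_lgbm = 0.4, 0.5, 0.6  # hypothetical names

# Collect one row per model, each with its own metrics, then build the frame once.
rows = [
    metrics_row('Random Forest (Tuned)', accuracy_rf, precision_rf, f1_rf),
    metrics_row('LightGBM (Tuned)', accuracy_lgbm, precision_lgbm, f1_lgbm),
]
results_tuned_df = pd.DataFrame(
    rows, columns=['Model', 'Accuracy', 'Precision', 'F1 Score'])
```

Building the frame from a list of rows keeps each model's metrics next to its name, which makes a copy-paste slip like the one above easier to spot.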
Don't export SMOTE;
You may export the lists of features by type (e.g. numerical, categorical).
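The preprocessing points above (split first, impute all numerical features at once, encode with sklearn, balance only the train subset) could be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the data frame is a tiny synthetic one, the column names and the lists numerical_features / categorical_features are hypothetical, the median strategy is just one choice, and random oversampling with sklearn.utils.resample stands in for SMOTE to keep the example dependency-free.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.utils import resample

# Tiny synthetic frame, for illustration only (not the project's data).
df = pd.DataFrame({
    'age': [25, np.nan, 40, 31, 58, 47, np.nan, 36, 29, 52, 44, 33],
    'income': [30.0, 42.0, np.nan, 55.0, 61.0, 38.0,
               47.0, np.nan, 35.0, 70.0, 52.0, 45.0],
    'city': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A'],
    'target': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})
numerical_features = ['age', 'income']   # hypothetical feature lists
categorical_features = ['city']

X = df.drop(columns='target')
y = df['target']

# 1. Split BEFORE any imputation, so test rows never influence the imputer.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# 2. One imputer for all numerical columns at once (no loop), plus an
#    sklearn encoder for the categorical columns.
preprocessor = ColumnTransformer(
    [('num', SimpleImputer(strategy='median'), numerical_features),
     ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)],
    sparse_threshold=0,  # always return a dense array
)
X_train_prep = preprocessor.fit_transform(X_train)  # fit on train only
X_test_prep = preprocessor.transform(X_test)        # reuse train statistics

# 3. Balance ONLY the train subset. Random oversampling of the minority
#    class is used here as a simple stand-in for SMOTE.
y_train_arr = y_train.to_numpy()
minority_label = pd.Series(y_train_arr).value_counts().idxmin()
minority_idx = np.flatnonzero(y_train_arr == minority_label)
majority_idx = np.flatnonzero(y_train_arr != minority_label)
upsampled_idx = resample(minority_idx, replace=True,
                         n_samples=len(majority_idx), random_state=42)
balanced_idx = np.concatenate([majority_idx, upsampled_idx])
X_train_bal = X_train_prep[balanced_idx]
y_train_bal = y_train_arr[balanced_idx]
```

The test subset is left untouched by the balancing step, so the evaluation metrics reflect the original class distribution.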
Interesting work; please share this review with your team.