First, the analysis is clear and easy to understand.
Then, I have a couple of suggestions regarding the report.
For "Table 1. Cross validate scores of model performance on train data.", you described the process of choosing a model. However, that is not necessary in the final report; it is fine to present only the best model.
For the model performance table, I feel that you give a lot of information but not enough interpretation. Adding more interpretation of the metrics (f1, recall, etc.) would help the reader, especially people without a data background. You could also simplify the performance table.
For Figures 1 and 2, you use the encoded values 0 and 1 as x-axis labels. It would be clearer to relabel them as >50k and <50k.
Last, regarding team communication, I see that most PRs were merged by the person who created them. It would be better to have them reviewed by other team members.
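For the relabeling of Figures 1 and 2, here is a minimal sketch of what I mean. I am assuming that 0 encodes <50k and 1 encodes >50k; please check this against your own encoding. The names `INCOME_LABELS` and `relabel` are hypothetical and not taken from your repo.

```python
# Assumed mapping from the encoded target to readable class names.
# Which code means which class is a guess; verify it against the data.
INCOME_LABELS = {0: "<50k", 1: ">50k"}


def relabel(encoded_values, mapping=INCOME_LABELS):
    """Return a readable label for each encoded value.

    Values missing from the mapping are kept, converted to strings,
    so an unexpected code stays visible on the axis.
    """
    return [mapping.get(value, str(value)) for value in encoded_values]


tick_labels = relabel([0, 1])
print(tick_labels)
```

If the figures are drawn with matplotlib, the resulting list could be passed to `ax.set_xticks([0, 1])` followed by `ax.set_xticklabels(tick_labels)`. Other plotting libraries have their own way of renaming axis categories.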
@huan-ds Thank you so much for the valuable feedback.
@fei-chang @yhchen20 I added basic definitions for precision and recall based on the confusion matrix in the report.
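As a rough illustration of those definitions, the sketch below computes precision, recall, and f1 from confusion-matrix counts. The function name `precision_recall_f1` and the counts in the example are made up for illustration; they are not results from the report.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and f1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each metric falls back to 0.0 when its denominator is zero.
    """
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # f1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative counts only, not taken from the report.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In plain words: precision answers "when the model predicts the positive class, how often is it right?", and recall answers "of all the true positive cases, how many did the model catch?".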