I can see the time and effort you guys put into this project. Great job everyone.
Please see my feedback below:
Report:
There were some minor grammatical issues, but otherwise good job! I liked your heatmap visualizations for the categorical features and wish we had done the same for our project. I like the generic pairwise comparison in the pandas-profiling library as well, but sometimes we just want to focus on the categorical features, as you've done here.
It would be interesting to get more business context on how the model might improve revenue management and how recall vs. precision would impact revenue (minor feedback). You could also describe how the business would use this prediction functionality, e.g. call agents, overbookings, marketing campaigns, etc.
With regard to class imbalance, did you consider any approaches to deal with it?
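To make the class imbalance question concrete, one common option is to weight each class inversely to its frequency. Below is a minimal sketch of that heuristic in plain Python. The function name and the example labels are my own placeholders, not taken from your code or data. As far as I know, this is the same formula scikit-learn applies when a classifier is given class_weight="balanced".

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels for illustration only (not the project's data):
# eight examples of class 0 and two examples of class 1.
example_labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(example_labels)
print(weights)
```

Resampling (over- or undersampling) would be another route; weighting is just the smallest change to try first.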
In Table 2, it may be a personal preference, but I would prefer a bar chart of the f1 scores, or at least some sorting of the classifiers (e.g. by validation_f1 score), as it would be easier to parse.
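As a rough illustration of the sorting idea, here is a small sketch that ranks classifiers by score and prints a text bar for each. The model names and scores are placeholders I made up, not the values from Table 2, and rank_by_score is a hypothetical helper.

```python
# Placeholder scores for illustration only; these are NOT the values from Table 2.
validation_f1 = {"model_a": 0.71, "model_b": 0.83, "model_c": 0.65}


def rank_by_score(scores):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_score(validation_f1)
for name, f1 in ranked:
    bar = "#" * round(f1 * 40)  # crude text bar, 40 characters at f1 = 1.0
    print(f"{name:<10} {f1:.2f} {bar}")
```

Even without a chart, the sorted order alone would make the best and worst classifiers easy to spot.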
In Figure 1, it might be difficult, but could we choose a different visualization than line charts? It's hard to see with the scaling on some of the charts.
I noticed that some of the images in the final report differ from the ones in the results folder (e.g. cat_vs_target.svg and num_vs_target.svg).
For the cat_vs_target image, you may want to make the Y-axis nominal instead of quantitative so you can see the differences between classes.
Technical:
It'd be nice to have a yml file for creating a conda environment to run this project.
Now that you have a Makefile, you can update the Usage section in the README.
Hopefully the next Milestone with the Dockerfile will resolve this.
I only skimmed through your code, but it looks quite well done.
Good job on helper_function.py.
I liked how you pickled the chosen model. It is an older approach to deploying models, but easy to use!
Minor item, but getdata.py doesn't create the output folder if it doesn't exist (I think this was part of the initial specs from the Milestone).
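For the missing output folder, a fix could be as small as the sketch below. I have not looked closely at how getdata.py builds its output path, so ensure_parent_dir and the example path are hypothetical and would need adapting to your script.

```python
from pathlib import Path


def ensure_parent_dir(out_file):
    """Create the parent folder of out_file if it is missing, then return the path.

    parents=True creates intermediate folders; exist_ok=True makes the call
    a no-op when the folder is already there.
    """
    path = Path(out_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Calling something like ensure_parent_dir("data/raw/output.csv") right before writing the file would cover both a fresh clone and an existing folder.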