Open williexu opened 8 years ago
Dear williexu: Thank you for your sincere advice.
Yes, you are right.
The plots show how much of the variance in the proportion of Y=1 is due to feature Xj. In this way, we can find the features Xj that have a significant influence on the target Y. We made 16 plots, one for each of the 16 features. However, due to the 3-page limit, we chose only 2 typical variables: one with significant variance and one with insignificant variance. Does that make sense?
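To make the quantity behind these plots concrete, here is a minimal sketch of how the proportion of Y=1 per category, and its variance across categories, could be computed. It is not the code used in the report: the function names, the `job` feature, and the toy data are hypothetical, and using the unweighted population variance of the per-category proportions as the summary is an assumption.

```python
from collections import defaultdict
from statistics import pvariance


def positive_rate_by_category(feature_values, targets):
    """Return {category: proportion of rows with Y == 1} for one categorical feature."""
    counts = defaultdict(lambda: [0, 0])  # category -> [n_positive, n_total]
    for x, y in zip(feature_values, targets):
        counts[x][0] += int(y == 1)
        counts[x][1] += 1
    return {cat: pos / total for cat, (pos, total) in counts.items()}


def rate_variance(rates):
    """Unweighted population variance of the per-category proportions.

    Assumption: a larger value suggests that the feature separates Y=1 from
    Y=0 more strongly. Category sizes are ignored here.
    """
    return pvariance(rates.values())


# Toy data, invented purely for illustration (hypothetical feature "job").
x_job = ["admin", "admin", "student", "student", "retired", "retired"]
y = [0, 0, 1, 0, 1, 1]

rates = positive_rate_by_category(x_job, y)
variance = rate_variance(rates)
print(rates, variance)
```

Repeating this for each of the 16 features and sorting by the variance would give one rough ranking of the features, which the bar plots then illustrate one feature at a time.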
However, since all the features Xj and the target variable Y are categorical, we can hardly think of any plots other than bar plots of the counts or frequencies within each category. Could you recommend other plots for this kind of data?
As for other models, this mid-term project requires only a preliminary analysis, so we ran only a baseline model and left further work for November. Otherwise, the project would already be close to finished.
In addition, on Oct. 28th we found that our original goal did not make sense, so we changed our goal and had very limited time to restart. Since Naive Bayes is one of the simplest classification algorithms that can use categorical features directly, without one-hot encoding, we adopted it as a starting point. We will certainly try other classification methods and compare them during November.
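For readers unfamiliar with why Naive Bayes needs no one-hot encoding, the following is a minimal sketch of a categorical Naive Bayes classifier that counts raw category values directly. It is an illustration only, not our actual baseline: the function names, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(rows, labels, alpha=1.0):
    """Fit categorical Naive Bayes by counting category values per class."""
    n_features = len(rows[0])
    # value_counts[j][c][v] = number of rows of class c whose feature j equals v
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    categories = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            categories[j].add(v)
    return {
        "class_counts": Counter(labels),
        "value_counts": value_counts,
        "categories": categories,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    alpha = model["alpha"]
    best_class, best_score = None, -math.inf
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        for j, v in enumerate(row):
            count = model["value_counts"][j][c][v]  # 0 if never seen with class c
            k = len(model["categories"][j])  # number of categories seen in training
            score += math.log((count + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data, invented purely for illustration.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = [0, 0, 1, 1]

model = fit_categorical_nb(rows, labels)
print(predict(model, ("sunny", "hot")), predict(model, ("rainy", "cool")))
```

Each feature contributes one smoothed conditional probability per class, so string-valued categories can be used as they are. With `alpha > 0`, a category value never observed with a class still gets a small nonzero probability.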
Thank you. Best, Ziyi
Your report was really clear and to the point. I like how you listed a lot of future work to improve on your findings, and how you guys included graphs to visualize your data analysis. That said, I don't think the plots show all that much information about the question you guys are trying to answer; they only show the percentages of people with that feature.
I do believe there is a lot more analysis that can be done aside from Naive Bayes. Did you guys try anything else? Running a regression on your data might give some interesting results. Think about the advantages and disadvantages of both, and see which model gives you more accurate predictions.