UBC-MDS / group_26

Group repository for DSCI 522, Group 26
MIT License

TA Feedback (milestone 2): Overfitting #42

Closed by jbourak 3 years ago

jbourak commented 3 years ago

You mention in part III, 2. that "we might want to avoid overfitting to these words as we train our model", but I cannot see where, or whether, you have done this in your analysis. Be explicit about what you will do (or would like to do) about the issue you brought up here.

mmyz88 commented 3 years ago

I think this feedback comes from a disconnect between the report and the analysis. I came up with the overfitting point myself while writing the report and mentioned it without verifying whether it was actually part of our workflow.

However, this is a good point, and addressing it could potentially reduce overfitting. Ella raised the same idea in her peer review of our project. We could simply pass a "stop_words" argument to CountVectorizer to address this issue.
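A minimal sketch of the "stop_words" idea, assuming scikit-learn's CountVectorizer and its built-in English stop-word list. The two documents are made up for illustration and are not from our data, and the variable names are placeholders rather than anything in our pipeline:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, invented for illustration only (not the project's data).
docs = [
    "the movie was great and the cast was great",
    "the plot was dull and the pacing was slow",
]

# Baseline: every token becomes a feature, including very common words.
baseline = CountVectorizer()
baseline.fit(docs)

# Same vectorizer, but dropping scikit-learn's built-in English stop words.
filtered = CountVectorizer(stop_words="english")
filtered.fit(docs)

baseline_vocab = set(baseline.vocabulary_)
filtered_vocab = set(filtered.vocabulary_)

# Words that the stop-word list removed from the feature set.
removed = sorted(baseline_vocab - filtered_vocab)
print("features before:", len(baseline_vocab))
print("features after: ", len(filtered_vocab))
print("removed words:  ", removed)
```

Whether this actually reduces overfitting on our data would still need to be checked, for example by comparing cross-validation scores with and without the argument. A custom list of words could also be passed to "stop_words" instead of "english" if the built-in list does not cover the words mentioned in the report.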

@Andrew-Tan @yzr1996 @arashshams

Andrew-Tan commented 3 years ago

We need to fix the hyperlink in the final report near:

For more details on the model selection process, see the model comparison report.

mmyz88 commented 3 years ago