VikParuchuri / scan

Score essays automatically with an easy web interface.
GNU Affero General Public License v3.0

questions from vectorizer.py #1

Open hustmonk opened 10 years ago

hustmonk commented 10 years ago

Nice work on this project.

While reading the code, I ran into the following questions:

(1) In self.initial_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df = 3 / len(input_text), max_df=.4), the 3 needs to be changed to 3., or else integer division gives min_df = 0.

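(2) For every term, you compute fisher_exact and keep the terms with the higher pval. Maybe you should keep the terms with the lower pval instead, since a lower p-value indicates a stronger association between the term and the score.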
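
Both points can be illustrated with a minimal sketch. It is not the code from vectorizer.py: the corpus size, the term counts, and the helper fisher_exact_pvalue are all hypothetical, and the p-value is computed by a small hand-written two-sided Fisher exact test rather than a library call. It also assumes the original code runs under Python 2, where / on two integers floors the result; the // operator reproduces that behaviour in Python 3.

```python
from math import comb

# Point (1): integer division silently turns the threshold into 0.
n_docs = 100                  # hypothetical corpus size
int_min_df = 3 // n_docs      # 0, what Python 2's `3 / n_docs` evaluates to
float_min_df = 3. / n_docs    # 0.03, the intended proportion


# Point (2): keep the terms with the LOWEST p-values.
def fisher_exact_pvalue(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper written for this sketch.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the
    # observed one (small tolerance guards against float rounding).
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))


# Made-up data: (high-scoring essays with the term, low-scoring essays with it),
# out of 10 essays in each group.
n_high = n_low = 10
term_counts = {"thesis": (9, 1), "the": (5, 5), "however": (7, 3)}

pvals = {
    term: fisher_exact_pvalue(in_high, in_low, n_high - in_high, n_low - in_low)
    for term, (in_high, in_low) in term_counts.items()
}

# Ascending sort puts the most discriminative terms first.
selected = sorted(pvals, key=pvals.get)[:2]
```

With these made-up counts, a term that appears equally often in both groups ("the") gets a p-value close to 1 and is dropped, while the strongly skewed term ("thesis") gets the smallest p-value and is kept. Sorting in descending order would select exactly the wrong terms.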

VikParuchuri commented 9 years ago

Sorry for taking so long to reply! Must have missed this issue.

You're right on both counts. Thanks for noticing. I'll fix both soon.