Questions about vectorizer.py

VikParuchuri / scan

Score essays automatically with an easy web interface.

GNU Affero General Public License v3.0


Open: hustmonk opened this issue 10 years ago

hustmonk commented 10 years ago

Nice work on this project.

While reading the code, I found a couple of possible issues:

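(1) In self.initial_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df = 3 / len(input_text), max_df=.4), the expression 3 / len(input_text) uses integer division under Python 2, so min_df becomes 0 whenever there are more than three input texts. Either change 3 to 3.0 so the division yields a proportion, or, if no document-frequency cutoff is intended, write min_df = 0 explicitly.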

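(2) For every term, you compute fisher_exact and keep the terms with the higher p-values (pval). Shouldn't you keep the terms with the lower p-values instead? A low p-value indicates a stronger association between the term and the essay score, which is what feature selection should favor.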
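To make point (1) concrete, here is a minimal sketch, assuming the code runs under Python 2 (where / between two ints truncates) and that min_df follows scikit-learn's convention: a float is a proportion of documents, an int is an absolute document count. The helpers min_doc_count and kept_terms and the toy counts are hypothetical; they only mimic the cutoff and are not the scan or scikit-learn code.

```python
import numbers

def min_doc_count(min_df, n_docs):
    """Convert min_df into a document-count cutoff (simplified sketch of sklearn's rule)."""
    if isinstance(min_df, numbers.Integral):
        return min_df          # an int is an absolute document count
    return min_df * n_docs     # a float is a proportion of the corpus

def kept_terms(doc_freqs, min_df, n_docs):
    """Return the terms whose document frequency meets the min_df cutoff."""
    cutoff = min_doc_count(min_df, n_docs)
    return sorted(t for t, df in doc_freqs.items() if df >= cutoff)

n_docs = 100  # hypothetical number of essays
doc_freqs = {"the": 40, "essay": 12, "typo": 1}  # hypothetical document frequencies

# Python 2 evaluates 3 / 100 with integer division; // reproduces that here.
buggy_min_df = 3 // n_docs    # 0, so no term is ever filtered out
fixed_min_df = 3.0 / n_docs   # 0.03, i.e. "appears in at least 3 essays"

print(kept_terms(doc_freqs, buggy_min_df, n_docs))  # ['essay', 'the', 'typo']
print(kept_terms(doc_freqs, fixed_min_df, n_docs))  # ['essay', 'the']
```

Under this rule, passing the integer min_df=3 would give the same cutoff as 3.0 / len(input_text), since the proportion is multiplied back by the number of documents.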
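For point (2), a minimal sketch of selecting terms by the lowest p-value. It assumes essays are split into hypothetical "high" and "low" score groups and that each term gets a 2x2 table [[high & has term, low & has term], [high & lacks term, low & lacks term]]; the thread does not show how scan builds its tables. To stay dependency-free, fisher_one_sided is a pure-Python, one-sided (over-representation) hypergeometric test standing in for scipy.stats.fisher_exact; it should agree with fisher_exact(table, alternative='greater'), but it is not a two-sided test. The terms and counts are made up.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value that a term is over-represented in
    high-scoring essays, for the table [[a, b], [c, d]]."""
    n_total = a + b + c + d
    n_high = a + c   # high-scoring essays
    n_with = a + b   # essays containing the term
    denom = comb(n_total, n_with)
    # Probability of at least `a` high-scoring essays among those with the term.
    return sum(
        comb(n_high, k) * comb(n_total - n_high, n_with - k)
        for k in range(a, min(n_high, n_with) + 1)
    ) / denom

def select_terms(tables, k):
    """Keep the k terms with the LOWEST p-values (strongest association)."""
    pvals = {term: fisher_one_sided(*t) for term, t in tables.items()}
    return sorted(pvals, key=pvals.get)[:k]

# Hypothetical tables for 10 high-scoring and 10 low-scoring essays.
tables = {
    "therefore": (8, 1, 2, 9),   # mostly in high-scoring essays
    "the": (10, 10, 0, 0),       # in every essay, carries no signal
    "um": (2, 7, 8, 3),          # mostly in low-scoring essays
}

print(select_terms(tables, 1))  # ['therefore']
```

Sorting in ascending order of p-value puts the most informative term first; sorting in descending order, as the issue describes, would instead favor terms like "the" whose p-value is 1.0.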

VikParuchuri commented 9 years ago

Sorry for taking so long to reply! Must have missed this issue.

You're right on both counts; thanks for noticing. I'll fix it soon.