Open hustmonk opened 10 years ago
Nice work on this.
While reading the code, I ran into two questions:
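(1) self.initial_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df = 3 / len(input_text), max_df=.4): under Python 2 integer division, 3 / len(input_text) evaluates to 0 for any corpus with more than three documents. Either the 3 needs to be changed to 3. or min_df should simply be set to 0.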
(2) For every term, you compute fisher_exact and keep the terms with the higher p-values. Since a lower p-value indicates a stronger association, maybe you should keep the terms with the lower p-values instead.
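A minimal sketch of both points, using only the standard library. The helper names and the term counts are hypothetical, and fisher_one_sided is a hand-written one-sided hypergeometric tail standing in for the library's fisher_exact call, not a reproduction of it.

```python
from math import comb


def min_df_python2_style(n_docs, min_count=3):
    # Floor division mimics `3 / len(input_text)` with int operands in Python 2.
    return min_count // n_docs


def min_df_fraction(n_docs, min_count=3):
    # A float numerator (the "3." fix) yields a document-frequency fraction.
    return float(min_count) / n_docs


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]]."""
    total = a + b + c + d
    with_term = a + c   # documents containing the term
    in_class = a + b    # documents in the class of interest
    upper = min(with_term, in_class)
    tail = sum(
        comb(with_term, k) * comb(total - with_term, in_class - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, in_class)


# Hypothetical counts per term:
# (in class with term, in class without, other with term, other without)
term_tables = {"good": (8, 2, 1, 9), "the": (5, 5, 5, 5)}
p_values = {term: fisher_one_sided(*table) for term, table in term_tables.items()}

# Ascending sort puts the most strongly associated terms (lowest p-value) first.
ranked = sorted(p_values, key=p_values.get)
```

With 200 documents, the floor-division version returns 0 while the float version returns a small fraction, and the ascending sort places the term concentrated in one class ahead of the evenly spread one.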
Sorry for taking so long to reply! Must have missed this issue.
You're right on both counts -- thanks for noticing. I'll fix soon.