jacyanthis closed this issue 3 years ago.
You don't need to test for significance. By running standard n-gram code (with either gensim or nltk), i.e., by applying bigram detection repeatedly, you can build up higher-order n-grams. Your job is to identify these higher-order n-grams in your corpus. Does that help?
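To illustrate the "run bigrams sequentially" idea, here is a minimal stdlib-only sketch. It is not gensim's or nltk's implementation: as a simplifying assumption, it joins any adjacent token pair seen at least `min_count` times, whereas gensim's `Phrases` scores pairs against a threshold. The function names `merge_frequent_bigrams` and `build_ngrams` are hypothetical. Each pass treats already-merged tokens as single words, so a second pass can turn `new_york` + `city` into a trigram, a third pass can reach quadgrams, and so on.

```python
from collections import Counter


def merge_frequent_bigrams(sentences, min_count=2, sep="_"):
    """One pass (hypothetical helper): join adjacent pairs seen >= min_count times.

    Assumption: a raw frequency cutoff stands in for a real collocation score.
    """
    # Count every adjacent token pair across the corpus.
    pair_counts = Counter()
    for tokens in sentences:
        pair_counts.update(zip(tokens, tokens[1:]))
    frequent = {pair for pair, c in pair_counts.items() if c >= min_count}

    # Greedily merge frequent pairs left to right within each sentence.
    merged = []
    for tokens in sentences:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
                out.append(tokens[i] + sep + tokens[i + 1])
                i += 2  # both tokens consumed by the merge
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


def build_ngrams(sentences, passes=2, min_count=2):
    """Apply bigram merging repeatedly; pass k can yield phrases of up to 2**k words."""
    for _ in range(passes):
        sentences = merge_frequent_bigrams(sentences, min_count)
    return sentences


corpus = [
    "new york city is big".split(),
    "i love new york city".split(),
    "new york city never sleeps".split(),
]
one_pass = build_ngrams(corpus, passes=1)
two_passes = build_ngrams(corpus, passes=2)
```

After one pass, the corpus contains `new_york` followed by `city`; after the second pass, those become the single token `new_york_city`. With gensim, the analogous pattern is to train a second `Phrases` model on the output of the first, but check the gensim documentation for the exact parameters.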
Yep! It seems straightforward without that requirement. Thanks.
Hello, for HW2 it asks, "Construct cells immediately below this that identify statistically significant bigrams, trigrams, quadgrams, higher-order ngrams and skipgrams." For the statistical significance of higher-order ngrams (e.g. pentagrams), nltk doesn't seem to have these built in and the example doesn't show an alternative method. Did you want us to write statistical significance testing code ourselves?