UChicago-CCA-2021 / Frequently-Asked-Questions

Repository to ask questions - please use the issues page to ask your questions.

HW2 Question: pentagrams #21

Closed by jacyanthis 3 years ago

jacyanthis commented 3 years ago

Hello, for HW2 it asks, "Construct cells immediately below this that identify statistically significant bigrams, trigrams, quadgrams, higher-order ngrams and skipgrams." For the statistical significance of higher-order ngrams (e.g. pentagrams), nltk doesn't seem to have these built in and the example doesn't show an alternative method. Did you want us to write statistical significance testing code ourselves?

bhargavvader commented 3 years ago

You don't need to test for significance. By running standard n-gram code (using either gensim or nltk), i.e., by applying bigram detection sequentially so that each pass merges pairs found in the previous pass, you can create higher-order n-grams. Your job is to identify these higher-order n-grams in your corpus. Does that help?
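
A minimal sketch of the "bigrams run sequentially" idea, in case it helps. It is a standard-library stand-in, not the gensim or nltk API: the function names `merge_frequent_bigrams` and `sequential_ngrams`, the `min_count` cutoff, and the toy corpus are all assumptions made for illustration, and it merges pairs by raw frequency rather than by any collocation score.

```python
from collections import Counter


def merge_frequent_bigrams(sentences, min_count=2, delimiter="_"):
    """One pass: join adjacent token pairs seen at least min_count times.

    Hypothetical helper; uses raw pair frequency, not a collocation score.
    """
    # Count every adjacent pair of tokens across the whole corpus.
    pair_counts = Counter(
        pair for sent in sentences for pair in zip(sent, sent[1:])
    )
    merged_sentences = []
    for sent in sentences:
        merged, i = [], 0
        while i < len(sent):
            is_frequent = (
                i + 1 < len(sent)
                and pair_counts[(sent[i], sent[i + 1])] >= min_count
            )
            if is_frequent:
                # Join the pair into a single token and skip past both words.
                merged.append(sent[i] + delimiter + sent[i + 1])
                i += 2
            else:
                merged.append(sent[i])
                i += 1
        merged_sentences.append(merged)
    return merged_sentences


def sequential_ngrams(sentences, passes=2, min_count=2):
    """Apply the bigram pass repeatedly; each pass can double n-gram length."""
    for _ in range(passes):
        sentences = merge_frequent_bigrams(sentences, min_count)
    return sentences


# Toy corpus (made up for illustration).
corpus = [
    ["new", "york", "city", "hall", "is", "old"],
    ["i", "like", "new", "york", "city", "hall"],
    ["the", "new", "york", "city", "hall", "opened"],
]
result = sequential_ngrams(corpus, passes=2, min_count=2)
print(result[0])
```

The first pass joins pairs such as `new_york` and `city_hall`; the second pass treats those merged tokens as single words, so a four-word phrase can emerge. A third pass over a larger corpus could reach pentagrams and beyond. With gensim or nltk the same looping pattern would apply, with their own phrase or collocation scoring in place of the raw count used here.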

jacyanthis commented 3 years ago

Yep! It seems straightforward without that requirement. Thanks.