chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

n-gram range in textacy.extract.ngrams #272

Closed mzeidhassan closed 5 years ago

mzeidhassan commented 5 years ago

I have a question about "textacy.extract.ngrams". I am trying to extract 2- to 5-grams (n = 2, 3, 4, 5) in one call. I tried to pass the following, but it doesn't seem to work:

ngrams = textacy.extract.ngrams(doc, n=(2, 6), filter_stops=True, filter_nums=True, min_freq=2)

Is there a way to pass a range of n-grams? Or is there a better module that can do what I am looking for?

Please note that I am also using "exclude_pos", which is why I opted for "extract.ngrams".


Thanks

bdewilde commented 5 years ago

Hi @mzeidhassan, extract.ngrams() only accepts a single integer for n, so you'd have to loop over the values and concatenate the results. Something like this would do it:

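from textacy import extract

# extract.ngrams() takes a single int for n, so call it once per n (2, 3, 4, 5)
# and flatten all the results into one list
ngrams = [
    ngram
    for n in range(2, 6)
    for ngram in extract.ngrams(doc, n, filter_stops=True, filter_nums=True, min_freq=2)
]
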
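To make the loop-and-concatenate idea concrete without needing spaCy or a parsed Doc, here is a minimal, dependency-free sketch that works on a plain list of string tokens. `ngrams_in_range` is a hypothetical helper, not part of textacy, and its filters only approximate filter_stops, filter_nums, and min_freq (assumptions: stop words are passed in as a set, a "number" is any token for which `str.isdigit()` is true, and min_freq is counted separately for each n).

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngrams_in_range(
    tokens: Sequence[str],
    n_values: Iterable[int],
    stop_words: frozenset = frozenset(),
    min_freq: int = 1,
) -> List[Tuple[str, ...]]:
    """Collect n-grams for every n in n_values into one flat list.

    Hypothetical helper: a plain-Python stand-in for calling
    extract.ngrams() once per n and concatenating the results.
    """
    results: List[Tuple[str, ...]] = []
    for n in n_values:
        # All contiguous windows of length n (empty if n > len(tokens)).
        grams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
        # Drop n-grams that start or end with a stop word
        # (approximates filter_stops=True).
        grams = [g for g in grams if g[0] not in stop_words and g[-1] not in stop_words]
        # Drop n-grams containing a purely numeric token
        # (a rough stand-in for filter_nums=True).
        grams = [g for g in grams if not any(t.isdigit() for t in g)]
        # Keep every occurrence of n-grams seen at least min_freq times for this n.
        counts = Counter(grams)
        results.extend(g for g in grams if counts[g] >= min_freq)
    return results


if __name__ == "__main__":
    tokens = "new york is big and new york is old".split()
    print(ngrams_in_range(tokens, range(2, 6), stop_words=frozenset({"is", "and"}), min_freq=2))
```

With `min_freq=2`, only "new york" survives here, and it appears twice because, like the list comprehension above, the sketch keeps every occurrence rather than deduplicating.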
mzeidhassan commented 5 years ago

Thanks @bdewilde for your support.

I just want to say thank you for creating such a great library. Textacy is really amazing. Wish you all the best!

Thanks, Mohamed