This pull request adds new functionality to the NLP module: a keyword-extraction technique that is widely used in corpus linguistics. The idea is to extract keywords by statistically comparing a text (or corpus) to a reference corpus that is (potentially) representative of a language or genre.
I implemented it so that the user passes the reference corpus to the `nlp` method. I'm not sure whether this is the ideal approach.
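To illustrate the general idea, here is a minimal sketch of reference-corpus keyword extraction. It assumes the log-likelihood (G2) statistic, which is a common keyness measure in corpus linguistics; the statistic actually used in this pull request may differ. The function name `keyness_keywords` and its parameters are hypothetical and are not part of newspaper's API.

```python
import math
from collections import Counter


def keyness_keywords(text_tokens, reference_tokens, top_n=10):
    """Rank words that are over-represented in text_tokens relative to
    reference_tokens, using the log-likelihood (G2) statistic.

    Hypothetical helper for illustration only; not newspaper's API.
    """
    text_counts = Counter(text_tokens)
    ref_counts = Counter(reference_tokens)
    text_total = sum(text_counts.values())
    ref_total = sum(ref_counts.values())
    if text_total == 0 or ref_total == 0:
        return []

    scores = {}
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        # Expected counts if the word were equally frequent in both corpora.
        expected_text = text_total * (a + b) / (text_total + ref_total)
        expected_ref = ref_total * (a + b) / (text_total + ref_total)
        # Keep only words that occur more often in the text than expected.
        if a <= expected_text:
            continue
        g2 = 2 * a * math.log(a / expected_text)
        if b > 0:
            g2 += 2 * b * math.log(b / expected_ref)
        scores[word] = g2

    # Highest score first: the strongest keyword candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    text = "the cat sat on the mat cat cat".split()
    reference = "the dog sat on the log the end the dog".split()
    for word, score in keyness_keywords(text, reference):
        print(word, round(score, 3))
```

The sketch expects both inputs to be already tokenized and normalized (for example, lowercased). In the toy example, "cat" ranks first because it is frequent in the text and absent from the reference, while "the" is dropped because it is relatively more frequent in the reference.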
Issue by IngoKl Fri Dec 1 01:52:17 2017 Originally opened as https://github.com/codelucas/newspaper/pull/480
Thank you for considering!
IngoKl included the following code: https://github.com/codelucas/newspaper/pull/480/commits