AndyTheFactory / newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License

Added reference corpus keyword functionality #155

Open AndyTheFactory opened 1 year ago

AndyTheFactory commented 1 year ago

Issue by IngoKl Fri Dec 1 01:52:17 2017 Originally opened as https://github.com/codelucas/newspaper/pull/480


This pull request adds new functionality to the NLP module, based on a technique that is widely used in corpus linguistics. The idea is to extract keywords by statistically comparing a text (or corpus) to a reference corpus that is (potentially) representative of a language/genre.

I implemented the functionality so that the user passes the reference corpus to the nlp method. I'm not sure whether this is the ideal approach.
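
To illustrate the general idea, here is a minimal sketch of reference-corpus keyword extraction using the log-likelihood (G2) keyness score, a measure commonly used in corpus linguistics. It is not the code from this pull request: the names `tokenize`, `log_likelihood`, and `reference_keywords` are hypothetical, the tokenizer is deliberately naive, and the statistic used in the actual commits may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """G2 score for a word seen `a` times in a target of `c` tokens
    and `b` times in a reference of `d` tokens."""
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score


def reference_keywords(text, reference, top_n=10):
    """Return up to `top_n` (word, score) pairs for words that are
    overrepresented in `text` relative to `reference` (hypothetical helper)."""
    target_counts = Counter(tokenize(text))
    ref_counts = Counter(tokenize(reference))
    c = sum(target_counts.values())
    d = sum(ref_counts.values())
    if c == 0 or d == 0:
        return []
    scores = {}
    for word, a in target_counts.items():
        b = ref_counts.get(word, 0)
        # Keep only words relatively more frequent in the target text.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling `reference_keywords(article_text, reference_text)` returns the words whose relative frequency in the article most exceeds their relative frequency in the reference. In practice the reference would be a large corpus that is representative of the language or genre, not a single short string.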

Thank you for considering!


IngoKl included the following code: https://github.com/codelucas/newspaper/pull/480/commits

AndyTheFactory commented 1 year ago

Comment by codelucas Wed Dec 27 02:38:46 2017


Interesting stuff @IngoKl! Thanks for doing this, will take a look when I have time 💯 👍