eellak / nlpbuddy

A text analysis application for performing common NLP tasks through a web dashboard interface and an API
http://www.nlpbuddy.io/
GNU Affero General Public License v3.0
fasttext gensim natural-language-processing spacy text-analysis text-classification

NLPBuddy - Open Source Text Analysis Tool

About the project

NLPBuddy is a text analysis application for performing common NLP tasks through a web dashboard interface and an API.

It uses spaCy for the NLP tasks and Gensim's implementation of the TextRank algorithm for text summarization.
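To give a feel for what TextRank does, here is a minimal, standard-library-only sketch of the sentence-ranking idea. It is not Gensim's implementation and not the code NLPBuddy runs: the naive sentence splitter, the function names and the default parameters are all assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared words, normalised by the log of both sentence lengths."""
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    if denominator == 0:  # both sentences contain a single word
        return 0.0
    return len(set(a) & set(b)) / denominator


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    if n <= n_sentences:
        return sentences

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank by power iteration.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the text receives no votes from other sentences, so it ends up with the minimum score and is dropped from the summary first.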

It supports text in the following languages: Greek, English, German, Spanish, Portuguese, French, Italian and Dutch. The language is identified automatically with langid.
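The snippet below sketches how language identification could be restricted to the eight supported languages with langid. The helper name `identify_language`, the `SUPPORTED_LANGUAGES` constant and the injectable `classify` argument are assumptions for illustration, not part of NLPBuddy's code base.

```python
# ISO 639-1 codes for the eight languages listed above.
SUPPORTED_LANGUAGES = ("el", "en", "de", "es", "pt", "fr", "it", "nl")


def identify_language(text, classify=None):
    """Return the ISO 639-1 code of `text`, or None if it is unsupported.

    `classify` is any callable returning a (language, score) pair. When it
    is omitted, langid.classify is used, restricted to SUPPORTED_LANGUAGES.
    """
    if classify is None:
        import langid  # third-party: pip install langid

        langid.set_languages(list(SUPPORTED_LANGUAGES))
        classify = langid.classify
    language, _score = classify(text)
    return language if language in SUPPORTED_LANGUAGES else None
```

With langid installed, calling `identify_language(text)` without a second argument uses the real classifier. Passing a stub for `classify` keeps the function easy to test.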

Tasks include:

  1. Text tokenization
  2. Sentence splitting (with lemmatized sentences)
  3. Part-of-speech tagging (verbs, nouns, etc.)
  4. Named Entity Recognition (Location, Person, Organisation, etc.)
  5. Text summarization (using the TextRank algorithm as implemented by Gensim)
  6. Keyword extraction
  7. Language identification
  8. Text categorization (Greek only)
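Tasks 1 to 4 in the list above map fairly directly onto attributes of a spaCy `Doc`. The following is a rough sketch of that mapping, not NLPBuddy's actual code: the function names, the shape of the returned dictionary and the default model `en_core_web_sm` are assumptions.

```python
def analyze(doc):
    """Collect tasks 1-4 from a spaCy Doc (or any object with the same attributes)."""
    return {
        "tokens": [token.text for token in doc],
        "sentences": [sent.text for sent in doc.sents],
        "lemmatized_sentences": [
            " ".join(token.lemma_ for token in sent) for sent in doc.sents
        ],
        "pos_tags": [(token.text, token.pos_) for token in doc],
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }


def analyze_text(text, model="en_core_web_sm"):
    """Load a spaCy pipeline and run `analyze` on `text`."""
    import spacy  # third-party; the model itself must be downloaded separately

    nlp = spacy.load(model)
    return analyze(nlp(text))
```

Each supported language would need its own spaCy model. Loading a model is slow, so a long-running service would normally load each one once and reuse it rather than call `spacy.load` per request as this sketch does.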

Text can be entered directly or imported from a URL. The import uses the python-readability library together with BeautifulSoup4.
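As an illustration of the URL import, the sketch below downloads a page, lets readability isolate the main content and uses BeautifulSoup to strip the markup. The function names are hypothetical, and using `requests` for the download is an assumption, since this README does not say how pages are fetched.

```python
import re


def collapse_whitespace(text):
    """Replace runs of spaces, tabs and newlines with single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def text_from_url(url, timeout=10):
    """Download a page and return (title, main text)."""
    # Third-party packages, imported lazily:
    # requests, readability-lxml and beautifulsoup4.
    import requests
    from bs4 import BeautifulSoup
    from readability import Document

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()

    article = Document(response.text)
    # summary() returns the main content as cleaned HTML;
    # BeautifulSoup then removes the remaining tags.
    soup = BeautifulSoup(article.summary(), "html.parser")
    return article.title(), collapse_whitespace(soup.get_text(separator=" "))
```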

The Greek classifier is built with FastText and was trained on 20,000 articles labeled in these categories.
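For readers unfamiliar with FastText, a supervised classifier is trained from a plain-text file with one example per line, each prefixed by `__label__<category>`. The sketch below shows that format and a minimal train-and-predict round trip. The helper names are hypothetical, and the training settings are FastText defaults, not necessarily those used for the Greek classifier.

```python
def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    clean_label = label.replace(" ", "_")
    clean_text = " ".join(text.split())  # fastText expects no newlines
    return "__label__{} {}".format(clean_label, clean_text)


def write_training_file(examples, path):
    """Write (label, text) pairs to `path`, one example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in examples:
            handle.write(to_fasttext_line(label, text) + "\n")


def train_and_predict(train_path, text):
    """Train with default settings and return (label, probability) for `text`."""
    import fasttext  # third-party: pip install fasttext

    model = fasttext.train_supervised(input=train_path)
    labels, probabilities = model.predict(" ".join(text.split()))
    return labels[0].replace("__label__", "", 1), float(probabilities[0])
```

For example, `to_fasttext_line("sports", "Goal in the last minute")` produces `__label__sports Goal in the last minute`, where `sports` is a placeholder category name.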

Demo

A working demo can be found at http://www.nlpbuddy.io/

Usage

Enter text and hit 'Analyze it'.

API Usage

The API is documented here: https://github.com/eellak/text-analysis/wiki/API-usage

Installation

Find development and deployment instructions here: https://github.com/eellak/text-analysis/wiki/Install

License

The code is provided under the GNU AGPL v3.0 License.