biolab / orange3-text

Text Mining add-on for Orange3

Extract Keywords: New widget #642

Closed VesnaT closed 3 years ago

VesnaT commented 3 years ago

Input: Corpus, [Data Table - Words]

Output: Words (Data Table)

Infers characteristic words from a Corpus. The input corpus can contain one or more documents. A reference corpus may optionally be given for methods that use differential analysis (words that characterize the corpus compared with those in the reference). The output is a list of words (a Data Table with a “words” column).
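To make the differential-analysis idea concrete, here is a minimal sketch in plain Python. It is not the widget's implementation: the function names, the log-ratio score, and the `smoothing` parameter are all assumptions chosen for illustration, and documents are assumed to be already tokenized lists of strings.

```python
import math
from collections import Counter


def relative_freqs(docs):
    """Relative frequency of each token over all documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def differential_scores(corpus, reference, smoothing=1e-6):
    """Log-ratio of corpus frequency to reference frequency (hypothetical score).

    Positive values mark words that are more frequent in the corpus than in
    the reference; `smoothing` avoids division by zero for unseen words.
    """
    corp = relative_freqs(corpus)
    ref = relative_freqs(reference)
    return {
        w: math.log((f + smoothing) / (ref.get(w, 0.0) + smoothing))
        for w, f in corp.items()
    }


# Toy data, invented for this example only.
corpus = [["orange", "widget", "keyword"], ["keyword", "corpus"]]
reference = [["orange", "widget", "table"], ["corpus", "table", "widget"]]

scores = differential_scores(corpus, reference)
# Words ordered from most to least characteristic of the corpus.
top = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, "keyword" never appears in the reference, so it ranks first, while "widget" is relatively more common in the reference and gets a negative score.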

Take care with lemmatization of both the corpus and the words: apply the same normalization to the words as was used for the corpus.

If Words are given on the input, the Word variable option is enabled and only the words from the list are scored.
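The two points above (shared normalization, and scoring only the listed words) could look roughly like this. This is a sketch under assumptions, not the widget's code: `normalize` is a hypothetical stand-in that only lowercases (real preprocessing would lemmatize), and the score is a simple smoothed TF-IDF averaged over documents.

```python
import math
from collections import Counter


def normalize(token):
    # Hypothetical stand-in for the corpus's real preprocessing
    # (e.g. lemmatization); here it only lowercases.
    return token.lower()


def tfidf_scores(docs, words=None):
    """Mean smoothed TF-IDF per token; optionally restricted to `words`."""
    docs = [[normalize(t) for t in doc] for doc in docs]
    n_docs = len(docs)
    if n_docs == 0:
        return {}
    # Document frequency: number of documents containing each token.
    df = Counter(t for doc in docs for t in set(doc))
    totals = Counter()
    for doc in docs:
        if not doc:
            continue
        for tok, count in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1
            totals[tok] += (count / len(doc)) * idf
    scores = {tok: total / n_docs for tok, total in totals.items()}
    if words is not None:
        # Normalize the word list the same way as the corpus tokens.
        allowed = {normalize(w) for w in words}
        scores = {tok: s for tok, s in scores.items() if tok in allowed}
    return scores


# Toy data, invented for this example only.
docs = [["Keyword", "extraction", "widget"], ["widget", "Rank"]]
restricted = tfidf_scores(docs, words=["KEYWORD", "widget"])
```

Because the word list goes through the same `normalize` as the corpus, "KEYWORD" on the input matches "Keyword" in the documents; without that step the word would silently receive no score.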

This widget is similar to the Rank widget.

VesnaT commented 3 years ago

Proposed solution:

image

PrimozGodec commented 3 years ago

The widget is merged, but it still needs two additional methods: TextRank and Embedding.