biolab / orange3-text

🍊 📄 Text Mining add-on for Orange3

Word cloud: add number of documents containing each word to Word Count output #1040

Open wvdvegte opened 8 months ago

wvdvegte commented 8 months ago

Is your feature request related to a problem? Please describe.
It would be useful to know not only the total number of occurrences of each word in a corpus, but also the number of documents in which each word appears at least once. The Document Frequency filter in Preprocess Text already takes this number into account, but it would be nice to have it in a table as well.
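To make the difference between the two numbers concrete, here is a minimal sketch in plain Python. It does not use Orange; the function name `word_and_document_counts` and the toy documents are made up for illustration, and the documents are assumed to be already tokenized.

```python
from collections import Counter


def word_and_document_counts(documents):
    """Return (total occurrences per word, number of documents per word).

    `documents` is an iterable of token lists (hypothetical input format).
    """
    word_counts = Counter()
    doc_counts = Counter()
    for tokens in documents:
        word_counts.update(tokens)       # every occurrence counts
        doc_counts.update(set(tokens))   # each document counts once per word
    return word_counts, doc_counts


# Toy corpus of three tokenized documents (illustrative data only).
docs = [
    ["orange", "text", "orange"],
    ["text", "mining"],
    ["orange"],
]
word_counts, doc_counts = word_and_document_counts(docs)
print(word_counts["orange"], doc_counts["orange"])  # 3 2
```

Here "orange" occurs three times in total but in only two documents, which is the second number requested for the table.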

Describe the solution you'd like
The most obvious place to include this is in the Word Count output of Word Cloud, I think.

Describe alternatives you've considered
The numbers are implicitly contained in the output of Bag of Words, with Term Frequency set to Binary and Document Frequency & Regularization set to None, but I do not know how to extract them from the sparse data.
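As a possible workaround, the per-word document counts can be read off a sparse bag-of-words matrix by counting the non-zero entries in each column. The sketch below uses only NumPy and SciPy and assumes a matrix with one row per document and one column per word; the helper `document_frequencies` and the example data are hypothetical. Whether the Bag of Words output is reachable in this form (for example as the `X` attribute of the table in a Python Script widget) is an assumption, not something confirmed here.

```python
import numpy as np
from scipy.sparse import csr_matrix


def document_frequencies(X, words):
    """Map each word to the number of rows (documents) with a non-zero entry.

    X: document-by-word matrix (dense or SciPy sparse), assumed layout.
    words: column names, in the same order as the columns of X.
    """
    X = csr_matrix(X, copy=True)   # work on a copy, leave the input untouched
    X.sum_duplicates()             # merge repeated (row, column) entries
    X.eliminate_zeros()            # drop explicitly stored zeros
    # X.indices now holds the column index of every non-zero cell,
    # so counting the indices gives the documents per column.
    counts = np.bincount(X.indices, minlength=X.shape[1])
    return dict(zip(words, counts.tolist()))


# Illustrative data: 3 documents, 3 words.
words = ["orange", "text", "mining"]
X = csr_matrix([[2, 1, 0],
                [0, 1, 1],
                [1, 0, 0]])
print(document_frequencies(X, words))
```

Because only non-zero cells are counted, the result is the same whether Term Frequency is set to Count or Binary.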