Open wvdvegte opened 1 year ago
Two small corrections:
Thank you for the report. I think we should discuss the best solution to this issue internally. Is there any other situation where you would like to have POS tags and then have them removed later, besides the following two:
Yes, I assume the POS tags (if present) make a difference not only in filtering but in any type of analysis (classification, clustering, network analysis, ...), but I'd like to have the choice not to show them in any type of visualization - not only Word Cloud but also, for instance, Annotated Corpus Map and even in Data Table. There, I think it also makes sense to merge different 'versions' of a word, like 'practitioner' in my screenshot above.
BTW, Annotated Corpus Map is clustering and visualization in one widget. It seems to make sense to consider the POS tags for clustering but not for the visualization.
This is a bit of a stale issue but I gave it some thought. Word Cloud currently doesn't show POS tags anymore. However, it would not merge two words with the same name into one. I propose adding an option to remove POS tags in Preprocess Text. It makes the most sense to me. That said - where in Preprocess Text? As a final option in POS Tagger? As in "POS tag or remove any tags"?
I agree this could best be added to Preprocess Text. However, if you add it to POS Tagger, you have to activate POS Tagger twice: once before and once after Filtering. Perhaps it makes more sense as a final option in Filtering, where the current final option is filtering based on POS tags?
Duh, how did this not occur to me? 🤦♀️ Filtering it is.
Is your feature request related to a problem? Please describe.
I built a workflow that applies POS tagging (to allow selecting, for instance, just nouns and verbs), followed by Bag of Words, Distances and Hierarchical Clustering, and then visualizes the clusters in Word Cloud. The word cloud shows all words with their POS tags, and words that are present with different tags are shown multiple times:
Instead, I would like to see each word in Word Cloud only once, without its POS tag. Unlike Bag of Words, widgets with similar functionality, such as Document Embedding or Similarity Hashing, do not produce output with POS tags.
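To illustrate the merging described above, here is a minimal sketch in plain Python. It assumes tagged tokens look like `word_TAG` (for example `practitioner_NN`); that separator, the sample counts, and the helper names `strip_pos_tag` and `merge_tagged_counts` are assumptions made for illustration and are not part of Orange.

```python
from collections import Counter


def strip_pos_tag(token: str, sep: str = "_") -> str:
    """Drop a trailing POS tag, e.g. 'practitioner_NN' -> 'practitioner'.

    Assumes the tag is everything after the last separator; tokens
    without a separator are returned unchanged.
    """
    word, found, _tag = token.rpartition(sep)
    return word if found and word else token


def merge_tagged_counts(counts: dict, sep: str = "_") -> Counter:
    """Sum the counts of tokens that differ only in their POS tag."""
    merged = Counter()
    for token, n in counts.items():
        merged[strip_pos_tag(token, sep)] += n
    return merged


# Made-up counts, standing in for what a word cloud would receive.
tagged = {
    "practitioner_NN": 4,
    "practitioner_NNP": 2,
    "train_VB": 3,
    "train_NN": 1,
}
display_counts = merge_tagged_counts(tagged)
```

With something like this applied only at display time, the tagged tokens would remain available for filtering and clustering, while the visualization would show each word once with its combined weight.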
Describe the solution you'd like
I think there are different options:
Describe alternatives you've considered
Couldn't find any.