abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020

How to keep specific words together when tokenizing? #50

Open frejathim opened 3 years ago

frejathim commented 3 years ago

Hi!

My group is having trouble keeping multi-word terms like New Zealand, Donald Trump and alt-right together when tokenizing. We Googled it and found a solution, but it seems far too elaborate for what must be a very common text analysis problem. Is there a smarter way to handle this?
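One possible approach, sketched here with only the standard library, is to give a regex tokenizer a list of phrases to protect so that each one comes out as a single token. The phrase list `PHRASES` and the helper names `build_pattern` and `tokenize` are made up for illustration; they are not part of the course material or of any library.

```python
import re

# Hypothetical list of phrases to keep together; extend it for your own corpus.
PHRASES = ["New Zealand", "Donald Trump", "alt-right"]


def build_pattern(phrases):
    """Compile a regex that matches a protected phrase or, failing that, a single word."""
    # Longest phrases first, so a longer phrase wins over a shorter one it contains.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    # Try the protected phrases first, then fall back to a run of word characters.
    return re.compile(rf"\b(?:{alternatives})\b|\w+", flags=re.IGNORECASE)


def tokenize(text, phrases=PHRASES):
    """Split text into tokens, keeping the listed phrases as single tokens."""
    pattern = build_pattern(phrases)
    return [match.group(0) for match in pattern.finditer(text)]


print(tokenize("Donald Trump visited New Zealand to discuss the alt-right."))
# ['Donald Trump', 'visited', 'New Zealand', 'to', 'discuss', 'the', 'alt-right']
```

This only protects phrases you list yourself, and punctuation is dropped because the fallback matches word characters only. If you are already using NLTK, its `MWETokenizer` class is meant for the same job: it merges listed multi-word expressions in an already tokenized sentence.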