Issue with Scandinavian characters

Hi @kmelve,

Sorry for the slow response; I was traveling and away from email. You're hitting this problem because I was using a very simple, sledgehammer approach to tokenization: just matching [a-z]+ patterns in the source text. That works well enough for English, but it breaks on non-ASCII characters, because any letter outside a-z ends the current token.

I just pushed a fix for this on the feature/unicode branch, which now treats any run of non-digit, non-punctuation characters as a word. This should work with the Norwegian characters.
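To make the difference concrete, here is a minimal sketch of the two approaches. This is not the actual code on the branch: the `tokenize` helper, the pattern names, and the sample sentence are made up for illustration, and `[^\W\d_]+` is just one common way to approximate "runs of letters" with Python 3's `re` module.

```python
import re

# Old approach: only ASCII letters count, so any other letter ends the token.
ASCII_WORD = re.compile(r"[a-z]+")

# Assumed approximation of the new behavior: runs of word characters,
# excluding digits and underscores. In Python 3, \W is Unicode-aware for
# str patterns, so letters like å, æ and ø stay inside the token.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def tokenize(text, pattern):
    """Hypothetical helper: lowercase the text and return all matches."""
    return pattern.findall(text.lower())


sample = "Blåbærsyltetøy på brød, 2 ganger!"

print(tokenize(sample, ASCII_WORD))
# ['bl', 'b', 'rsyltet', 'y', 'p', 'br', 'd', 'ganger']

print(tokenize(sample, UNICODE_WORD))
# ['blåbærsyltetøy', 'på', 'brød', 'ganger']
```

The first pattern chops each Norwegian word into fragments at every å, æ, or ø, which is the behavior you ran into. The second keeps the words whole while still dropping the digit and the punctuation.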

To try it, just install the project from source:

git clone https://github.com/davidmcclure/textplot.git
cd textplot
pyvenv env
. env/bin/activate

And then check out the branch:

git checkout -b feature/unicode origin/feature/unicode
pip install -r requirements.txt
python setup.py develop

And give it a spin. If it works, I'll merge this into master and cut a new release. Thanks for bringing this to my attention!
