Open freecraver opened 2 years ago
See https://github.com/cdimascio/py-readability-metrics/issues/26#issuecomment-1046301510
Introduces a non-breaking change which allows overriding the default word-level tokenization. The new `f_tokenize_words` argument accepts a function that maps a text to its words.

Example:

```python
from nltk import word_tokenize

r = Readability(text, f_tokenize_words=word_tokenize)
```
- Tests run ✔️
- Tests added ✔️
- Added section 'What makes a word' to Readme ✔️
Additional remarks:

A difference between the `TweetTokenizer` and the `TreebankWordTokenizer` I observed is the handling of clitics and abbreviations:

`"We've got two different solutions"`
- `TweetTokenizer`: `["We've", 'got', 'two', 'different', 'solutions']`
- `TreebankWordTokenizer`: `['We', "'ve", 'got', 'two', 'different', 'solutions']`

`'How common are abbreviations in the U.S.?'`
- `TweetTokenizer`: `['How', 'common', 'are', 'abbreviations', 'in', 'the', 'U', '.', 'S', '.', '?']`
- `TreebankWordTokenizer`: `['How', 'common', 'are', 'abbreviations', 'in', 'the', 'U.S.', '?']`
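Since `f_tokenize_words` accepts any `text -> list[str]` function, users aren't limited to NLTK. As a hypothetical illustration (not part of this PR), here is a minimal stdlib-only tokenizer that keeps both clitics and dotted abbreviations intact, sketching the kind of custom behavior the new argument enables:

```python
import re

def simple_word_tokenize(text):
    """Hypothetical custom tokenizer for f_tokenize_words.

    Keeps clitics ("We've") and dotted abbreviations ("U.S.") as
    single tokens; any other non-space character becomes its own token.
    The abbreviation pattern comes first so "U.S." is tried before
    the plain-word pattern can consume the bare "U".
    """
    return re.findall(r"(?:[A-Za-z]\.)+|[A-Za-z]+(?:'[A-Za-z]+)?|\S", text)

print(simple_word_tokenize("We've got two different solutions"))
print(simple_word_tokenize('How common are abbreviations in the U.S.?'))
```

It would be wired in the same way as the NLTK example, e.g. `Readability(text, f_tokenize_words=simple_word_tokenize)`.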