Kylel/svm word predictor disjoint tokens

This fixes an implementation bug in SVMWordPredictor that I introduced while trying to get words to respect token boundaries. Nothing too interesting: I mis-implemented a for-loop so that two non-adjacent tokens could end up in the same word, with a punctuation token in between assigned to a different word. For example:

token 3 is not punctuation -> word 3
token 4 is punctuation -> word 4
token 5 is not punctuation -> word 3

The fix pulls this for-loop out into its own function, _group_adjacent_with_exceptions(), and adds specific tests for it.
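
For readers who want a concrete picture of the grouping behavior described above, here is a minimal sketch. It is not the code from this PR: the function name without the leading underscore, its signature, and the join_with_next / exception_ids inputs are all assumptions made for illustration. The idea is that punctuation ("exception") tokens always form their own word, and every other word is built only from consecutive tokens, so a word can never skip over a token.

```python
from typing import List, Set


def group_adjacent_with_exceptions(
    num_tokens: int,
    join_with_next: Set[int],
    exception_ids: Set[int],
) -> List[List[int]]:
    """Hypothetical sketch: group token ids 0..num_tokens-1 into words.

    join_with_next: ids of tokens predicted to belong to the same word as
        the token immediately after them (assumed input, not the mmda API).
    exception_ids: ids of tokens (e.g. punctuation) that must stand alone.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    for token_id in range(num_tokens):
        if token_id in exception_ids:
            # An exception token closes any open group and stands alone.
            if current:
                groups.append(current)
                current = []
            groups.append([token_id])
            continue

        current.append(token_id)
        next_id = token_id + 1
        # Keep the group open only if this token joins the next one,
        # the next one exists, and the next one is not an exception.
        keep_open = (
            token_id in join_with_next
            and next_id < num_tokens
            and next_id not in exception_ids
        )
        if not keep_open:
            groups.append(current)
            current = []
    return groups
```

With six tokens, join_with_next={2, 3, 4} and exception_ids={4}, this sketch returns [[0], [1], [2, 3], [4], [5]]: token 4 stands alone and token 5 starts a new word rather than rejoining token 3's word. Because groups are only ever extended by the next consecutive token, the disjoint situation listed above cannot occur.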

allenai/mmda #262