Fixes an implementation bug in SVMWordPredictor that was introduced when I was trying to get words to respect token boundaries. Nothing too interesting: I mis-implemented a for-loop such that a word could pick up again after a punctuation token had interrupted it, allowing situations like:
token 3 is not punctuation -> word 3
token 4 is punctuation -> word 4
token 5 is not punctuation -> word 3
The fix is basically ripping this for-loop out into its own function, _group_adjacent_with_exceptions(), and adding specific tests for it.
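For reviewers, here is a minimal sketch of the intended grouping behavior. It is not the code in this PR: the function name `group_adjacent_sketch`, its arguments (`group_ids`, `is_exception`), and the list-of-index-lists return shape are all assumptions made for illustration. The idea is that adjacent tokens sharing a group id form one word, an exception token (e.g. punctuation) always becomes its own word, and a word never resumes once it has been interrupted.

```python
from typing import List


def group_adjacent_sketch(
    group_ids: List[int], is_exception: List[bool]
) -> List[List[int]]:
    """Hypothetical sketch: group adjacent token indices into words.

    group_ids[i] is an assumed per-token label; adjacent tokens with the
    same label are merged. Tokens flagged in is_exception always form a
    singleton group and close whatever group was open before them.
    """
    if len(group_ids) != len(is_exception):
        raise ValueError("group_ids and is_exception must have equal length")

    groups: List[List[int]] = []
    current: List[int] = []
    for i, (gid, exc) in enumerate(zip(group_ids, is_exception)):
        if exc:
            # Close the open group; it must not be reopened later.
            if current:
                groups.append(current)
                current = []
            groups.append([i])
            continue
        # A change of label between neighbors also closes the open group.
        if current and group_ids[current[-1]] != gid:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    return groups
```

With three tokens that share a label and a punctuation token in the middle, this yields three separate words instead of letting the third token rejoin the first one, which is the situation the buggy loop allowed.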