Fix Issue #30 - Split words on U

fabianvf / python-rake

MIT License

Fix Issue #30 - Split words on U #31

Closed. klockeph closed this 6 years ago.

klockeph commented 7 years ago

The regex string is not a Unicode literal, so the \u... escape sequences behave unexpectedly. Just try split_sentences("restaurant"): it returns ["resta", "rant"], which is obviously wrong.

Adding a u prefix to the regex string forces Python to interpret it as Unicode, which fixes the issue.

Tested with Python 2.7.
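
A minimal sketch of the failure mode, written so that it also runs on Python 3. The delimiter class `[.!?\u2019]` is a hypothetical stand-in, not the library's actual pattern, and `split_on` is an illustrative helper rather than the real `split_sentences`. It assumes the behaviour described above: in a Python 2 byte string `\u2019` stays as six separate characters, and the `re` module treats the unknown escape `\u` as a literal `u`, so the class ends up matching `u`, `2`, `0`, `1` and `9`.

```python
import re

# Hypothetical delimiter class, NOT the library's actual pattern.
# This is what Python 2's re module effectively sees for the byte string
# '[.!?\u2019]': "\u" is an unknown escape that matches a literal "u",
# and the digits 2, 0, 1, 9 become ordinary members of the class.
py2_bytes_equivalent = re.compile("[.!?u2019]")

# With the u prefix, \u2019 is one code point (right single quotation mark).
unicode_pattern = re.compile(u"[.!?\u2019]")


def split_on(pattern, text):
    """Split text on the pattern and drop empty pieces (illustrative helper)."""
    return [piece for piece in pattern.split(text) if piece]


broken = split_on(py2_bytes_equivalent, "restaurant")
fixed = split_on(unicode_pattern, "restaurant")
quoted = split_on(unicode_pattern, u"nice\u2019restaurant")

print(broken)  # ['resta', 'rant']
print(fixed)   # ['restaurant']
print(quoted)  # ['nice', 'restaurant']
```

The first pattern only emulates the Python 2 result; on Python 2.7 itself the same split should appear when the literal is written without the `u` prefix.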

jkterry1 commented 7 years ago

Can you validate it on python3.x as well?

klockeph commented 7 years ago

A quick test showed that Python 3 is currently unaffected. Adding the u does not break anything there either, at least in the (rather small) tests I just ran.
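
One way to check this on Python 3.3 or later, where the `u` prefix is accepted again and yields an ordinary `str`. The pattern is the same hypothetical stand-in delimiter class, not the library's real one, so this is only a sanity check under that assumption.

```python
import re

# Hypothetical delimiter class; on Python 3 both literals are plain str.
plain = "[.!?\u2019]"
prefixed = u"[.!?\u2019]"

# The prefix changes neither the value nor the type of the literal.
same_literal = plain == prefixed and type(plain) is type(prefixed)

# Both patterns therefore split the sample word identically.
same_split = re.split(plain, "restaurant") == re.split(prefixed, "restaurant")

print(same_literal, same_split)  # True True
```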

jkterry1 commented 7 years ago

@fabianvf please merge PR #29 and then merge this on top of it

jkterry1 commented 7 years ago

@fabianvf yo?