fabianvf / python-rake

MIT License

Rake.split_sentences(text) uses 'u' as separator #30

Closed xyutech closed 7 years ago

xyutech commented 7 years ago

Hello, I ran into an issue where the split_sentences(text) function uses the letter 'u' as a separator. For instance:

text: "is an incredibly popular library and for good reason it s powerful fast"

sentences list: [u'is an incredibly pop', u'lar library and for good reason it s powerf', u'l fast']

I can certainly fix it in my own environment, but I wonder what I did wrong and why nobody has run into this before. My environment is Python 2.7, and python-rake was installed with pip.

jkterry1 commented 7 years ago

That just means the strings are being represented as unicode strings. '' is a byte (ASCII) string in Python 2.7 and u'' is a unicode string. They work the same as normal strings, details here: https://docs.python.org/2/howto/unicode.html

That idiosyncrasy is one of the things cleaned up in Python 3.x, by the way, and one major reason it's recommended over Python 2.7. I used unicode strings specifically because they're more robust and notably support more languages, and this is a multilingual library. Tell me if these are actually causing problems for you, but they shouldn't. Closed.
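For reference, a minimal sketch of what the u'' notation does and does not mean. The variable names are only for illustration. The prefix is part of the literal syntax (and of the repr in Python 2), not part of the string's contents:

```python
# The u prefix sits outside the quotes: it marks a unicode literal
# and adds no characters to the string itself.
s = u"popular"
plain = "popular"

# The contents are identical, character for character.
same_contents = (s == plain)
length = len(s)  # 7 characters: p-o-p-u-l-a-r

# repr() shows the literal form. Python 2 includes the u prefix there,
# Python 3 does not; print() shows only the contents in both.
print(repr(s))
print(s)
```

So a list printed as [u'abc', u'def'] contains the strings abc and def. The prefix alone would not remove any letters from the text.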

xyutech commented 7 years ago

Thank you for your reply. Let me add some more info to make sure we are on the same page. I was not asking about the notation u'is an incredibly pop'; that part is clear. My issue is that the input string is split at the letter 'u'. So the input is:

is an incredibly popular library and for good reason it s powerful fast

and the resulting split is:

is an incredibly pop | lar library and for good reason it s powerf | l fast

klockeph commented 7 years ago

Got the same problem - 'restaurant' is being split into 'resta' and 'rant'...
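The symptom reported above can be reproduced with a small sketch. It assumes, purely for illustration, that the letter u has ended up inside the character class of the sentence-delimiter regex. The pattern and the helper name below are hypothetical and are not taken from python-rake's source; the thread does not confirm the actual cause.

```python
import re

# Hypothetical delimiter pattern, NOT python-rake's actual one.
# The trailing 'u' stands in for a letter that leaked into the class by mistake.
LEAKY_DELIMITERS = re.compile(r"[.!?,;:u]")


def split_on_delimiters(text, pattern=LEAKY_DELIMITERS):
    """Split text on every match of pattern, dropping empty pieces."""
    return [piece for piece in pattern.split(text) if piece.strip()]


text = "is an incredibly popular library and for good reason it s powerful fast"
pieces = split_on_delimiters(text)
print(pieces)
# ['is an incredibly pop', 'lar library and for good reason it s powerf', 'l fast']

print(split_on_delimiters("restaurant"))
# ['resta', 'rant']
```

If something like this is what happens, a quick check is to print the compiled pattern's .pattern attribute in the affected environment and look for stray letters inside the brackets.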