wwwsevolod closed this 10 years ago
Thanks! I think using a better, multilingual regex is definitely a good idea. I'm going to merge this locally and also include Unicode ranges for Asian languages, plus accented character variants if I can.
I don't think ranges will work well at all. Maybe there is an alternative regexp engine for Node? Because \w is good for matching words, but in JavaScript it only matches words written in Latin characters.
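To illustrate the \w limitation described above, here is a minimal sketch. It assumes a current Node version with Unicode property escapes (the `u` flag and `\p{...}`), which were not available when this thread was written; the variable names are made up for the example.

```javascript
// In JavaScript, \w is shorthand for [A-Za-z0-9_], so Cyrillic letters never match.
const asciiWords = "Привет world".match(/\w+/g);

// Assumption: the runtime supports Unicode property escapes (ES2018+).
// \p{L} matches a letter in any script, \p{N} matches any digit.
const unicodeWords = "Привет world".match(/[\p{L}\p{N}_]+/gu);

console.log(asciiWords);   // [ 'world' ]
console.log(unicodeWords); // [ 'Привет', 'world' ]
```

The first pattern silently drops the Russian word, which is the behaviour reported in this thread.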
This problem also affects Chinese and every other language that isn't written in a-z.
@matthojo as I said, this needs to be done either without regexps or with an alternative regex engine.
I found XRegExp; apparently it can handle Unicode ranges. So I'll find all the Unicode whitespace characters and create one or more ranges for them.
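A rough sketch of what such a whitespace range could look like with a plain JavaScript regex. The list of code points and the name `UNICODE_SPACE` are assumptions for illustration and are not taken from downsize or XRegExp.

```javascript
// Hypothetical character class of Unicode whitespace code points
// (tabs and line breaks, space, NEL, no-break space, Ogham space mark,
// the U+2000-U+200A spaces, line/paragraph separators, narrow no-break
// space, medium mathematical space, ideographic space).
const UNICODE_SPACE =
  /[\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+/;

// Split on a no-break space, an ideographic space, and a regular space.
const parts = "один\u00A0два\u3000три four".split(UNICODE_SPACE);

console.log(parts); // [ 'один', 'два', 'три', 'four' ]
```

Note that JavaScript's built-in \s also matches non-ASCII spaces such as U+00A0 and U+3000, so an explicit range may only be needed for code points outside that set.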
I got swamped with work, but @halfdan did my job for me! This issue should be fixed in 0.0.4. :)
Sincere apologies for my slow response.
downsize has a problem: it can't downsize Russian text correctly. Maybe you need to move away from the regexp approach to word searching, and break by words rather than by characters, or something like that, to make it work with any language.
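The word-based idea suggested above could be sketched roughly as follows. `truncateWords` is a hypothetical helper, not part of downsize's API, and it assumes plain text whose words are separated by whitespace.

```javascript
// Hypothetical helper: keep the first `limit` whitespace-separated words.
function truncateWords(text, limit) {
  // Split on runs of whitespace instead of matching \w,
  // so the script the words are written in does not matter.
  const words = text.trim().split(/\s+/);
  return words.slice(0, limit).join(" ");
}

console.log(truncateWords("Съешь ещё этих мягких французских булок", 3));
// Съешь ещё этих
```

This only helps for languages that separate words with spaces. Chinese text, for example, would still need a different segmentation strategy.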