cgiffard / Downsize

Tag safe text truncation for HTML and XML!
BSD 3-Clause "New" or "Revised" License

fix for downsize for russian #1

Closed: wwwsevolod closed this issue 10 years ago

wwwsevolod commented 11 years ago

Downsize has a problem: it can't truncate Russian text correctly. Maybe you need to switch from regexp matching to word searching, and break by words rather than by characters, like

string.split(/\s/).length > 200 && string.split(/\s/).slice(0, 200).join(' ').length > 1000

or something like that, to make it work with any language.
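The word-based idea above can be sketched as follows. This is a minimal illustration, not Downsize's actual implementation: `truncateByWords` is a hypothetical helper name, and the sketch ignores HTML tags entirely, which the real library has to handle.

```javascript
// Hypothetical helper, not part of Downsize: keep at most `maxWords`
// whitespace-delimited words, whatever script they are written in.
function truncateByWords(text, maxWords) {
  const trimmed = text.trim();
  // Splitting on runs of whitespace avoids any dependency on \w,
  // so Cyrillic or accented words count the same as Latin ones.
  const words = trimmed.split(/\s+/).filter(Boolean);
  if (words.length <= maxWords) {
    return trimmed;
  }
  return words.slice(0, maxWords).join(" ");
}

console.log(truncateByWords("Привет мир как дела", 2));
```

Splitting on whitespace only helps for languages that separate words with spaces; text written without spaces, such as Chinese, would still count as a single word here.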

cgiffard commented 11 years ago

Thanks! I think using a better, multilingual regex is definitely a good idea. I'm going to merge this locally and also include Unicode ranges for Asian languages, and accented character variants if I can.

wwwsevolod commented 11 years ago

I don't think ranges will work well at all. Maybe there is an alternative regexp engine for Node? \w is good for matching words, but in JavaScript it only matches words written in Latin characters.
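The limitation described above is easy to reproduce. The sketch below contrasts `\w` with Unicode property escapes, which need the `u` flag and a modern engine (ES2018 or later). That feature was not available when this thread was written, so treat it as a present-day alternative rather than what the library used. `countWords` is a hypothetical helper for illustration.

```javascript
// \w is equivalent to [A-Za-z0-9_] in JavaScript, so Cyrillic never matches.
const latinOnly = /\w+/g;
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers, in any script.
const anyScript = /[\p{L}\p{M}\p{N}_]+/gu;

// Hypothetical helper: count how many word-like runs a pattern finds.
function countWords(text, pattern) {
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

const sample = "Привет, мир! Hello";
console.log(countWords(sample, latinOnly)); // 1 (only "Hello")
console.log(countWords(sample, anyScript)); // 3
```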

matthojo commented 11 years ago

This problem is also present in Chinese and in every language written outside a-z.

wwwsevolod commented 11 years ago

@matthojo as I said, it needs to be done either without regexp or with an alternative regex engine

cgiffard commented 11 years ago

I found XRegExp; apparently it can handle Unicode ranges. So I'll find all the Unicode whitespace characters and create one or more ranges for them.
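One way to build such a whitespace range without any extra library is sketched below. It is an assumption about the approach, not the code that shipped in Downsize: the constant and function names are hypothetical, and the code points listed are the ones carrying the Unicode White_Space property.

```javascript
// Code points with the Unicode White_Space property, as explicit ranges.
// Kept as a string so it can be reused inside several character classes.
const UNICODE_WHITESPACE =
  "\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A" +
  "\\u2028\\u2029\\u202F\\u205F\\u3000";

// One or more whitespace characters, and one or more non-whitespace characters.
const whitespaceRun = new RegExp("[" + UNICODE_WHITESPACE + "]+");
const nonWhitespaceRun = new RegExp("[^" + UNICODE_WHITESPACE + "]+", "g");

// Hypothetical helper: split text into words on any Unicode whitespace.
function splitWords(text) {
  return text.match(nonWhitespaceRun) || [];
}

console.log(splitWords("Привет\u00A0мир\u3000你好"));
```

JavaScript's built-in `\s` already covers most of these code points, so an explicit range mainly helps when the same list has to be shared with other character classes or negated.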

cgiffard commented 10 years ago

I got swamped with work, but @halfdan did my job for me! This issue should be fixed in 0.0.4. :)

Sincere apologies for my slow response.