datadesk / django-softhyphen

A Python library for hyphenating HTML in your Django project
http://django-softhyphen.rtfd.org

Prevent hyphenation of short words in Russian #17

Open Walkeryr opened 10 years ago

Walkeryr commented 10 years ago

I've seen this example in the source code:

Short words are not hyphenated

>>> hyphenate("<p>The brave men, living and dead.</p>")
u'<p>The brave men, liv&shy;ing and dead.</p>'

This doesn't hold for Russian, where five-letter words get hyphenated. How can I control this behavior?

palewire commented 10 years ago

Interesting question.

I'm not sure I have the answer, being ignorant of Russian hyphenation rules. This library uses a Russian dictionary by Peter Novodvorsky in the dicts directory. You can read more about it here and the dictionary itself is here.

It might be possible to add an option to the hyphenator that ignores word tokens below a certain size, but as far as I recall, nothing in the code does that right now.

palewire commented 10 years ago

Perhaps adding a character limit greater than zero here might do it?
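
For illustration, the minimum-length idea discussed above could look roughly like the sketch below. It is not code from django-softhyphen: `hyphenate_long_words`, `min_length`, and `toy_hyphenate_word` are hypothetical names, and the toy hyphenator simply splits a word in the middle instead of consulting a real dictionary. It also works on plain text only, not HTML.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Runs of letters in any script (Cyrillic included); digits and "_" are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def hyphenate_long_words(text, hyphenate_word, min_length=6):
    """Apply hyphenate_word only to words with at least min_length letters.

    hyphenate_word is any callable that takes one word and returns it with
    soft hyphens inserted. Shorter words are passed through untouched.
    """
    def replace(match):
        word = match.group(0)
        if len(word) < min_length:
            return word
        return hyphenate_word(word)

    return WORD_RE.sub(replace, text)


def toy_hyphenate_word(word):
    """Hypothetical stand-in for a dictionary-based hyphenator:
    inserts a single soft hyphen at the midpoint of the word."""
    middle = len(word) // 2
    return word[:middle] + SOFT_HYPHEN + word[middle:]


if __name__ == "__main__":
    sample = "слово и переносы"
    # repr() makes the invisible soft hyphen show up as \xad
    print(repr(hyphenate_long_words(sample, toy_hyphenate_word, min_length=6)))
```

With `min_length=6`, the five-letter word "слово" is left alone while "переносы" still gets a soft hyphen. A real implementation would pass in a dictionary-based hyphenator in place of the toy one and would need to apply the check per word token inside the HTML text nodes.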