adah1972 / libunibreak

The libunibreak library
zlib License

Single line of text in multiple languages possible? #25

capr closed this issue 4 years ago

capr commented 5 years ago

Hi!

Consider a single line of text with words in multiple languages. This requires running the line-breaking algorithm on each span of text separately, since the line-breaking function takes a language parameter.

But then the algorithm puts a hard break at the end of each piece of text.

Is this a limitation of UAX#14 or libunibreak? And is there a workaround?

Thanks!

adah1972 commented 5 years ago

The language information is used to tailor some specific behaviour regarding line breaking, especially concerning punctuation marks like quotes. In mixed-language text, set the language to the ‘main’ language. I do not suppose you want to use different styles of quotation marks in your text, do you?

Also remember that not setting the language at all is generally safe, though it can result in fewer line breaking opportunities.

capr commented 5 years ago

@adah1972 thanks for the info. I don't have the notion of a base language in my API: any arbitrary span of text can have its own language attribute, and breaking each span separately puts a hard break between spans. Looks like I need to add the notion of a paragraph language.
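
For anyone landing here with the same API shape, here is a rough sketch of what a "paragraph language" could look like. Everything in it is hypothetical and not part of libunibreak: the `text_span` struct, the `pick_paragraph_lang` and `span_breaks` helpers, and the heuristic of taking the language that covers the most bytes as the main one. The idea is to choose one language for the whole paragraph, run the line breaker once over the concatenated text, and treat each span's break flags as a slice of that single array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical span type: a byte range of the paragraph plus a language tag. */
typedef struct {
    size_t offset;     /* byte offset of the span within the paragraph */
    size_t len;        /* byte length of the span */
    const char *lang;  /* e.g. "en" or "fr"; NULL if unknown */
} text_span;

/*
 * Assumed heuristic: the paragraph language is the one covering the most
 * bytes. Ties go to the language that appears first. Returns NULL when no
 * span with a language tag has any bytes.
 */
const char *pick_paragraph_lang(const text_span *spans, size_t n)
{
    const char *best = NULL;
    size_t best_bytes = 0;

    for (size_t i = 0; i < n; i++) {
        if (spans[i].lang == NULL)
            continue;

        /* Total bytes across all spans that share this span's language. */
        size_t total = 0;
        for (size_t j = 0; j < n; j++) {
            if (spans[j].lang != NULL &&
                strcmp(spans[j].lang, spans[i].lang) == 0)
                total += spans[j].len;
        }

        /* Strict comparison keeps the earliest language on a tie. */
        if (total > best_bytes) {
            best_bytes = total;
            best = spans[i].lang;
        }
    }
    return best;
}

/*
 * With one break array for the whole paragraph (one flag per byte), the
 * flags for a span are just a slice starting at the span's offset.
 */
const char *span_breaks(const char *brks, const text_span *span)
{
    return brks + span->offset;
}
```

The result of `pick_paragraph_lang` would then be passed as the `lang` argument of a single `set_linebreaks_utf8` call over the whole paragraph. Since that call sees the text as one piece, the only forced break it reports for the paragraph end is at the last byte, not at every span boundary. If the helper returns NULL, passing no language should still be acceptable, per the comment above about not setting the language at all.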