daisy / ebraille

Repository for developing use cases and standard for digital braille

Possibility for optional hyphens and no-break spaces #51

Open jrbowden opened 1 year ago

jrbowden commented 1 year ago

If a braille file prepared for a particular line length is opened on a reading system with a different line length, it would be useful, in order to optimise the braille, for long words to be hyphenated appropriately and for text that should not be split to stay together.

Examples: The word "responsibilities" (to give but one) is a long word with no contractions in English braille. Due to word wrapping, without hyphenation, it can cause a bad line break.

Though not always considered essential, it should be possible to insert hyphenation points into such long words to allow good splitting if needed.

Along similar lines, email and web addresses can be extremely long "words", and introducing hyphenation points can prevent splitting at inappropriate points. The worst case of "inappropriate" is to cram as much as possible onto the available braille line and wrap regardless.

On the other hand, in other situations it is considered good practice to prevent a line break between certain words - for example "chapter 3", "25 m" or "the number 2 000 000". Using no-break spaces would prevent inappropriate line breaking in these cases.

Of course, unless "advanced" searches are used, an ordinary text search should skip over optional hyphens and should treat no-break spaces as ordinary spaces.
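A minimal sketch of that search behaviour in Python, assuming the optional hyphen is stored as U+00AD (soft hyphen) and the no-break space as U+00A0; the issue does not fix these code points, and the function names are made up for illustration:

```python
# Assumed code points; the eBraille discussion does not mandate them.
SOFT_HYPHEN = "\u00ad"
NO_BREAK_SPACE = "\u00a0"


def normalize_for_search(text: str) -> str:
    """Drop optional hyphens and fold no-break spaces into ordinary spaces."""
    return text.replace(SOFT_HYPHEN, "").replace(NO_BREAK_SPACE, " ")


def plain_find(haystack: str, needle: str) -> bool:
    """Ordinary (non-advanced) search: compare the normalized forms only."""
    return normalize_for_search(needle) in normalize_for_search(haystack)
```

With this, a search for "chapter 3" would match text stored as "chapter" + no-break space + "3", and a search for "responsibilities" would match the same word stored with soft hyphens inside it.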

Proposal: Concepts such as an optional hyphenation point, no-break space and other typographical helpers should be available in the braille file. Optional hyphens should also include what braille sign(s) (if any) should be shown if the hyphen were inserted (it varies according to braille code).

JakeKyle commented 1 year ago

How could cases be dealt with where the split, hyphenated word is transcribed differently from the unhyphenated one? I think @bertfrees is referring to this in #4.

There are several examples in UEB 2013 rule book section 10.13 "Word Division". One such is "inconvenient" in grade 2 UEB:

unhyphenated: ⠔⠉⠕⠝⠧⠢⠊⠢⠞

hyphenated: ⠊⠝⠤ ⠉⠕⠝⠧⠢⠊⠢⠞

How could this word be represented in ebraille so that, if it needed to be hyphenated, it would obey the rules? I suppose all the alternative renderings would need to be part of the file?

Apologies if this has been dealt with elsewhere but I can't find it.
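To make the "alternative renderings in the file" idea concrete, here is a rough Python sketch. It is not a proposed eBraille format: the `BrailleWord` and `BreakOption` classes and the `lay_out` function are hypothetical, and the only data used is the "inconvenient" example above.

```python
from dataclasses import dataclass, field


@dataclass
class BreakOption:
    before: str  # cells ending the first line, hyphen sign included
    after: str   # cells starting the next line


@dataclass
class BrailleWord:
    unbroken: str  # rendering used when the word is not split
    breaks: list[BreakOption] = field(default_factory=list)


def lay_out(word: BrailleWord, room: int) -> list[str]:
    """Return the lines for `word` given `room` free cells on the current line."""
    if len(word.unbroken) <= room:
        return [word.unbroken]
    # Keep only the break options whose first part fits in the remaining room.
    fitting = [b for b in word.breaks if len(b.before) <= room]
    if not fitting:
        # No usable break: leave the line as it is and move the whole word down.
        return ["", word.unbroken]
    best = max(fitting, key=lambda b: len(b.before))
    return [best.before, best.after]


# The UEB example from this comment: contracted when whole, respelled when split.
inconvenient = BrailleWord(
    unbroken="⠔⠉⠕⠝⠧⠢⠊⠢⠞",
    breaks=[BreakOption(before="⠊⠝⠤", after="⠉⠕⠝⠧⠢⠊⠢⠞")],
)
```

Storing each allowed break with its own before and after cells lets the reading system pick a rendering without knowing the contraction rules, at the cost of a larger file.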

bertfrees commented 1 year ago

Another example I recently learned is "comprendre" in French. It is normally contracted to "⠤⠖⠢⠹⠑", but when the line is broken after "com", it becomes:

................  ⠉⠺⠤
⠖⠢⠹⠑

There are also many cases where a word is spelled differently in both print and braille when broken across lines. An example is "autootje" ("⠁⠥⠞⠕⠕⠞⠚⠑") in Dutch which is broken into "auto-tje":

..............  ⠁⠥⠞⠕⠤
⠞⠚⠑

One way to deal with this is by simply disallowing breaking "inconvenient" after "in", "comprendre" after "com" and "autootje" after "auto", by not including soft hyphen characters at those positions.

The more optimal but also more complicated way to deal with it is to let the reading device perform hyphenation (in addition to hyphenation points present in the document). The main difficulty with this approach is word detection. A hyphenation algorithm typically takes words as input. When hyphenation is performed on print text, word detection is pretty straightforward thanks to Unicode character properties. When hyphenating braille it's a different story because there is no simple way to distinguish letters from punctuation.
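The word detection problem can be seen directly from the Unicode general categories. A small Python illustration using the standard `unicodedata` module (the `categories` helper is just for this example):

```python
import unicodedata


def categories(text: str) -> list[str]:
    """Return the Unicode general category of each character in `text`."""
    return [unicodedata.category(ch) for ch in text]


print_sample = "in-"                     # two letters and a hyphen-minus
braille_sample = "\u280a\u281d\u2824"    # the cells for "in" plus dots 3-6
```

For the print sample the categories are `Ll`, `Ll`, `Pd`, so letters and punctuation are easy to tell apart. For the braille sample every cell comes back as `So` (symbol, other), so the character properties give a hyphenator nothing to find word boundaries with.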

jrbowden commented 8 months ago

Splitting off issue #150 to discuss the use of no-break space characters. This issue (#51) is now for optional word breaking with soft hyphens.

jrbowden commented 8 months ago

Another complication of word breaking in braille is that the "hyphen" character varies according to the braille code being used and the context. We would probably need some kind of CSS extension to assign the correct braille optional hyphen character.

A couple of examples from UEB:

  1. In ordinary text, such as the word "responsibilities": if this word were broken at a line break, the hyphen character should be dots 3-6 (hyphen). e.g. ........ ⠗⠑⠎⠏⠕⠝⠎⠊⠤ ⠃⠊⠇⠊⠞⠊⠑⠎

  2. For a long URL or mathematical expression, the hyphenation character is dot 5 (called a line continuation indicator). e.g. https://github.com/daisy/ebraille

⠓⠞⠞⠏⠎⠒⠸⠌⠸⠌⠛⠊⠞⠓⠥⠃⠲⠉⠕⠍⠸⠌⠐ ⠙⠁⠊⠎⠽⠸⠌⠑⠃⠗⠁⠊⠇⠇⠑

  3. In UEB, there is also a continuation indicator at a space (dot 5, dot 5) used to show (particularly for technical material) that the print does not have a line break at that position. e.g. (computer code): printf("Hi %s and welcome!\n", user.name);

⠏⠗⠊⠝⠞⠋⠐⠣⠠⠶⠠⠓⠊⠀⠨⠴⠎⠀⠁⠝⠙⠀⠺⠑⠇⠉⠕⠍⠑⠖⠸⠡⠝⠠⠶⠂⠐⠐ ⠀⠀⠀⠀⠥⠎⠑⠗⠲⠝⠁⠍⠑⠐⠜⠆

bertfrees commented 8 months ago

No CSS extension should be needed for that. There already exists a hyphenate-character property.
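To illustrate the effect being discussed, here is a rough Python sketch of breaking at an optional hyphen with a configurable sign. It is not CSS and not any reading system's actual code: `break_word` is a hypothetical helper, U+00AD is assumed as the optional hyphen, and the `sign` argument stands in for whatever character the styling selects (dots 3-6 for ordinary text, dot 5 for a URL).

```python
SOFT_HYPHEN = "\u00ad"  # assumed marker for an optional break point


def break_word(word: str, room: int, sign: str) -> tuple[str, str]:
    """Split `word` for a line with `room` free cells.

    Returns (first_line, remainder). The break is made at the last soft
    hyphen for which the first part plus `sign` still fits.
    """
    visible = word.replace(SOFT_HYPHEN, "")
    if len(visible) <= room:
        return visible, ""  # whole word fits, nothing to break
    parts = word.split(SOFT_HYPHEN)
    best = 0
    for i in range(1, len(parts)):
        if len("".join(parts[:i])) + len(sign) <= room:
            best = i
    if best == 0:
        return "", word  # no break point fits: move the whole word down
    head = "".join(parts[:best]) + sign
    tail = SOFT_HYPHEN.join(parts[best:])  # keep later break points intact
    return head, tail


HYPHEN_3_6 = "\u2824"   # dots 3-6, for ordinary text
CONTINUATION_5 = "\u2810"  # dot 5, for URLs and similar material
```

The same function covers both UEB cases above by passing `HYPHEN_3_6` or `CONTINUATION_5` as `sign`; only the choice of sign depends on context, not the breaking logic.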

bertfrees commented 4 months ago

Missing examples in the tagging (and styling) best practices document:

Missing examples in the styling best practices document: