daisy / ebraille

Repository for developing use cases and standard for digital braille

Possible issue with ordinary spaces #79

Open jrbowden opened 1 year ago

jrbowden commented 1 year ago

There are (at least) two possibilities for representing ordinary spaces in a document that is in braille:

  1. The ordinary space character, U+0020
  2. The braille space character U+2800

The braille space, according to the Unicode spec, does not have word breaking properties. Its general category is "Symbol, other" (So). The ordinary space is defined as a space separator (Zs), but may not be the correct width for a braille character.
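The difference in character properties can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch; the constant names are chosen here for readability and come from no specification.

```python
import unicodedata

ORDINARY_SPACE = "\u0020"
BRAILLE_BLANK = "\u2800"

# Print code point, Unicode name, general category, and whether
# Python's str.isspace() treats the character as white space.
for ch in (ORDINARY_SPACE, BRAILLE_BLANK):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )
```

The ordinary space reports category `Zs` and counts as white space; the braille blank reports `So` and does not.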

Noting this in case it becomes important later.

Reading systems may be able to add word breaking possibilities to the braille space by adding one of the "thin" or "zero width" spaces available in Unicode.
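A minimal sketch of that idea, assuming a reading system chooses ZERO WIDTH SPACE (U+200B) as the break opportunity. The function name `add_break_opportunities` is hypothetical and not taken from any eBraille tooling.

```python
import re

BRAILLE_BLANK = "\u2800"
ZERO_WIDTH_SPACE = "\u200b"

# Match a braille blank that is not already followed by a zero-width space,
# so that running the function twice does not insert duplicates.
_BLANK_WITHOUT_BREAK = re.compile(f"{BRAILLE_BLANK}(?!{ZERO_WIDTH_SPACE})")


def add_break_opportunities(text: str) -> str:
    """Append U+200B after every U+2800 so a line may break there."""
    return _BLANK_WITHOUT_BREAK.sub(BRAILLE_BLANK + ZERO_WIDTH_SPACE, text)


if __name__ == "__main__":
    sample = "\u2801\u2803\u2800\u2809\u2819"
    print([f"U+{ord(c):04X}" for c in add_break_opportunities(sample)])
```

Whether a break after every blank is desirable (for example inside a run of several blanks) is a design decision this sketch does not settle.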

bertfrees commented 1 year ago

To me, using braille space characters in an HTML document, or any other format in which white space characters may be used to format the source code for better readability (tabs for indentation, newlines for text wrapping, ...), makes less sense than using braille space characters in a format such as BRF or PEF where all space is significant (for PEF this is the case within <row> elements).

Perhaps it is because I use a braille font with "shadow" dots, but in my mind the blank braille pattern character, just like other braille characters, represents a braille cell, i.e. a feature of the output medium (paper or braille display).

In other words braille patterns, including the blank one, are characters that I expect to end up unchanged in the output, as opposed to all other characters, including ordinary spaces, tabs and newlines, which need interpretation in order to be rendered.

jrbowden commented 6 months ago

I mention this puzzle because using the braille space, though highly desirable, could break #4 (reflowing a text), since U+2800 is not a word break point by default. I'm opening a new issue, #150, on no-break spaces, to discuss using &nbsp; for the cases where we really don't want reflowing.

Do we need to explicitly state that a braille space U+2800 is to be treated as a word break character in eBraille?

wfree-aph commented 6 months ago

Do we need to use the braille space U+2800 at all? It seems like for dynamic braille, the standard space U+0020 would suffice and for embossed braille, some software to interpret the eBraille file will be needed anyway and the standard space could suffice there as well. What do we gain by using the braille space other than requiring the additional zero width space to be used throughout?

jrbowden commented 6 months ago

There is a serious problem using ordinary space U+0020 for those wanting to view the braille on a screen: the ordinary space U+0020 is a different width to all the braille characters.

wfree-aph commented 6 months ago

That's a good point @jrbowden, especially if we're thinking about something like browser support or a reader not meant for braille. For a bespoke reader, it can interpret the characters how they are intended but anything normally meant for print isn't going to care. I knew they were different sizes but wasn't thinking about how we wouldn't always control how they are interpreted.

In that case we probably do need to be thinking about the braille space and zero-width. It would be good if we could use CSS but I don't know at the moment if we can.

bertfrees commented 6 months ago

Using the normal space makes the most sense, for its Unicode properties. To make it appear the way we want, we just need to control the font (through CSS).

skntkacm commented 6 months ago

I think it is important to include U+2800. My examples are somewhat off-topic, but I have several times run into problems with normal spaces among Unicode braille cells, mostly when trying to display braille for sighted people. Sometimes they did not recognize the cells correctly; I guess it is the width problem @jrbowden mentioned. I have also had problems when trying to simulate braille formatting in Word: spaces at the beginning of a line were sometimes changed to automatic indentation, lines did not align vertically when using the normal space, and so on.

bertfrees commented 6 months ago

Again, you can make any character, including a normal space, appear the way you want (with a certain width, with or without shadow dots, etc.) using a font.

So far I haven't heard any good reasons / use cases for having to treat U+2800 as ordinary white space (with its line breaking properties). Such a decision would make it difficult for implementers and would be bad for interoperability (unless authoring tools add zero-width spaces, as suggested above, but that would also be far from ideal).

wfree-aph commented 6 months ago

Thanks @bertfrees! I knew that with the move away from ASCII braille that we were trying to get away from relying on braille fonts and so that's why I shied away from it, but you are right that using CSS would make that not a problem.

Menelion commented 1 month ago

Thank you all! I wanted to raise this question while reading the published working draft version 1.0, but of course first searched through open issues.
So my question (probably mostly to @jrbowden and @wfree-aph) is: where do you find it appropriate to use U+2800? It is extremely important for me now as I'm developing several solutions related to Braille processing. Currently, both when outputting Braille and when simulating Perkins-style input from a QWERTY keyboard, I substitute U+2800 for U+0020, which, I feel, will give me huge headaches in future. So, again, is there any scenario where this character must be used, in your opinion? Thank you!

wfree-aph commented 1 month ago

Thanks @Menelion for this question! We've discussed these spacing characters a lot in the working group and perhaps the specification should include more information about what is expected here and what the issues are with each kind of character.

The problem with U+2800 is that it isn't treated as a space, so it leads to disjointed line breaks. We talked about sticking with it and just combining it with U+200B, but that seemed clunky.

The idea, I think, was to use U+0020 for most spaces in your file. The problem with U+0020 though is that it won't conform to rules about braille spacing and so software that supports eBraille would need to know to handle it the same as a braille character and maintain braille spacing. So as I am thinking about it now, it seems like U+2003 might be the better option, since as an em space, it should conform to the spacing requirements of braille while still providing the appropriate line breaks. We'd need to investigate this idea though and ensure it would work as intended with reading systems, especially since U+2003 isn't currently mentioned in the public working draft.

However, regardless of which space character is recommended, the thinking is that U+2800 would still need to be treated as valid because there may be use cases where it is necessary. Some examples that I can think of are very braille-specific instances such as in a braille rule book or something similar like a book that teaches braille literacy.

Does that help? It sounds like you would rather use U+2800 instead of U+0020 or U+2003. Am I getting that correct?

Menelion commented 1 month ago

Thanks @wfree-aph for your answer! It does help and doesn't at the same time 😊.
My problem is that I'm developing a Braille editor supporting several formats, including BRF, Unicode Braille and, in future, eBraille also (along with RTF and TXT). So I'm still confused about practical solutions: I have currently a Perkins emulation mode where, using your F, D, S, A, J, K, L, and Semicolon keys, you can enter Unicode Braille straight into the document. Currently when you press the Spacebar on your keyboard in this mode, a U+2800 is entered. Given what you've said, do you find it practical? What implications would it have if I don't replace it and allow the user to enter an ordinary U+0020 instead? Because, again as you've said, many complications grow on me because of the fact that U+2800 is not a space per se. Thank you so much!

wfree-aph commented 1 month ago

@Menelion, thanks for helping me understand. My suggestion would be to use U+2800 while the user is working and only replace it with a different character during saving and only when necessary. So if the user is creating a BRF, keep U+2800 but if they are creating an eBraille file, replace it with either U+0020 or, maybe, U+2003. You then do the opposite on open. You may already be doing that but, if not, it will at least limit when you're having to replace space characters to just during open and save.
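The open/save swap suggested above could look roughly like this. It is a sketch under assumptions: the function names `on_save_ebraille` and `on_open_ebraille` are hypothetical, and it treats every space identically, so any U+2800 that was meant to stay in the file (for example in pre-formatted content) would need separate handling.

```python
BRAILLE_BLANK = "\u2800"
ORDINARY_SPACE = "\u0020"
EM_SPACE = "\u2003"  # the alternative floated earlier in the thread


def on_save_ebraille(buffer_text: str, file_space: str = ORDINARY_SPACE) -> str:
    """Replace the editor's internal U+2800 with the space used in the file."""
    return buffer_text.replace(BRAILLE_BLANK, file_space)


def on_open_ebraille(file_text: str, file_space: str = ORDINARY_SPACE) -> str:
    """Do the opposite on open: map the file's space back to U+2800."""
    return file_text.replace(file_space, BRAILLE_BLANK)


if __name__ == "__main__":
    buffer_text = "\u2801\u2803\u2800\u2809"
    saved = on_save_ebraille(buffer_text)
    # The round trip restores the buffer for this simple input.
    print(on_open_ebraille(saved) == buffer_text)
```

Keeping the replacement confined to these two functions means the rest of the editor only ever sees U+2800.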

Your program sounds really intriguing. Thank you for including eBraille in the document types you plan to support! If you'd like feedback on it or to discuss it further, please message me directly.

Menelion commented 1 month ago

@wfree-aph Sorry, where can I find you personally? My public email is ap@oire.me, it's listed on my GitHub page. Thanks!

jrbowden commented 1 month ago

Thanks for this. Does anyone actually know why Unicode decided that U+2800 is not a word break character? Does anyone have a contact so we can ask?

bertfrees commented 1 month ago

It is quite simple, I think. Ordinary spaces (U+0020), or any of the other allowed white space characters (TAB (U+0009), LF (U+000A) or CR (U+000D)), should be used to separate words, and may also occur at the very beginning or end of blocks. All these characters are equivalent (except if the CSS white-space property is used, but let's forget about that for a minute).

TAB, LF and CR are typically used to wrap and indent the text within the XML files.

U+2003 is another space character that we could allow (not allowed currently), but it would also be equivalent (so there's not really a point).

U+2800 is also allowed, and it is equivalent to the NO-BREAK SPACE (U+00A0). These characters should normally be used only sparsely, in case it is necessary to include pre-formatted content.

If needed, the wbr element may be used within pre-formatted content to add break opportunities.
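To illustrate the equivalences described above, here is a small sketch of how a tool might treat the two kinds of space. It assumes the CSS white-space property is not in play and ignores markup such as wbr; the helper names `normalize_whitespace` and `breakable_units` are hypothetical.

```python
import re

# U+0020, TAB, LF and CR are treated as equivalent, collapsible white space.
COLLAPSIBLE = re.compile(r"[ \t\n\r]+")


def normalize_whitespace(text: str) -> str:
    """Collapse each run of collapsible white space to a single U+0020.

    U+2800 and U+00A0 are not matched, so they are kept as they are.
    """
    return COLLAPSIBLE.sub(" ", text)


def breakable_units(text: str) -> list[str]:
    """Split text at collapsible white space, the only break points here.

    Sequences joined by U+2800 or U+00A0 stay together as one unit.
    """
    return [unit for unit in COLLAPSIBLE.split(text) if unit]


if __name__ == "__main__":
    source = "\u2801\u2803\n\t\u2809\u2800\u2800\u2819"
    print(len(breakable_units(source)))
```

For the sample string, the newline and tab collapse into one break point, while the two braille blanks stay inside the second unit.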

Your editor should probably allow entering the two kinds of spaces. Whether you want to support the no-break type of space depends on whether you want to support pre-formatted content. It also depends on how you create the BRF output: whether the formatting of the BRF is done manually by the user, or automatically by the tool.

BRF is all pre-formatted and therefore a lot different from eBraille, which is mostly reflowable.

bertfrees commented 1 month ago

@jrbowden Perhaps because it is expected not to collapse with adjacent U+2800 characters, it was made equivalent to the no-break space, including its word breaking characteristics?

I think it makes sense.

Would it be good for us if it were different?