Open jrbowden opened 1 year ago
To me, using braille space characters in an HTML document, or any other format in which white space characters may be used to format the source code for better readability (tabs for indentation, newlines for text wrapping, ...), makes less sense than using braille space characters in a format such as BRF or PEF where all space is significant (for PEF this is the case within <row> elements).
Perhaps it is because I use a braille font with "shadow" dots, but in my mind the blank braille pattern character, just like other braille characters, represents a braille cell, i.e. a feature of the output medium (paper or braille display).
In other words braille patterns, including the blank one, are characters that I expect to end up unchanged in the output, as opposed to all other characters, including ordinary spaces, tabs and newlines, which need interpretation in order to be rendered.
I mention this puzzle as using braille space, though highly desirable, could break #4 reflowing a text, as the U+2800 by default is not a word break point. I'm opening a new issue for no-break spaces #150 to discuss using a no-break space (U+00A0) for the case when we really don't want reflowing.
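As a rough illustration of the reflow concern, here is a minimal Python sketch using the standard library's textwrap module, which only breaks at ASCII whitespace. It is not a browser or an eBraille reading system (those follow the Unicode line breaking rules instead), and the sample words and the width of 12 cells are arbitrary assumptions, but the effect is comparable: text separated by U+2800 is seen as one unbreakable word.

```python
import textwrap

BRAILLE_SPACE = "\u2800"  # BRAILLE PATTERN BLANK

# Three arbitrary five-cell braille "words" used only for this illustration.
words = ["⠓⠑⠇⠇⠕", "⠺⠕⠗⠇⠙", "⠁⠛⠁⠊⠝"]

with_ascii = " ".join(words)
with_braille = BRAILLE_SPACE.join(words)


def wrap(text, width=12):
    # Never split inside a word, so only real break opportunities are used.
    return textwrap.wrap(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )


ascii_lines = wrap(with_ascii)
braille_lines = wrap(with_braille)

print(len(ascii_lines), "line(s) with U+0020")
print(len(braille_lines), "line(s) with U+2800")
```

With U+0020 the text wraps onto a second line; with U+2800 it stays on a single line that overflows the requested width.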
Do we need to explicitly state that a braille space U+2800 is to be treated as a word break character in eBraille?
Do we need to use the braille space U+2800 at all? It seems like for dynamic braille, the standard space U+0020 would suffice and for embossed braille, some software to interpret the eBraille file will be needed anyway and the standard space could suffice there as well. What do we gain by using the braille space other than requiring the additional zero width space to be used throughout?
There is a serious problem using ordinary space U+0020 for those wanting to view the braille on a screen: the ordinary space U+0020 is a different width to all the braille characters.
That's a good point @jrbowden, especially if we're thinking about something like browser support or a reader not meant for braille. For a bespoke reader, it can interpret the characters how they are intended but anything normally meant for print isn't going to care. I knew they were different sizes but wasn't thinking about how we wouldn't always control how they are interpreted.
In that case we probably do need to be thinking about the braille space and zero-width. It would be good if we could use CSS but I don't know at the moment if we can.
Using the normal space makes the most sense, for its Unicode properties. To make it appear the way we want, we just need to control the font (through CSS).
I think it is important to include U+2800. My examples are somewhat off-topic, but I have several times run into problems with normal spaces among Unicode braille cells, mostly when trying to display braille for sighted people. Sometimes they did not recognize cells correctly; I guess it is the width problem that @jrbowden mentioned. I have also had problems when trying to simulate braille formatting in Word: spaces at the beginning of a line were sometimes changed to automatic indentation, lines did not align vertically when using normal spaces, and so on.
Again, you can make any character, including a normal space, appear the way you want (with a certain width, with or without shadow dots, etc.) using a font.
So far I haven't heard any good reasons / use cases for having to treat U+2800 as ordinary white space (with its line breaking properties). Such a decision would make it difficult for implementers and would be bad for interoperability (unless authoring tools add zero-width spaces, as suggested above, but that would also be far from ideal).
Thanks @bertfrees! I knew that with the move away from ASCII braille that we were trying to get away from relying on braille fonts and so that's why I shied away from it, but you are right that using CSS would make that not a problem.
Thank you all! I wanted to raise this question while reading the published working draft version 1.0, but of course first searched through open issues.
So my question (probably to @jrbowden and @wfree-aph mostly) is: where do you find it appropriate to use U+2800? It is extremely important for me now as I'm developing several solutions related to Braille processing. Currently, both when outputting Braille and when simulating Perkins-style input from a QWERTY keyboard, I replace U+0020 with U+2800, which, I feel, will give me huge headaches in future. So, again, is there any appropriate scenario when this character must be used, in your opinion? Thank you!
Thanks @Menelion for this question! We've discussed these spacing characters a lot in the working group and perhaps the specification should include more information about what is expected here and what the issues are with each kind of character.
The problem with U+2800 is that it isn't treated as a space, so it leads to disjointed line breaks. We talked about sticking with it but just combining it with U+200B, but that seemed clunky.
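For reference, the U+2800 plus U+200B combination could look roughly like the following Python sketch. The function name add_break_opportunities is hypothetical, and placing the zero width space after each run of braille spaces (rather than before it) is an assumption made for this example, not something the working group has decided.

```python
import re

BRAILLE_SPACE = "\u2800"  # BRAILLE PATTERN BLANK
ZWSP = "\u200b"           # ZERO WIDTH SPACE, a line break opportunity

# A run of braille spaces that is not already followed by a ZWSP.
# The lookahead also excludes U+2800 so a run is only matched as a whole.
_RUN = re.compile("(\u2800+)(?![\u2800\u200b])")


def add_break_opportunities(text: str) -> str:
    """Append a ZWSP to every run of braille spaces (hypothetical helper)."""
    return _RUN.sub(lambda m: m.group(1) + ZWSP, text)


sample = "⠁" + BRAILLE_SPACE + "⠃"
marked = add_break_opportunities(sample)
print([f"U+{ord(c):04X}" for c in marked])
```

The pattern skips runs that already end in U+200B, so running the helper twice gives the same result as running it once.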
The idea, I think, was to use U+0020 for most spaces in your file. The problem with U+0020, though, is that it won't conform to rules about braille spacing, and so software that supports eBraille would need to know to handle it the same as a braille character and maintain braille spacing. So as I am thinking about it now, it seems like U+2003 might be the better option, since as an em space, it should conform to the spacing requirements of braille while still providing the appropriate line breaks. We'd need to investigate this idea though and ensure it would work as intended with reading systems, especially since U+2003 isn't currently mentioned in the public working draft.
However, regardless of which space character is recommended, the thinking is that U+2800 would still need to be treated as valid because there may be use cases where it is necessary. Some examples that I can think of are very braille-specific instances such as in a braille rule book or something similar like a book that teaches braille literacy.
Does that help? It sounds like you would rather use U+2800 instead of U+0020 or U+2003. Am I getting that correct?
Thanks @wfree-aph for your answer! It does help and doesn't at the same time 😊.
My problem is that I'm developing a Braille editor supporting several formats, including BRF, Unicode Braille and, in future, eBraille also (along with RTF and TXT). So I'm still confused about practical solutions: I currently have a Perkins emulation mode where, using your F, D, S, A, J, K, L, and Semicolon keys, you can enter Unicode Braille straight into the document. Currently when you press the Spacebar on your keyboard in this mode, a U+2800 is entered. Given what you've said, do you find it practical? What implications would it have if I don't replace it and allow the user to enter an ordinary U+0020 instead? Because, again as you've said, many complications arise for me because of the fact that U+2800 is not a space per se. Thank you so much!
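To make the Perkins emulation concrete, here is a small Python sketch of how a key chord can be turned into a Unicode braille cell. The key-to-dot assignment below (F, D, S for dots 1 to 3, J, K, L for dots 4 to 6, A and Semicolon for dots 7 and 8) is a common convention and an assumption here, not necessarily the editor's actual layout, and chord_to_cell is a hypothetical name. The only fixed part is the Unicode encoding itself: the cell is U+2800 plus a bitmask in which dot n sets bit n-1, so an empty chord gives the blank pattern U+2800.

```python
# Assumed QWERTY-to-dot layout for this sketch (not taken from the editor).
KEY_TO_DOT = {
    "f": 1, "d": 2, "s": 3,
    "j": 4, "k": 5, "l": 6,
    "a": 7, ";": 8,
}

BRAILLE_BASE = 0x2800  # BRAILLE PATTERN BLANK, i.e. no dots raised


def chord_to_cell(keys: str) -> str:
    """Return the braille cell for the keys pressed together (hypothetical)."""
    mask = 0
    for key in keys.lower():
        dot = KEY_TO_DOT[key]   # raises KeyError for keys outside the layout
        mask |= 1 << (dot - 1)  # dot n corresponds to bit n-1
    return chr(BRAILLE_BASE + mask)


for chord in ("f", "fd", "fj", ""):
    cell = chord_to_cell(chord)
    print(repr(chord), f"U+{ord(cell):04X}")
```

In this sketch the Spacebar would simply be the empty chord, which is why a Perkins-style input mode naturally produces U+2800 rather than U+0020.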
@Menelion, thanks for helping me understand. My suggestion would be to use U+2800 while the user is working and only replace it with a different character during saving and only when necessary. So if the user is creating a BRF, keep U+2800 but if they are creating an eBraille file, replace it with either U+0020 or, maybe, U+2003. You then do the opposite on open. You may already be doing that but, if not, it will at least limit when you're having to replace space characters to just during open and save.
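A minimal Python sketch of that save/open swap, assuming the editor holds the document as a plain string and identifies formats by a short name. The function names and the "ebraille" format label are hypothetical, and U+0020 is used as the saved space only because it is the first option mentioned.

```python
BRAILLE_SPACE = "\u2800"  # used in memory while the user is editing
SAVED_SPACE = "\u0020"    # assumed replacement when writing eBraille


def on_save(text: str, fmt: str) -> str:
    """Swap braille spaces for ordinary spaces only for eBraille output."""
    if fmt == "ebraille":
        return text.replace(BRAILLE_SPACE, SAVED_SPACE)
    return text  # other formats keep U+2800 untouched in this sketch


def on_open(text: str, fmt: str) -> str:
    """Do the opposite when loading an eBraille file into the editor."""
    if fmt == "ebraille":
        return text.replace(SAVED_SPACE, BRAILLE_SPACE)
    return text


in_memory = "⠁" + BRAILLE_SPACE + "⠃"
saved = on_save(in_memory, "ebraille")
reopened = on_open(saved, "ebraille")
```

One caveat with such a simple swap: if an eBraille file deliberately contains U+2800 (for pre-formatted content, say), the two kinds of space can no longer be told apart after opening, so a real editor would probably need to track them separately.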
Your program sounds really intriguing. Thank you for including eBraille in the document types you plan to support! If you'd like feedback on it or to discuss it further, please message me directly.
@wfree-aph Sorry, where can I find you personally? My public email is ap@oire.me, it's listed on my GitHub page. Thanks!
Thanks for this. Does anyone actually know why Unicode decided that U+2800 is not a word break character? Does anyone have a contact so we can ask?
It is quite simple I think. Ordinary spaces (U+0020), or any of the other allowed whitespace characters (TAB (U+0009), LF (U+000A) or CR (U+000D)), should be used to separate words, and may also occur at the very beginning or end of blocks. All these characters are equivalent (except if the CSS white-space property is used, but let's forget about that for a minute).
TAB, LF and CR are typically used to wrap and indent the text within the XML files.
U+2003 is another space character that we could allow (not allowed currently), but it would also be equivalent (so there's not really a point).
U+2800 is also allowed, and it is equivalent to the NO-BREAK SPACE (U+00A0). These characters should normally be used only sparsely, in case it is necessary to include pre-formatted content.
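The distinction can be sketched in Python as follows. This is a simplified model of default white space handling, not the eBraille or CSS algorithm itself, and collapse_whitespace is a hypothetical name. Runs of the four ordinary whitespace characters collapse to a single space, while U+2800 and U+00A0 are left exactly as written. Note that the character class is spelled out explicitly, because Python's \s would also match U+00A0.

```python
import re

# Only the four "ordinary" whitespace characters named above.
_ORDINARY_WS = re.compile(r"[ \t\n\r]+")


def collapse_whitespace(text: str) -> str:
    """Collapse runs of space, TAB, LF and CR into one U+0020 (simplified)."""
    return _ORDINARY_WS.sub(" ", text)


pretty_printed = "⠁ \n\t  ⠃"           # source wrapped and indented for readability
preformatted = "⠁\u2800\u2800\u2800⠃"  # three braille spaces, meant literally

print(repr(collapse_whitespace(pretty_printed)))
print(repr(collapse_whitespace(preformatted)))
```

The pretty-printed input comes out with a single space between the two cells, whereas the three braille spaces survive unchanged.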
If needed, the wbr element may be used within pre-formatted content to add break opportunities.
Your editor should probably allow entering the two kinds of spaces. Whether you want to support the no-break type of space depends on whether you want to support pre-formatted content. It also depends on how you create the BRF output: whether the formatting of the BRF is done manually by the user, or automatically by the tool.
BRF is all pre-formatted and therefore a lot different from eBraille, which is mostly reflowable.
@jrbowden Perhaps because it is expected not to collapse with adjacent U+2800 characters, it was made equivalent to the no-break space, including its word breaking characteristics?
I think it makes sense.
Would it be good for us if it were different?
I believe our current recommended characters cover the needs here, and it sounds like CSS can solve the rest, but I'm not comfortable closing this one as the discussion seems to have trailed off. Is there still an issue to resolve here @jrbowden and @bertfrees, at least one that we can reasonably solve?
There are (at least) two possibilities for representing ordinary spaces in a document that is in braille:
The braille space, according to the Unicode spec, does not have word breaking properties. Its general category is "Symbol, other" (So). The ordinary space is defined as a space separator (Zs), but may not be the correct width for a braille character.
Noting this in case it becomes important later.
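These properties can be checked with Python's standard unicodedata module, as in the short sketch below. The results reflect the Unicode database bundled with the Python version in use, and the list of characters is just the set mentioned in this thread.

```python
import unicodedata

# Characters discussed in this issue.
CANDIDATES = ["\u2800", "\u0020", "\u00a0", "\u2003", "\u200b"]


def describe(ch: str) -> tuple:
    """Return (code point, name, general category, str.isspace()) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        ch.isspace(),
    )


for ch in CANDIDATES:
    print(*describe(ch), sep="  ")
```

U+2800 is reported as category So and is not considered whitespace by str.isspace(), while U+0020 is category Zs and is.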
Reading systems may be able to add word breaking possibilities to the braille space by adding one of the "thin" or "zero width" spaces available in Unicode.