NSoiffer / MathCAT

MathCAT: Math Capable Assistive Technology for generating speech, braille, and navigation.
MIT License

Add Finnish 6-dot braille characters (unicode.yaml) #292

samimaattaCelia closed 1 month ago

NSoiffer commented 1 month ago

I'm confused about the last commit of yours. It has several lines like the following that contain "p234;", etc. Some earlier commits seemed to get rid of them.

 - "¢": [t: "p234;p1345;p2345;"]                # 0x00A2 (cent)

It also contains a file "Rules/Braille/Finnish/dummy_file_for_braille.txt". What's that for?

samimaattaCelia commented 1 month ago

Thank you for noticing this! I was using this markup for braille because I don't have an easy way of typing Unicode braille characters. It looks like some of it was accidentally left in. If you have a solution for this, I'm interested!

By the way, should I also translate the parentheses? The comments say that they are handled later. Where does that happen, and should they be translated there instead? Line 401:

 - "(": [t: "("]                # 0x0028 (left parenthesis -- handled later)
 - ")": [t: ")"]                # 0x0029 (right parenthesis -- handled later)

What about characters that have a different meaning depending on whether they appear in general text or in math? One example is the factorial. Line 396:

 - "!": [T: "⠲"]                # 0x0021 (exclamation point) 
 - "!": [T: "⠠⠲"]                # FI: braille for the factorial, probably requires a test

Also, should the Greek letters used for sum, product, and so on have different characters, because they are not text but identifiers? This is what I have written now, because they are the same as their Greek counterparts. Line 389:

# FI: These Greek letters could be removed, because they are the same as above?
 - "µ": [T: "GL⠍"]              # 0x00B5 (Micro (Greek mu))
 - "Ω": [T: "CGL⠚"]             # 0x2126 (Ohm sign (capital Greek omega))
 - "∆": [T: "CGL⠙"]             # 0x2206 (Increment (capital Greek delta))
 - "∏": [T: "CGL⠏"]             # 0x220F (Product (capital Greek pi))
 - "∑": [T: "CGL⠎"]             # 0x2211 (Sum (capital Greek sigma))

NSoiffer commented 1 month ago

Avoiding pXXX

I have some Python code that converts dots to Unicode braille that I use occasionally. That file also does other conversions, such as to and from ASCII braille, but that's probably not of interest to you. If you want to try it, change to the PythonScripts directory in MathCAT in a shell window and type python -i ascii_braille.py (you need Python 3.x, where 'x' is probably 7 or later; they are up to 12, I think). Then you can type either:

The unicode char(s) corresponding to the input are the result.

I could add a d2u_loop function that loops until you enter an empty line if you find that you need to repeatedly do this. That would avoid having to do a function call every time. I did that for the ASCII braille -> Unicode because some of the US specs used ASCII braille and I was copying from their examples into my tests.

In case you aren't familiar with using python interactively, you type quit() to exit python.
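
For readers who only need the dots-to-Unicode step, the conversion can be sketched as follows. This is a minimal illustration, not the code in ascii_braille.py: the function name dots_to_unicode is hypothetical, and the sketch assumes the markup is a run of "p" + dot numbers terminated by ";", as in the cent entry quoted earlier. It relies on the Unicode braille block starting at U+2800, with dot n stored in bit n-1.

```python
# Hypothetical helper, NOT the actual code in PythonScripts/ascii_braille.py.
BRAILLE_BASE = 0x2800  # U+2800 is the blank braille cell


def dots_to_unicode(markup: str) -> str:
    """Convert markup such as 'p234;p1345;' into Unicode braille cells."""
    cells = []
    for token in markup.split(";"):
        token = token.strip()
        if not token:
            continue  # skip the empty piece after a trailing ';'
        if not token.startswith("p") or not token[1:].isdigit():
            raise ValueError(f"not a dot pattern: {token!r}")
        bits = 0
        for dot in token[1:]:
            n = int(dot)
            if not 1 <= n <= 8:
                raise ValueError(f"invalid dot number in {token!r}")
            bits |= 1 << (n - 1)  # dot n maps to bit n-1
        cells.append(chr(BRAILLE_BASE + bits))
    return "".join(cells)


print(dots_to_unicode("p234;p1345;p2345;"))  # prints ⠎⠝⠞
```

Pasting the printed cells into unicode.yaml in place of the pXXX markup would avoid leaving the markup in a commit.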

Other questions:

Parens

It appears that this is because of drop numbers. In the code I see:

    "(" => "⠦",     // Not really needed, but done for consistancy with ")"
    ")" => "⠴",     // Needed for rules with drop numbers to avoid mistaking for dropped 0

There is a cleanup step that detects a dropped number followed by ). If parens were converted to braille first, then any other char whose braille contains ⠴ might accidentally match that cleanup. Maybe Finnish braille doesn't have this possibility, but not converting means I don't have to think too hard about it :-)
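
The collision behind this can be sketched in a few lines. The example is an illustration only, not MathCAT's cleanup code: the names cell_from_dots and drop_cell are hypothetical, and it assumes dropped digits are the ordinary upper-cell digit patterns (dots 1, 2, 4, 5) moved down one row, as in Nemeth-style lower numbers. Whether Finnish dropped numbers follow the same pattern is not checked here.

```python
# Illustration only; this is NOT MathCAT's cleanup code.
BRAILLE_BASE = 0x2800  # U+2800 is the blank braille cell


def cell_from_dots(dots: str) -> str:
    """Build a braille cell from a string of dot numbers, e.g. '245'."""
    bits = 0
    for dot in dots:
        bits |= 1 << (int(dot) - 1)
    return chr(BRAILLE_BASE + bits)


def drop_cell(cell: str) -> str:
    """Move a cell that uses only dots 1, 2, 4, 5 down one row."""
    bits = ord(cell) - BRAILLE_BASE
    if not 0 <= bits <= 0xFF or bits & ~0x1B:
        raise ValueError("cell must use only dots 1, 2, 4 and 5")
    # dot 1 -> 2, 2 -> 3, 4 -> 5, 5 -> 6 is a one-bit left shift
    return chr(BRAILLE_BASE + (bits << 1))


# Upper-cell digits 1..9, 0 share their dot patterns with the letters a..j.
UPPER_DOTS = ["1", "12", "14", "145", "15", "124", "1245", "125", "24", "245"]
DROPPED_DIGITS = {
    digit: drop_cell(cell_from_dots(dots))
    for digit, dots in zip("1234567890", UPPER_DOTS)
}

print(DROPPED_DIGITS["0"] == "⠴")  # True: same cell the quoted code uses for ")"
print(DROPPED_DIGITS["8"] == "⠦")  # True: same cell the quoted code uses for "("
```

Because a dropped 0 and the braille for ")" are the same cell under this assumption, keeping ")" as an ASCII character until after the cleanup keeps the two cases distinguishable.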

Text vs Math chars

Because MathCAT only works on math, that's not normally an issue. However, you can have an mtext element, and that maybe should use a different char. In that case, write something like:

 - "!":                        # 0x0021 (exclamation point) 
    - test:
        if: "self::m:mtext"
        then: [t: "⠲"]
        else: [t: "⠠⠲"]

Greek Letters

They can't be removed because they are separate Unicode characters and MathCAT looks at the input to decide what to output.

If those characters follow the same braille rules as the Greek letter look-alikes, then they should be output the same way. Several other braille codes treat them the same, and that is where those lines originate. If they are treated differently, then don't use "CGL" (etc.) and instead use the characters directly. Having written that, I suspect that they shouldn't use "CGL". Where that could go wrong is if what follows is a string of capital Greek letters. Then (at least in some codes), a "capital word indicator" might get output for the string of capitals. Or maybe a language word indicator, although I don't remember any of the braille codes having that, but it's been a couple of years since I wrote the Nemeth and UEB translators.

samimaattaCelia commented 1 month ago

Thank you! I added changes for the exclamation point. We also have dropped numbers in Finnish math braille, so the parens rule should apply here as well. The Greek letter output is the same regardless of whether it's text or math, so I'm keeping that the same as well.

I will see whether I can use the Python script.

Now the pull request should be ok.