NSoiffer / MathCAT

MathCAT: Math Capable Assistive Technology for generating speech, braille, and navigation.
MIT License

Add Italian translation #257

Open Tnonis90 opened 7 months ago

Tnonis90 commented 7 months ago

Hello @NSoiffer, this is Tommaso from VisionDept SRL, the Italian distributor for Vispero / JAWS. We are interested in tackling the Italian translation for MathCAT and are ready to start translating. Regarding speech, I kindly ask you to set up the environment with the automatic translations so we can get started. As for Braille, we'd need a discussion on what code to choose: in Italy, almost everyone nowadays uses the LAMBDA Math Code (Italian version). Do you happen to have any familiarity with that?

Thanks a lot, and I look forward to getting started with this.

Tommaso

NSoiffer commented 7 months ago

I'm very glad to help out with the Italian translations.

I'll build an initial translation in the next day or two and let you know the details.

I know a little about the LAMBDA math code, but I need a specification of how MathML maps to its linear format (I didn't see anything at https://www.lambdaproject.org/). I just finished implementing the German LaTeX braille code. For that, I didn't translate directly to braille dots, but instead to the ASCII chars for LaTeX, and then let the current braille mapping table do the translation. I was told that was preferable because each country that might use the LaTeX code would have different 8-dot mappings, or might use a 6-dot mapping. I suspect something similar is desirable for the LAMBDA code. Is that true?

Tnonis90 commented 7 months ago

Thanks for letting me know. At www.veia.it you can download Lambda 1.44, which does contain an XML specification with all Braille markup code. Do not use Lambda 2, as everything's embedded in the executable file in that specific version.

Thanks!

NSoiffer commented 7 months ago

I've created an "it" branch. Clone MathCAT and check out that branch. There are instructions for translators here. Here's a short list:

  1. Go to Rules/Languages/it
  2. Open unicode.yaml and look through the translations. They are likely mostly good. If a translation is good, change the "t: ..." to "T: ...". This marks the translation as verified. Some entries contain "if" tests (conditions); the translations inside those are more likely to need correction. Hopefully the syntax is understandable.
  3. Open SimpleSpeak_Rules.yaml and again look through the translations (search for "t: "). Again, there are tests here such as for Verbosity and for Blindness. English often uses "the square root of ..." in a verbose setting, but in a terse one, it might be shortened to "square root x" (dropping "the" and "of"). If Italian never uses those extra words, just use an empty string. In a second pass, we can talk about making the rules more natural for Italian.
  4. Open all the files in SharedRules and do the same thing as for SimpleSpeak_Rules.yaml
  5. Open definitions.yaml. This has words for cardinal numbers (one, two, three...) and ordinal numbers (first, second, third...). Also for words used in fractions ("half", ...). These translations are likely correct, but there might be some bad ones.
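To track progress through the steps above, a small script can count how many translations are still marked "t:" versus "T:". This is a minimal sketch, not part of MathCAT: it assumes each translation appears literally as `t: ` (unverified) or `T: ` (verified) in the YAML text, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: an unverified translation is written `t: ...` and a verified one `T: ...`.
# The lookbehind keeps longer keys that merely end in "t:" (e.g. "test:") from matching.
UNVERIFIED = re.compile(r"(?<![A-Za-z_])t:\s")
VERIFIED = re.compile(r"(?<![A-Za-z_])T:\s")


def count_translations(text: str) -> tuple[int, int]:
    """Return (verified, unverified) marker counts for one file's text."""
    return len(VERIFIED.findall(text)), len(UNVERIFIED.findall(text))


def report(language_dir: Path) -> dict[str, tuple[int, int]]:
    """Count markers in every .yaml file under a language directory."""
    return {
        path.relative_to(language_dir).as_posix(): count_translations(
            path.read_text(encoding="utf-8")
        )
        for path in sorted(language_dir.rglob("*.yaml"))
    }
```

Calling `report(Path("Rules/Languages/it"))` from a checkout would give one (verified, unverified) pair per file, which makes it easy to see which files still need a pass.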

At any point, you can test these out in NVDA if you have the MathCAT addon. After installing the addon, copy the 'it' directory to %AppData%\nvda\addons\MathCAT\globalPlugins\MathCAT\Rules\Languages. Then start NVDA or, if it is already running, restart it (or choose NVDA:Tools:Reload Plugins). If you have an Italian voice, it should use the Italian speech rules. Go to any page with MathML (e.g., https://it.wikipedia.org/wiki/Equazione_di_secondo_grado) and the math should be spoken in Italian. If NVDA+MathPlayer works with LAMBDA, NVDA+MathCAT should also, so that would be another source of math. If NVDA says there is an error in speaking the math, open the NVDA log (NVDA+F1). The message is a little hard to understand, but it will hopefully guide you to the place where you have a typo (e.g., an accidentally deleted quote mark).
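The copy step can be scripted so that it is repeatable after each round of edits. The following is a sketch under stated assumptions: the addon path is the one given above, the helper names are hypothetical, and NVDA still has to be restarted (or its plugins reloaded) by hand afterwards.

```python
import shutil
from pathlib import Path

# Path components below %AppData%, taken from the instructions above.
ADDON_SUBPATH = ("nvda", "addons", "MathCAT", "globalPlugins", "MathCAT", "Rules", "Languages")


def nvda_languages_dir(appdata: str) -> Path:
    """Build the addon's Languages directory from the %AppData% folder."""
    return Path(appdata).joinpath(*ADDON_SUBPATH)


def install_language(src: Path, languages_dir: Path) -> Path:
    """Copy a language folder (e.g. 'it') into the addon, overwriting older files."""
    if not src.is_dir():
        raise FileNotFoundError(f"language folder not found: {src}")
    if not languages_dir.is_dir():
        # Refuse to create the tree: a missing directory suggests the addon is not installed.
        raise FileNotFoundError(f"addon Languages folder not found: {languages_dir}")
    dest = languages_dir / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    return dest
```

On Windows this could be called as `install_language(Path("Rules/Languages/it"), nvda_languages_dir(os.environ["APPDATA"]))` after `import os`.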

A similar process applies to MathCAT in JAWS. However, I haven't used JAWS much and not with MathCAT at all. I'm not sure where the MathCAT files are stored, but wherever that is, a similar process (copy the files to Rules/Languages/it) as with NVDA should be followed. I'm not sure if it picks up the Italian voice automatically. In NVDA, that's code that I wrote.

Good luck. If you want to do a teleconference call some time, I can walk you through the process and that might clear up some questions. In a few hours, you might have something that speaks ok for some expressions and in a few days, does ok for many common expressions. To get a really natural translation, you may want to add or delete some rules and I can talk you through those or write them for you with your input.

At some point, you should do some work on unicode-full.yaml. This is where less commonly used characters can be found. It is a huge file (~3,600 lines), so you may want to scan through it every now and then when you are feeling a little bored and fix the characters you think are really poorly translated (anything marked with 'google translation' is more likely to be poorly translated).

And also we can talk about whether it makes sense to implement ClearSpeak or some other speech style.

NSoiffer commented 7 months ago

@Tnonis90

> Thanks for letting me know. At www.veia.it you can download Lambda 1.44, which does contain an XML specification with all Braille markup code.

I only see options for LAMBDA 2, BM2021, and EBKey. I tried BM2021 in the hope that it was an old version, but after downloading it, I see that "BM" stands for "Braille Music", so that's not the right thing. The EBKey description indicates that's not right either. I didn't see anything else that I could download. Can you clarify what I should get?

Tnonis90 commented 7 months ago

Hello,

I’ll start working on this today.

I won’t do Unicode and Unicode Full first, because Freedom will run those through a script that takes the translations directly from JAWS. I’ve curated those manually, so I don’t need to do the work twice.

Tommaso

Tnonis90 commented 7 months ago

Hello,

my initial commit should be present in the dedicated fork, IT branch.

I have taken care of all files, except Unicode and Unicode full, as I said, which will be handled via scripts that fetch the info from JAWS.

So far all seems to play well and the math speaks very well in Italian. I should say that my testing has been on a limited subset of Wikipedia pages, and it should continue in the coming days.

So far, I have found a problem with an untranslatable string, “out of”. This string speaks when you up arrow out of an inner element (e.g. denominator). Could you please tell me where to fix this string so it speaks in correct Italian?

As for the Braille, I’ve retrieved the correct link for Lambda 1.44, which has been removed from the homepage.

The link is

https://veia.it/it/scarica_lambda

You should be able to get the program there.

Let’s proceed and test further!

Best regards

Cordiali Saluti

Tommaso Nonis

Vision Dept S.r.l.

Via G.B. Morgagni,6 20129 Milano

T 02-29.53.48.62

Web: http://www.visiondept.it/ VisionDept.it

Iscriviti alla http://www.visiondept.it/about.html#mc_embed_signup NewsLetter

https://www.facebook.com/VisionDeptSrl/ https://www.instagram.com/visiondept/ https://twitter.com/visiondept1

https://www.visiondept.it/cert_jaws_index.html

NSoiffer commented 6 months ago

@Tnonis90 : I don't see your commit. Did you forget to do a "git push"?

Also, I think I answered this:

> So far, I have found a problem with an untranslatable string, “out of”. This string speaks when you up arrow out of an inner element (e.g. denominator). Could you please tell me where to fix this string so it speaks in correct Italian?

In case I didn't, you need to translate navigate.yaml -- I left that out of my shortened instructions by accident.

Tnonis90 commented 6 months ago

I pushed it using GitHub Desktop to the IT branch. The SHA is:

21ebc2e0d07e162b62aacbb0b64038fc4fc6fd36

And the commit name is Initial Italian Translation.

Do you still have issues locating it?

We can maybe look into seeing what was wrong there.

Tommaso

NSoiffer commented 6 months ago

When I click on your SHA, the top of the page says "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

I need to get some sleep now. If you don't beat me to it, I'll look into what's going on when I get up and see if I can correct it/bring it into the repo.

NSoiffer commented 6 months ago

I asked my git-savvy son for help and the only thing we could come up with is to essentially clone things and copy files over. That's error-prone.

If you created a fork, and committed your changes there, I could do something, but I'd need to know what your fork is.

Maybe you can try and do a pull request from your repo or branch into the MathCAT repo. That would likely be the shortest path to getting this right. One place I saw says that the error sometimes comes from pushing a tag, not a branch.

Tnonis90 commented 6 months ago

Before making more mistakes, I believe it’d be better to do a conference call.

What time would suit you best? I’m six hours ahead of Eastern Time.

Tommaso

Tnonis90 commented 6 months ago

Ok,

it looks like I created my own repo and that’s where the branch is.

What’s the shortest path to bring the IT branch into NSoiffer/MathCAT?

Tommaso

NSoiffer commented 6 months ago

Probably the best path is to do a "Pull Request" (on top, third item after "code" and "issues") on your repo's page. It will probably suggest what to do. If not, click "New pull request" (on top right) and then choose the "...compare across forks" link.

Tnonis90 commented 6 months ago

I hope I’ve done this correctly

Tommaso

NSoiffer commented 6 months ago

As you probably saw, I merged your code into the 'it' branch. I know you want to use the JAWS character translations for the characters they have. However, if you think the current files are good enough to use until you do more work, let me know and I'll merge the 'it' branch into main.

Tnonis90 commented 6 months ago

Hi,

it’ll take two weeks I think for the JAWS guys to populate the Unicode chars file. So, let’s wait for them to do that and I’ll then revise that.

Have you had the chance to have a look at the Lambda Braille code for IT?

Best

Tommaso

NSoiffer commented 6 months ago

At the end of March, I downloaded Lambda from your link and tried it out and ran into several issues. Between having to write/finish a paper for ICCHP and immersing myself in the update to Nemeth for chemistry, my memory is hazy on the problems I found :-{

I do remember that Lambda didn't work on many of the MathML examples I tried to import. I remember decompiling mathml2lambda.pyc to get a better sense of what Lambda supports, but I don't remember what I concluded (if anything). I don't think I found any documentation on the lambda code itself. Do you know where there is documentation on it?

Tnonis90 commented 5 months ago

Hi,

after further exploration, it looks like the situation is way harder to pull off than I had originally anticipated. There is no direct way to obtain a list of math symbols and their Braille representation: I think we need to actually install the program and test out each individual symbol, unless you have a better idea on how to proceed.

We may use the publicly available Six Dot code from the Italian Braille association, but the reality on the ground is that nobody uses that any longer.

NSoiffer commented 5 months ago

I could take the list of more commonly used math symbols in MathCAT (in unicode.yaml) and pull the symbols out into a file that I could then copy and paste to see what Lambda generates. Then I'd copy them back and, with the aid of a program, put each result into the proper spot in the MathCAT table. I'm travelling right now and don't have Lambda on my laptop, but if you paste in something like ÷, λ, ←, does Lambda show useful dots? If so, that would not be too much work.
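Pulling the symbols out of unicode.yaml could look roughly like the sketch below. It rests on an assumption the thread does not confirm: that each character entry starts a line in the form `- "X":` with a single character between the quotes. Entries keyed by ranges or escaped characters are deliberately skipped, and the function name is hypothetical.

```python
import re

# Assumption: an entry for one character looks like `- "X": ...` at the start of a line.
ENTRY = re.compile(r'^\s*-\s*"(.)"\s*:')


def extract_symbols(yaml_text: str) -> list[str]:
    """Return the single-character keys in file order, without duplicates."""
    symbols: list[str] = []
    seen: set[str] = set()
    for line in yaml_text.split("\n"):
        match = ENTRY.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            symbols.append(match.group(1))
    return symbols
```

Writing the result with `"\n".join(symbols)` gives a one-character-per-line file that can be pasted into another program.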

Tnonis90 commented 5 months ago

Lambda 1 does not; however, Lambda 2 definitely does.

So, it’d be:

÷ = dots 356

Λ = dots 1238

← = dots 12345678 (full cell)

I think this can be done
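Dot lists like these map mechanically onto the Unicode Braille Patterns block, where the cell with no dots is U+2800 and dot n sets bit n-1 of the offset. A minimal sketch of that conversion (the function name is hypothetical, and this is not MathCAT's own code):

```python
def dots_to_braille(dots: str) -> str:
    """Convert a dot list such as "356" to one Unicode braille character."""
    code = 0x2800  # U+2800 is the blank braille cell
    for digit in dots:
        if digit not in "12345678":
            raise ValueError(f"not a braille dot number: {digit!r}")
        code |= 1 << (int(digit) - 1)  # dot 1 -> 0x01, dot 2 -> 0x02, ..., dot 8 -> 0x80
    return chr(code)
```

With the values above, dots 356 becomes U+2834, dots 1238 becomes U+2887, and dots 12345678 becomes U+28FF (the full cell).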

NSoiffer commented 5 months ago

I've attached a list of 360 characters, one character per line. If this isn't a good format, let me know what format you would like (e.g., all chars on a single line or 40 chars per line or ...).

I tried Lambda 2 myself, but I don't know what settings I should use to get the proper braille chars. In JAWS, if I set the output table to Italian, I only see 6 dot and computer braille options. I think you need to do the conversion.

Note: there are four invisible chars in the list (U+2061 - U+2064). I don't know if lambda supports these. I suspect there might be some others that aren't supported. The list begins with a blank char (which probably translates to an empty braille cell).
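Because invisible and blank characters are easy to lose track of in a one-character-per-line file, it can help to print each line's code point and Unicode name. A small sketch using Python's standard `unicodedata` module; the function names are hypothetical:

```python
import unicodedata


def describe(ch: str) -> str:
    """Label one character with its code point and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '(unnamed)')}"


def describe_lines(text: str) -> list[str]:
    """Label every line of a one-character-per-line list, keeping line order."""
    # split("\n") rather than splitlines(): splitlines() also breaks on characters
    # such as U+2028, which could themselves be entries in a Unicode character list.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a final newline
    return [describe(line.rstrip("\r")) if line.rstrip("\r") else "(empty line)" for line in lines]
```

For the four characters mentioned above, the names are FUNCTION APPLICATION (U+2061), INVISIBLE TIMES (U+2062), INVISIBLE SEPARATOR (U+2063), and INVISIBLE PLUS (U+2064).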

List of characters: chars.txt

In order to know what braille char corresponds to what Unicode char (and hence create the list in MathCAT), don't delete any chars even if they don't translate. That way I can know that what is on line 137 corresponds to α. The alternative is for you to send back something like Λ = dots 1238 for each char. What you send back can be the actual braille char (you need to let me know what the mapping is or use the Unicode braille chars) or something like 1238, and I can convert that to dots.
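The line-number correspondence described here can be checked mechanically when the results come back. This sketch assumes the returned file has exactly one line per input character, with untranslated characters left as empty lines; the function name is hypothetical.

```python
def pair_by_line(chars_text: str, results_text: str) -> list[tuple[str, str]]:
    """Pair line i of the character list with line i of the returned results."""

    def lines_of(text: str) -> list[str]:
        # split("\n") keeps blank entries and does not break on U+2028 and similar.
        lines = [line.rstrip("\r") for line in text.split("\n")]
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty piece after a final newline
        return lines

    chars = lines_of(chars_text)
    results = lines_of(results_text)
    if len(chars) != len(results):
        # A deleted or added line would shift every later pairing, so fail loudly.
        raise ValueError(f"{len(chars)} characters but {len(results)} result lines")
    return list(zip(chars, results))
```

Failing on a length mismatch, instead of letting `zip` silently truncate, is the point of the check: one dropped line would otherwise misassign every character after it.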

Hopefully this approach works.

Tnonis90 commented 4 months ago

Hello,

I hope this email finds you well.

I wonder whether we can proceed with the Lambda Italian Braille implementation for MathCAT. I seem to have put the discussion somewhere I cannot find, so I am asking what the current state of affairs is.

Best regards

Tommaso


Tnonis90 commented 4 months ago

Hi Neil,

I've recovered our discussion and was able to get a copy of Lambda 2 set up for use.

However, it does a nasty thing that seems unpreventable:

If I copy and paste one or more chars from your file, Lambda interprets whatever I paste as text, not math. So it lets JAWS itself handle the Braille, rather than passing control to its internal Braille engine.

I have no idea how to solve this.

Tommaso
