Speech-Rule-Engine / speech-rule-engine

Generating speech descriptions for XML structures
https://zorkow.github.io/speech-rule-engine/
Apache License 2.0

Question: Fractions in nb and nn and their use of ordinals #688

Open tarjeiba opened 1 year ago

tarjeiba commented 1 year ago

I am looking into how SRE treats fractions in Norwegian bokmål (nb) and nynorsk (nn).

Both numbers_nn.ts and numbers_nb.ts define numberToOrdinal(num: number, plural: boolean): string. What I don't really understand is the use of this function.

It looks to me like it's used through vulgarFraction(node: Element): Span[] in numbers_util.ts, where we say numberToOrdinal(denom, enum !== 1).

My issue is the following. If I now transform the fraction \frac{1}{3} in nb, I get "en tredje". While this is a literal translation of the English "one third", in bokmål it means "one third" in the sense of "I've got one third place, two fourth places and one fifth place over my career as a racing driver". To get the fraction "one third", I'd say "en tredjedel" (one part of three).

What strikes me, though, is that the tests in SRE-tests say that the current behavior is the wanted behavior. Is the current implementation what is considered correct?

Further, I'd like to refactor numberToOrdinal to get my wanted behavior, but I am skeptical of doing so, as I only want to affect its behavior when it is used in fractions. What I'd really like to do is modify vulgarFraction to use a separate function for denom, so that numberToOrdinal could do just that: take a number and return an ordinal, without needing to know whether the number is to be used in a fraction or is "plural".
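A minimal sketch of the separation described above, not the actual SRE code. The helper names numberToFractionDenominator and speakVulgarFraction and the small word tables are hypothetical, and only the denominators 3 to 5 are covered. The point is that the ordinal function no longer takes a plural flag, while a separate function builds the denominator word for fractions.

```typescript
// Hypothetical sketch, not the SRE implementation: the tables and the
// helper names below are assumptions made for illustration only.
const ORDINALS: Record<number, string> = {
  3: 'tredje',
  4: 'fjerde',
  5: 'femte'
};

const CARDINALS: Record<number, string> = {
  1: 'en',
  2: 'to',
  3: 'tre',
  4: 'fire',
  5: 'fem'
};

// Plain ordinal: takes a number, returns an ordinal, knows nothing about fractions.
function numberToOrdinal(num: number): string {
  const word: string | undefined = ORDINALS[num];
  if (word === undefined) {
    throw new Error(`No ordinal in this sketch for ${num}`);
  }
  return word;
}

// Fraction denominator: ordinal stem plus "del" (singular) or "deler" (plural).
function numberToFractionDenominator(denom: number, plural: boolean): string {
  return numberToOrdinal(denom) + (plural ? 'deler' : 'del');
}

// Stand-in for the part of vulgarFraction that combines the two words.
function speakVulgarFraction(enumerator: number, denom: number): string {
  const count: string | undefined = CARDINALS[enumerator];
  if (count === undefined) {
    throw new Error(`No cardinal in this sketch for ${enumerator}`);
  }
  return `${count} ${numberToFractionDenominator(denom, enumerator !== 1)}`;
}

console.log(speakVulgarFraction(1, 3)); // en tredjedel
console.log(speakVulgarFraction(2, 5)); // to femtedeler
```

With this split, numberToOrdinal(3) still returns "tredje" for ordinal contexts, and only the fraction path appends "del" or "deler".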

Am I making any sense, or have I just not understood the implementation correctly? (The latter is highly likely.)

Thanks for all your work!

zorkow commented 1 year ago

Thanks for reporting this.

Firstly, please note that the current tests are effectively the results SRE produced after the initial localisation. (Think of them as galley proofs in typesetting.) The normal procedure is that the native speaker working on the localisation reads over these proofs and gives feedback, and I fix things in an iterative process. Unfortunately, I never received any feedback on the initial output for the two Norwegian languages from the translator at the time. This was also during Covid, which meant that we could not just spend a couple of days in the same room together, which usually helps ensure correctness. So there might still be plenty of skeletons in the closet.

If you could find the time to have a look over the tests and send me anything you think is wrong, I'll be happy to make corrections.

tarjeiba commented 1 year ago

I see. Thanks for your reply.

I might need some help getting the tests to compile locally. Right now I've just done a brute-force search-and-replace for e.g. "to femte" -> "to femtedeler", which seems to work ... but it feels like the wrong approach.

Or would you prefer that I submit a PR to the SRE repo directly, and you can use that to generate new tests for the affected parts?

zorkow commented 1 year ago

Firstly, reading the test code is probably too difficult. However, all the output that we usually use for proofreading is available at https://speech-rule-engine.github.io/sre-tests/output/ and gives you the different test categories, rendered math expressions and English speech in comparison. I have not updated it in a while, but Bokmål should still be up to date.

If you want to fix up tests directly, a PR against the test repo https://github.com/Speech-Rule-Engine/sre-tests/ is of course always welcome. Again, doing this by hand might be a very tedious job. But since speech generation is rule based (or procedural for numbers), it's usually easier to spot the pattern that goes wrong, and I can then try to fix the rules and push new output.

If you want to actually fix code or rules yourself, you are again very welcome. But as you said, you might need a bit of help to get started. I'll be happy to answer questions here or by email. Alternatively, we can have an online chat. I'll be travelling in the US for the next two weeks (starting tomorrow morning). But maybe we can schedule a call somewhere around 21-23 Nov, i.e., before Thanksgiving?

tarjeiba commented 1 year ago

Thanks for your reply. I sent an email to the address listed on your GitHub profile.