Spanish Support - Githubissues

Laifsyn commented 7 months ago

Hello. I might try to add Spanish support. I haven't fully read how the other languages are implemented yet, but is there anything I should know? Or should I wait until I submit a pull request first?

Also, since it is a Spanish implementation, I was thinking about using Spanish names for statics, and possibly even for variables as well. As for methods, I guess I can leave them in English.

Also, I might not be able to implement currency, dates and the others (only cardinal, if I didn't misinterpret it), since they feel a bit out of my league... at least not at first!

[ ] Spanish Identifiers (type's impl method, constants and variables)

Laifsyn commented 7 months ago

Update: I think I managed to properly convert any integer up to a billion (for now).

Ballasi commented 7 months ago

Hey, sorry for the late reply, for some reason the new issue didn't send anything in my GitHub notifications center, not sure why. Thanks for being interested in contributing!

I guess there isn't much to know as long as you follow the way the program works (so in the end, support ordinal/cardinal/ordinal_num/year, currencies, ... / use the preferences methods in case Spanish has anything really specific to the language). Also, I feel like it is very important to add a bunch of unit tests to make sure everything is fine. Don't hesitate to write them as early as possible, possibly even before implementing the language itself. I have found multiple quirks by doing so.

If you don't feel like having ordinal/ordinal_num/year/currencies supported for now, that's fine! It's important to go through it step by step. I could probably take a look at that on my side! It's already very generous to offer to implement a language!

Once you feel like your changes are ready, make a PR, and I'll happily discuss the changes with you :) I am used to doing a few back and forth so don't expect any big changes like this to be instantly merged, just want to make sure everything is perfect!

If you do want to know what needs to be added for a language, I recommend you take a look at the Ukrainian PR #19 and the quick change I added afterwards 93ef526.

Please don't hesitate to reach out to me if you have any questions!

Laifsyn commented 7 months ago

I just noticed, but is it okay to add files of my own while I work on the fork? I can easily delete these unrelated files before submitting the pull request, but they will still appear in the git history because the pull request doesn't seem to squash all commits. How should I handle this case?

Laifsyn commented 7 months ago

For whoever stumbles upon this and knows better: which is more correct to say for 1_001_700, "un millón mil setecientos" or "un millón un mil setecientos"?

Also: for 1_700 "mil setecientos" or "un mil setecientos"

Ballasi commented 7 months ago

> I just noticed, but is it okay to add files of my own while I work on the fork? I can easily delete these unrelated files before submitting the pull request, but they will still appear in the git history because the pull request doesn't seem to squash all commits. How should I handle this case?

I usually apply PR with a squash method, so it should remove it altogether. Otherwise, I'll do it by hand. If you can do the squash by yourself before publishing the patch, it would be best!

> Also: for 1_700 "mil setecientos" or "un mil setecientos"

According to http://www.intro2spanish.com/vocabulary/numbers/advanced.htm you need "un" for numbers between 1000 and 1099 :)

Laifsyn commented 7 months ago

> I usually apply PR with a squash method, so it should remove it altogether. Otherwise, I'll do it by hand. If you can do the squash by yourself before publishing the patch, it would be best!

I guess I'll figure it out when the time comes, then.

> According to http://www.intro2spanish.com/vocabulary/numbers/advanced.htm you need "un" for numbers between 1000 and 1099 :)

I think so, but it still sounds more natural without "un" to me and my friends.

Also, for cases like 801_100_001, is it "ochocientos uno millones..." or "ochocientos un millón"? When it is the first triplet, does it take the singular form of the milliard together with the singular "un" (1), or should it take the plural form of the milliard regardless of the triplet's value or position, unless the triplet == 1?

Ballasi commented 7 months ago

I'll be honest: I don't speak Spanish, so I can't be of much help.

What I recommend, though, is to check out savoirfairelinux's num2words implementation of Spanish in Python here: https://github.com/savoirfairelinux/num2words/blob/master/num2words/lang_ES.py

You can also download it and run numbers through it yourself to see how it renders them, and take a look at their unit test suite.

I'd aim for implementing what this version is already doing!

Laifsyn commented 7 months ago

Currently checking. I can't help but feel confused by the short variable names and the multiple layers of inheritance.

I feel like when I saw C for the first time at university, but without any explicit type information lols

Edit: I gotta love Codespaces. I just copy-pasted all the setup instructions and got something running in under 1 minute. So "801_100_001" is indeed translated as "ochocientos uno millones..."; however, the pronunciation feels weird to me personally (I'm currently considering giving up on making sense of what the heck was done in the Python code).

This would actually be more of a "finding an expert in languages" issue. But yeah, my implementation pretty much matches their results, except for the 21-29 range, which has a special case in how the words get contracted.
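The 21-29 special case mentioned above can be sketched as follows. This is a minimal illustration, not code from the PR or from the Python project: the function name `below_hundred` and the table layout are assumptions. From 31 upward the tens and units are joined with " y ", while 21-29 fuse into a single word and some of them gain an accent (veintidós, veintitrés, veintiséis), which is why the sketch spells them out in a table instead of computing them.

```rust
/// Spells out 0..=99 as a Spanish cardinal (masculine form).
/// Hypothetical helper for illustration only.
fn below_hundred(n: u8) -> String {
    // 0..=29 are single words, so they are listed explicitly.
    const SMALL: [&str; 30] = [
        "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
        "ocho", "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
        "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte",
        "veintiuno", "veintidós", "veintitrés", "veinticuatro", "veinticinco",
        "veintiséis", "veintisiete", "veintiocho", "veintinueve",
    ];
    // Only indices 3..=9 are used; 0..=2 are placeholders.
    const TENS: [&str; 10] = [
        "", "", "", "treinta", "cuarenta", "cincuenta", "sesenta", "setenta",
        "ochenta", "noventa",
    ];
    assert!(n < 100, "only 0..=99 is supported in this sketch");
    match n {
        0..=29 => SMALL[n as usize].to_string(),
        // Round tens have no units part.
        _ if n % 10 == 0 => TENS[(n / 10) as usize].to_string(),
        // 31..=99: tens and units joined with " y ".
        _ => format!("{} y {}", TENS[(n / 10) as usize], SMALL[(n % 10) as usize]),
    }
}
```

For instance, `below_hundred(21)` yields "veintiuno" while `below_hundred(31)` yields "treinta y uno".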

Ballasi commented 7 months ago

If you want, you can also check other apps online that do the same. When doing the French language, I noticed Google did that, so I tried multiple things on there and compared them with my results. I've also checked sites online that do the exact same.

If all of them yield the same output, it's probably best to do like them, I believe.

To be honest I've learned rules for French by doing that exactly haha

Laifsyn commented 7 months ago

Or gate it behind regional flavour. Maybe the one that sounds more syntactically correct for ES_EU and the more fluent to me behind ES_LAN.

Ballasi commented 7 months ago

That's what I was wondering, but even savoirfairelinux's num2words doesn't make any distinction between European Spanish and Latin American Spanish (the only difference is currency names), so I am not sure exactly.

I will investigate that tomorrow.

Ballasi commented 7 months ago

Now that I think of it, I have a family member who works on/studies the Spanish language. I can probably ask her whether there are any linguistic specificities or regional variations in Spanish.

Can you give me a clear example of a number that causes a problem, what you personally expect the answer to be, and what you find online? Thank you!

Laifsyn commented 7 months ago

Issue 1:

When the triplet's index is greater than 0 (a triplet that comes from an input bigger than 999, e.g. 122_000)
When the triplet's unit digit is 1, with the exception of 11 (e.g. 101, 121, 801)
Examples of numbers: 801_151_000 => "ochocientos y uno millones ciento cincuenta y uno mil" || "ochocientos y un millones ciento cincuenta y un mil"

Then there's also a flavour for the first milliard (million, billion, trillion), which I'm not sure how to prove. So I feel like the rule is (with the exception of the thousands milliard, which doesn't have a plural unless it has a quantifier like dollars):

If the triplet ends in 1, then you take the singular form of the first milliard you populate, and for the following milliards you have to use their plural form regardless of the triplet that's prepended to them.

For example, "181_400_700":
Without the flavouring it would be "Ciento ochenta y un millones cuatrocientos mil setecientos".
With the flavouring it would be "Ciento ochenta y un millón cuatrocientos mil setecientos", which feels more fluent.
Another example with the flavouring, for "181_400_700_000": "Ciento ochenta y un billón cuatrocientos millones setecientos mil".

(At least, in Cardinal form) it's how I feel it is
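To make the two candidate rules in Issue 1 concrete, here is a rough sketch. It is not taken from the PR: the names `triplets`, `ends_in_one`, `use_singular`, `million_word` and the `flavoured` flag are all hypothetical. It only shows how the triplet index and the "ends in 1, except 11" condition could select the singular or plural milliard name, without settling which rule is correct.

```rust
/// Splits `n` into base-1000 groups, least significant first,
/// so the position in the vector is the triplet's index.
fn triplets(mut n: u64) -> Vec<u16> {
    let mut out = Vec::new();
    loop {
        out.push((n % 1000) as u16);
        n /= 1000;
        if n == 0 {
            break;
        }
    }
    out
}

/// True for 1, 21, 101, 801, ... but not for 11, 111, 811, ...
fn ends_in_one(triplet: u16) -> bool {
    triplet % 10 == 1 && triplet % 100 != 11
}

/// Decides whether the milliard name after `triplet` is singular.
/// `is_leading` marks the most significant populated milliard.
/// `flavoured` (an assumed preference flag) switches on the variant
/// proposed in this thread; without it only a triplet equal to 1
/// takes the singular.
fn use_singular(triplet: u16, is_leading: bool, flavoured: bool) -> bool {
    triplet == 1 || (flavoured && is_leading && ends_in_one(triplet))
}

/// Picks "millón" or "millones" under the chosen rule.
fn million_word(triplet: u16, is_leading: bool, flavoured: bool) -> &'static str {
    if use_singular(triplet, is_leading, flavoured) {
        "millón"
    } else {
        "millones"
    }
}
```

With these definitions, `triplets(181_400_700)` gives `[700, 400, 181]`, and `million_word(181, true, true)` picks "millón" where `million_word(181, true, false)` picks "millones".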

And then there are also the thousand-milliard jumps that the Python num2words uses.

I usually go from billion straight to trillion, but the Python implementation goes "billion, thousand billion, trillion, thousand trillion" (I personally prefer the first version).
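The "billion, thousand billion, trillion, thousand trillion" sequence matches what is commonly called the long scale, where each "-illón" name covers two triplets. A small sketch of that naming, under the assumption that plural forms are wanted and that three names are enough (the function name `long_scale_name` is hypothetical):

```rust
/// Returns the plural long-scale name of 1000^index, or `None` when
/// the index is beyond this sketch's small table.
/// Index 0 is the units group and has an empty name.
fn long_scale_name(index: usize) -> Option<String> {
    const ILLIONS: [&str; 3] = ["millones", "billones", "trillones"];
    match index {
        0 => Some(String::new()),
        1 => Some("mil".to_string()),
        i => {
            // Indices 2 and 3 share "millones", 4 and 5 share "billones", ...
            let name = ILLIONS.get(i / 2 - 1)?;
            Some(if i % 2 == 0 {
                name.to_string()
            } else {
                // Odd indices are the "thousand" step in between.
                format!("mil {}", name)
            })
        }
    }
}
```

This names powers of 1000 only. When spelling a full number, the "-illones" word is normally said once for the whole six-digit group, so the function is not a per-triplet suffix table.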

Laifsyn commented 7 months ago

Oh, btw. Can I get help understanding how exactly the "prefer" method in the Num2Words struct works and how I should be using it? This design choice was quite unexpected for me tbh.

Update: Only missing Integration code (implement Language Trait).

Laifsyn commented 7 months ago

Why is there no way to edit the contained BigFloat from the structure?

pub struct Num2Words {
    num: BigFloat, // I seem unable to change it (no accessors to mutate, nor inherit configurations)
    lang: Lang,
    output: Output,
    currency: Currency,
    preferences: Vec<String>,
}

Laifsyn commented 7 months ago

Update: I'm finding it extremely un-ergonomic to have my Num2Words consumed every time I convert a number... I'm really curious why this choice was made.
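The ergonomic difference being described can be illustrated with a stand-in type. Everything below is hypothetical: `Converter` is not the real Num2Words struct, its output is a placeholder string, and the method names are only chosen to echo the thread. The sketch contrasts a conversion that takes `self` by value (the value is moved and cannot be reused) with one that borrows `&self`, and shows cloning as a workaround for the consuming version.

```rust
/// Hypothetical stand-in, not the real num2words type.
#[derive(Clone)]
struct Converter {
    num: i64,
    preferences: Vec<String>,
}

impl Converter {
    fn new(num: i64) -> Self {
        Converter { num, preferences: Vec::new() }
    }

    /// Builder-style setter: takes and returns `self` by value.
    fn prefer(mut self, pref: &str) -> Self {
        self.preferences.push(pref.to_string());
        self
    }

    /// In-place setter, so one configured value can be reused.
    fn set_num(&mut self, num: i64) {
        self.num = num;
    }

    /// Consuming conversion: the converter is moved and gone afterwards.
    fn into_words(self) -> String {
        self.render()
    }

    /// Borrowing conversion: the converter stays usable.
    fn to_words(&self) -> String {
        self.render()
    }

    /// Placeholder output, e.g. "21 (oh)"; a real converter would spell
    /// the number out.
    fn render(&self) -> String {
        format!("{} ({})", self.num, self.preferences.join(","))
    }
}
```

With the consuming form, converting many numbers under the same configuration means either rebuilding the value each time or calling `clone()` before `into_words()`; the borrowing form plus `set_num` avoids both.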

Ballasi / num2words

Spanish Support #27