Open Laifsyn opened 7 months ago
Update: I think I managed to properly convert any integer up to a billion (for now).
Hey, sorry for the late reply, for some reason the new issue didn't send anything in my GitHub notifications center, not sure why. Thanks for being interested in contributing!
I guess there isn't much to know as long as you follow the way the program works (so, in the end, support ordinal/cardinal/ordinal_num/year, currencies, ... and use the preferences methods in case Spanish has anything really specific to the language). Also, I feel it is very important to add a bunch of unit tests to make sure everything is fine. Don't hesitate to write them as early as possible, possibly even before implementing the language itself. I have found multiple quirks by doing so.
If you don't feel like having ordinal/ordinal_num/year/currencies supported for now, that's fine! It's important to go through it step by step. I could probably take a look at that on my side! It's already very generous to offer to implement a language!
Once you feel like your changes are ready, make a PR, and I'll happily discuss the changes with you :) I am used to doing a few back and forth so don't expect any big changes like this to be instantly merged, just want to make sure everything is perfect!
If you do want to know what needs to be added for a language, I recommend you take a look at the Ukrainian PR #19 and the quick change I added afterwards 93ef526.
Please don't hesitate to reach out to me if you have any questions!
I just noticed, but is it okay to add files of my own while I work on the fork? I can easily delete these unrelated files before submitting the pull request, but they will still appear in the git history, because the pull request doesn't seem to squash all commits. How should I handle this case?
For whoever stumbles upon this and knows better: which is more correct to say for 1_001_700, "un millón mil setecientos" or "un millón un mil setecientos"?
Also: for 1_700, "mil setecientos" or "un mil setecientos"?
> I just noticed, but is it okay to add files of my own while I work on the fork? I can easily delete these unrelated files before submitting the pull request, but they will still appear in the git history, because the pull request doesn't seem to squash all commits. How should I handle this case?
I usually merge PRs with the squash method, so that should remove them altogether. Otherwise, I'll do it by hand. If you can squash the commits yourself before publishing the patch, that would be best!
> Also: for 1_700, "mil setecientos" or "un mil setecientos"?
According to http://www.intro2spanish.com/vocabulary/numbers/advanced.htm you need "un" for numbers between 1000 and 1099 :)
> I usually merge PRs with the squash method, so that should remove them altogether. Otherwise, I'll do it by hand. If you can squash the commits yourself before publishing the patch, that would be best!
I guess I'll figure it out when the time comes, then.
> According to http://www.intro2spanish.com/vocabulary/numbers/advanced.htm you need "un" for numbers between 1000 and 1099 :)
I think so, but it still sounds more natural without "un" to me and my friends.
Also for cases like this 801_100_001, is it "ochocientos uno millones..." or "ochocientos un millón"?
When it is the first triplet, does it take the singular form of the Milliard together with the singular "un" (1), or should it take the plural form of the Milliard regardless of the value or position of the triplet, unless the triplet == 1?
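To make the "triplet" wording concrete, here is a minimal sketch of how a number can be split into base-1000 groups before any words are attached. The function name `split_triplets` is hypothetical and is not taken from the crate; it only illustrates the grouping that the question above refers to.

```rust
/// Splits a number into base-1000 groups ("triplets"),
/// most significant group first. Hypothetical helper, not crate API.
fn split_triplets(mut n: u64) -> Vec<u16> {
    if n == 0 {
        return vec![0];
    }
    let mut groups = Vec::new();
    while n > 0 {
        // The remainder is always below 1000, so it fits in a u16.
        groups.push((n % 1000) as u16);
        n /= 1000;
    }
    groups.reverse();
    groups
}
```

For 801_100_001 this yields the groups 801, 100 and 1, so the open question is only which form of "uno" and of the scale word should follow the leading 801.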
I'll be honest: I don't speak Spanish, so I can't be of much help.
What I recommend, though, is to check out savoirfairelinux's num2words implementation of Spanish in Python here: https://github.com/savoirfairelinux/num2words/blob/master/num2words/lang_ES.py
You can also download the app and run numbers through it yourself to see how it renders them, and take a look at their unit test suite.
I'd aim for implementing what this version is already doing!
Currently checking. I can't help but feel confused by the short variable names and the multiple layers of inheritance.
It feels like when I first saw C at university, but without any explicit type information, lol.
Edit: I gotta love Codespaces. I just copy-pasted all the setup instructions and got something running in under a minute. So "801_100_001" is indeed translated as "ochocientos uno millones...", although the pronunciation feels weird to me personally. (I'm currently considering giving up on making sense of what the heck was done in the Python code.)
This would actually be more of a "finding an expert in languages" issue. But yeah, my implementation's results are pretty much spot on, except for the 21-29 range, which has a special case for how the words get contracted.
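The 21-29 special case can be sketched as a lookup table: those numbers are written as a single contracted word ("veintiuno", "veintidós", ...), while from 31 upward the tens and units are joined with "y". This is only an illustration with hypothetical names (`UP_TO_29`, `TENS`, `below_hundred`); it is not the code from the fork or from the Python project.

```rust
// Words for 0..=29; 16-19 and 21-29 are single contracted words.
const UP_TO_29: [&str; 30] = [
    "cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve",
    "diez", "once", "doce", "trece", "catorce", "quince", "dieciséis", "diecisiete",
    "dieciocho", "diecinueve", "veinte", "veintiuno", "veintidós", "veintitrés",
    "veinticuatro", "veinticinco", "veintiséis", "veintisiete", "veintiocho", "veintinueve",
];

// Words for the tens from 30 upward; indices 0..=2 are unused.
const TENS: [&str; 10] = [
    "", "", "", "treinta", "cuarenta", "cincuenta", "sesenta", "setenta", "ochenta", "noventa",
];

/// Standalone cardinal for 0..=99, or None when out of range.
fn below_hundred(n: u8) -> Option<String> {
    match n {
        0..=29 => Some(UP_TO_29[n as usize].to_string()),
        30..=99 => {
            let (tens, unit) = ((n / 10) as usize, (n % 10) as usize);
            if unit == 0 {
                Some(TENS[tens].to_string())
            } else {
                // From 31 upward the parts are joined with "y".
                Some(format!("{} y {}", TENS[tens], UP_TO_29[unit]))
            }
        }
        _ => None,
    }
}
```

The sketch only covers the standalone forms; the shortened form used before a scale word (the "uno" versus "un" question raised earlier in the thread) would still need its own handling.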
If you want, you can also check other apps online that do the same thing. When doing the French language, I noticed Google did that, so I tried multiple inputs there and compared them with my results. I've also checked sites online that do the exact same thing.
If all of them yield the same output, I believe it's probably best to do as they do.
To be honest I've learned rules for French by doing that exactly haha
Or gate it behind a regional flavour. Maybe the one that sounds more syntactically correct behind ES_EU, and the one that sounds more fluent to me behind ES_LAN.
That's what I was wondering, but even savoirfairelinux's num2words doesn't make any distinction between European Spanish and LATAM Spanish (the only difference is currency names), so I am not sure exactly.
I will investigate that tomorrow.
Now that I think of it, I have a family member who works on/studies the Spanish language. I can probably ask her whether there are any linguistic specifics or regional differences in Spanish.
Can you give me a clear example of a number that causes a problem, what you personally expect the answer to be, and what you find online? Thank you!
Issue 1:
Then there's also a flavour for the first Milliard (Million, Billion, Trillion), which I'm not sure how to prove. So I feel like the rule is (with the exception of the thousands Milliard, which has no plural unless it is followed by a quantifier like dollars), for example, for "181_400_700":
Without the flavouring: "Ciento ochenta y un millones cuatrocientos mil setecientos"
With the flavouring: "Ciento ochenta y un millón cuatrocientos mil setecientos", which feels more fluent.
Another example with the flavouring, for "181_400_700_000": "Ciento ochenta y un billón cuatrocientos millones setecientos mil"
(At least, in Cardinal form) it's how I feel it is
And then there are also the thousand-Milliard jumps that Python's num2words uses.
I usually go from Billion straight to Trillion, but Python's implementation goes "Billion, then Thousand Billion, then Trillion, then Thousand Trillion" (I personally prefer the first version).
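The two numbering schemes described here can be compared with a small sketch that names each power of 1000. This assumes the "Thousand Billion" jumps correspond to long-scale naming, where a new "-illón" word appears every second power of 1000, and that the first version is the short scale, with a new word at every power. The names `Scale` and `name_of_power` are hypothetical and not taken from either project.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scale {
    Short, // a new "-illón" word at every power of 1000
    Long,  // a new "-illón" word at every second power of 1000
}

const ILLIONS: [&str; 3] = ["millón", "billón", "trillón"];
const ILLIONS_PLURAL: [&str; 3] = ["millones", "billones", "trillones"];

/// Singular name of 1000^k under the given scale, or None if the
/// table above does not reach that far.
fn name_of_power(k: usize, scale: Scale) -> Option<String> {
    match (k, scale) {
        (0, _) => Some(String::new()),
        (1, _) => Some("mil".to_string()),
        // k >= 2 from here on, so the subtractions cannot underflow.
        (_, Scale::Short) => ILLIONS.get(k - 2).map(|s| s.to_string()),
        (_, Scale::Long) => {
            let idx = k / 2 - 1;
            if k % 2 == 0 {
                ILLIONS.get(idx).map(|s| s.to_string())
            } else {
                // Odd powers are "a thousand" of the previous word.
                ILLIONS_PLURAL.get(idx).map(|p| format!("mil {}", p))
            }
        }
    }
}
```

Under this sketch, 1000^3 is "billón" in the short scale but "mil millones" in the long scale, which is exactly where the two versions start to disagree.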
Oh, by the way: can I get help understanding how exactly the "prefer" method on the Num2Words struct works and how I should be using it? This design choice was quite unexpected to me, to be honest.
Update: only the integration code is missing (implementing the Language trait).
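As a rough mental model only, string preferences like the `preferences: Vec<String>` field shown further down in this thread can be pictured as flags that a language implementation looks up while rendering. The sketch below is a stand-in written from that assumption: the type `Prefs`, the helper `thousand_word` and the flag name "un_mil" are all hypothetical and are not the crate's actual API or its supported preference names.

```rust
/// Hypothetical stand-in for a builder that collects preference flags.
#[derive(Debug, Default)]
struct Prefs {
    preferences: Vec<String>,
}

impl Prefs {
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style: takes ownership, records the flag, hands itself back.
    fn prefer(mut self, flag: &str) -> Self {
        self.preferences.push(flag.to_owned());
        self
    }

    fn has(&self, flag: &str) -> bool {
        self.preferences.iter().any(|p| p.as_str() == flag)
    }
}

/// A language implementation could branch on a flag like this.
/// "un_mil" is an invented flag name used only for illustration.
fn thousand_word(prefs: &Prefs) -> &'static str {
    if prefs.has("un_mil") {
        "un mil"
    } else {
        "mil"
    }
}
```

With this shape, `thousand_word(&Prefs::new())` gives "mil" and `thousand_word(&Prefs::new().prefer("un_mil"))` gives "un mil", which is one way the regional variants discussed above could be gated.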
Why is there no way to edit the contained BigFloat from the structure?
pub struct Num2Words {
    num: BigFloat, // I seem unable to change it (no accessors to mutate, nor inherit configurations)
    lang: Lang,
    output: Output,
    currency: Currency,
    preferences: Vec<String>,
}
Update: I'm finding it extremely unergonomic to have my Num2Words consumed every time I convert a number... I'm really curious why this choice was made.
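To illustrate the trade-off being described, here is a sketch of a consuming conversion next to a borrowing one. It uses a hypothetical `Converter` type with placeholder rendering; it is not the crate's `Num2Words`, and the method names `into_words`, `to_words` and `with_number` are assumptions made for the example.

```rust
/// Hypothetical stand-in for a configured converter.
#[derive(Clone, Debug)]
struct Converter {
    number: i64,
    uppercase: bool,
}

impl Converter {
    fn new(number: i64) -> Self {
        Converter { number, uppercase: false }
    }

    /// Builder-style option setter: consumes and returns the value.
    fn uppercase(mut self, on: bool) -> Self {
        self.uppercase = on;
        self
    }

    /// Borrowing conversion: the configuration stays usable afterwards.
    fn to_words(&self) -> String {
        // Placeholder rendering; real number-to-words logic is omitted.
        let text = format!("number {}", self.number);
        if self.uppercase { text.to_uppercase() } else { text }
    }

    /// Consuming conversion: the value cannot be used again afterwards.
    fn into_words(self) -> String {
        self.to_words()
    }

    /// Reuses the same configuration for a different number.
    fn with_number(&self, number: i64) -> Self {
        let mut copy = self.clone();
        copy.number = number;
        copy
    }
}
```

With a borrowing method and a `Clone` derive, one configured value can be reused across many numbers, for example `base.with_number(7).into_words()`, while `base` itself remains available for further conversions.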
Hello. I might try to add Spanish support. I haven't fully read how the other languages are implemented yet, but is there anything I should know? Or should I wait until I submit a pull request first?
Also, since it is a Spanish implementation, I was thinking about using Spanish names for statics, and possibly even for variables as well. As for methods, I guess I can leave them in English.
Also, I might not be able to implement currency, dates and the others (only cardinal, if I didn't misinterpret it), since they feel a bit out of my league... at least not at first!