MycroftAI / lingua-franca

Mycroft's multilingual text parsing and formatting library
Apache License 2.0
75 stars, 78 forks

new utils - singularize and pluralize #35

Open JarbasAl opened 4 years ago

JarbasAl commented 4 years ago

I propose two new methods, useful for normalization and also for generating more natural dialog:

singularize -> makes a word singular
pluralize -> makes a word plural

These would be localized per language.

krisgesling commented 4 years ago

I can definitely think of good use cases, e.g. a shopping cart / shopping list:

User: "buy an apple"
User: "buy another apple"
Mycroft: "getting two apples"

It's also something I've seen people attempt to do in their own code, which is both bad for internationalisation and hard to do properly, e.g. plural_word != word + 's'.
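To illustrate the point about plural_word != word + 's', here is a minimal sketch. The helper `naive_plural` and the small word list are made up for this example and are not part of lingua-franca.

```python
def naive_plural(word: str) -> str:
    """Hypothetical helper: the 'just append s' shortcut."""
    return word + "s"


# A few English nouns paired with their actual plural forms.
expected = {
    "apple": "apples",
    "box": "boxes",
    "child": "children",
    "sheep": "sheep",
}

# Collect every word where the shortcut produces the wrong plural.
wrong = {
    word: naive_plural(word)
    for word, plural_form in expected.items()
    if naive_plural(word) != plural_form
}

print(wrong)
```

Only "apple" survives the shortcut here, and that is within English alone. Other languages need entirely different rules, which is why doing this ad hoc in each skill does not scale.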

filips123 commented 3 years ago

I think having separate singularize and pluralize functions is a bad idea. English only has two forms (singular and plural), but some languages (for example Slovenian) have more than two. Separate functions make it impossible to implement pluralization properly for such languages.

Instead, there should be a single function that accepts a word and a number. Based on the pluralization rules for the specific language, it should determine which plural category applies and return the proper form of the word.

I think it would make sense to implement this as two functions: plural_category(number, type) and plural(word, number, type), where word is the word to pluralize, number is the amount, and type is cardinal, ordinal or range. That way plural_category can also be used elsewhere, for example in programs that already have all forms defined and just want to choose the correct one. This would also solve the issue with duration formatting.

Unicode CLDR defines a small set of plural categories that cover most languages, along with per-language rules for when to use each one: zero, one, two, few, many and other.

This means that each language should define a plural_category function that takes the number and type and returns one of those categories. Each language should also have a plural function that takes the word directly and pluralizes it based on the category returned by plural_category.
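A rough sketch of how the two proposed functions could fit together, not a final design. It assumes integer input only, covers English (cardinal and ordinal) and Slovenian (cardinal) using the CLDR rules for whole numbers, and leaves out the range type. The `lang` parameter, the underscore-prefixed helpers and the `_FORMS` lookup table are assumptions made for this example; a real `plural` would need per-language inflection rules or a dictionary rather than a hard-coded table.

```python
def _plural_category_en(number: int, type: str) -> str:
    n = abs(number)
    if type == "cardinal":
        return "one" if n == 1 else "other"
    if type == "ordinal":
        # 1st, 21st / 2nd, 22nd / 3rd, 23rd / everything else (incl. 11th-13th)
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if n % 10 == 2 and n % 100 != 12:
            return "two"
        if n % 10 == 3 and n % 100 != 13:
            return "few"
        return "other"
    raise NotImplementedError(f"type {type!r} is not covered by this sketch")


def _plural_category_sl(number: int, type: str) -> str:
    if type != "cardinal":
        raise NotImplementedError(f"type {type!r} is not covered by this sketch")
    # For integers, Slovenian categories depend on the last two digits.
    n = abs(number) % 100
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    if n in (3, 4):
        return "few"
    return "other"


# Hypothetical registry mapping language codes to their category rules.
_CATEGORY_RULES = {"en": _plural_category_en, "sl": _plural_category_sl}

# Hypothetical lookup table standing in for real inflection logic.
_FORMS = {
    "en": {"apple": {"one": "apple", "other": "apples"}},
    "sl": {"jabolko": {"one": "jabolko", "two": "jabolki",
                       "few": "jabolka", "other": "jabolk"}},
}


def plural_category(number: int, type: str = "cardinal", lang: str = "en") -> str:
    """Return the CLDR plural category name for `number` in `lang`."""
    return _CATEGORY_RULES[lang](number, type)


def plural(word: str, number: int, type: str = "cardinal", lang: str = "en") -> str:
    """Return the form of `word` that matches `number` in `lang`."""
    forms = _FORMS[lang][word]
    category = plural_category(number, type, lang)
    # Fall back to "other" when a word has no dedicated form for the category.
    return forms.get(category, forms["other"])
```

With this sketch, `plural("apple", 2)` gives "apples", while `plural("jabolko", n, lang="sl")` gives "jabolko", "jabolki", "jabolka" and "jabolk" for n = 1, 2, 3 and 5. Code that already has all its forms defined can call `plural_category` on its own and pick the matching string itself.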

A similar approach is also used in gettext and date-fns.

JarbasAl commented 3 years ago

Great feedback @filips123, that does make sense!

filips123 commented 3 years ago

I will add those functions and implement plural category lookup for English and Slovenian. However, where should I put them (plural_category to start with, and plural later)? Should I add a new utils module, or do they belong somewhere in the existing modules?

ChanceNCounter commented 3 years ago

@filips123 adding to LF is much easier post-refactor. Check https://github.com/MycroftAI/lingua-franca/blob/master/project-structure.md and feel free to ask questions.

In short, new functions go in the relevant module (usually parse.py or format.py). If they're localized, use the decorator, place the localized versions in the corresponding files in lang/<lang_code>/xxx.py according to the instructions in project-structure.md, and name them the same way your other functions are named.