Humanizr / Humanizer

Humanizer meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities

Localize Pluralize/Singularize (WAS: Localizable InflectorExtensions) #197

Open kblok opened 10 years ago

kblok commented 10 years ago

I'd like to write a Spanish implementation of InflectorExtensions. I don't know whether there is ongoing work on this topic (issue #132 is closely related).

I think we should have a culture-specific provider responsible for filling the rules list, and then simply (?) write regex rules for each language.

What do you think, @MehdiK?

MehdiK commented 10 years ago

I think by inflector methods you only mean Pluralize and Singularize here, right?

This is a great idea. We can extract the localizable logic out of the class, implement a default pluralizer/singularizer/inflector class with the current logic (excluding the rules) and provide hooks for injecting the rules etc; kinda like how NumberToWordsConverter is implemented.

What does the localisation committee think? /cc @harouny, @JonasJensen, @mexx, @mnowacki, @hazzik, @thunsaker, @henriksen, @ekblom, @akamud, @ignorkulman, @Borzoo, @onovotny

kblok commented 10 years ago

Yes, I'm talking about the Pluralize and Singularize feature. This is what I have in mind:

I think Spanish has rules similar to English for singular and plural forms (it has uncountable, singular-only, plural-only and irregular words), so the behavior could be the same. Another language could either inherit from DefaultInflector or just implement the IInflector interface directly.

I'm unsure whether IInflector, DefaultInflector, EnglishInflector, etc. should have some suffix (Provider? Engine?).
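
A minimal sketch of how that split could look, assuming a rule list of regex/replacement pairs where later rules win. The members shown here (AddPlural, AddSingular, AddUncountable) and the SpanishInflector class are hypothetical, and the three Spanish rules are deliberately incomplete; they only illustrate the injection point.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical shapes for illustration; these are not existing Humanizer types.
public interface IInflector
{
    string Pluralize(string word);
    string Singularize(string word);
}

// Owns the rule-application logic; subclasses only register rules.
public class DefaultInflector : IInflector
{
    private readonly List<(Regex Pattern, string Replacement)> _plurals = new();
    private readonly List<(Regex Pattern, string Replacement)> _singulars = new();
    private readonly HashSet<string> _uncountables = new(StringComparer.OrdinalIgnoreCase);

    protected void AddPlural(string pattern, string replacement) =>
        _plurals.Add((new Regex(pattern, RegexOptions.IgnoreCase), replacement));

    protected void AddSingular(string pattern, string replacement) =>
        _singulars.Add((new Regex(pattern, RegexOptions.IgnoreCase), replacement));

    protected void AddUncountable(string word) => _uncountables.Add(word);

    public string Pluralize(string word) => Apply(_plurals, word);
    public string Singularize(string word) => Apply(_singulars, word);

    // Rules added later are treated as more specific: scan from the end
    // and apply the first rule that matches.
    private string Apply(List<(Regex Pattern, string Replacement)> rules, string word)
    {
        if (_uncountables.Contains(word)) return word;
        for (var i = rules.Count - 1; i >= 0; i--)
        {
            if (rules[i].Pattern.IsMatch(word))
                return rules[i].Pattern.Replace(word, rules[i].Replacement);
        }
        return word;
    }
}

// A language supplies nothing but its rules (incomplete, for illustration only).
public class SpanishInflector : DefaultInflector
{
    public SpanishInflector()
    {
        AddPlural("$", "s");                    // casa -> casas
        AddPlural("([^aeiouáéíóú])$", "$1es");  // papel -> papeles
        AddPlural("z$", "ces");                 // luz -> luces
        AddSingular("s$", "");                  // casas -> casa
        AddSingular("ces$", "z");               // luces -> luz
        AddUncountable("crisis");               // same form in singular and plural
    }
}
```

A language whose grammar does not fit this rule model would skip DefaultInflector and implement IInflector on its own.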

MehdiK commented 10 years ago

Before going too far, I'd like to confirm that it is actually possible to implement this logic in other languages too, either through changing the rules or implementation from scratch. Depending on the complexities of other languages we may have to choose a different design or think harder about this. Sometimes language rules get way too complex (#64)! I have considered creating a new Humanizer.Dictionary package that deals with this and other language specific word manipulations, and I still think that's a viable solution.

FWIW the English implementation is relatively buggy too. See #142 for more details.

akamud commented 10 years ago

After looking at the InflectorExtensions implementation, I can say this approach would work for Portuguese. The "normal" rules aren't too complex. The problem with Portuguese plurals is that, although they may look simple, the exceptions depend on a word's etymology or stress, which makes it impossible to predict the correct plural form from spelling alone.

For example, there's a rule that says that words ending in "ão" take "ões" in the plural:

coração -> corações
cordão -> cordões

But there are some words that don't follow that rule:

órgão -> órgãos
alemão -> alemães
cão -> cães

In some cases the rule changes because the stressed syllable is not the last one. But some words don't even follow that pattern (and as far as I know, there is no rule for this kind of word):

mão -> mãos
artesão -> artesãos

To ensure more accurate pluralization we will indeed need a dictionary. Something similar probably happens in English and Spanish.
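
As a rough sketch of what "rule plus dictionary" could mean here: look the word up in an exception table first, and only fall back to the "ão" -> "ões" rule when it is not listed. The class name is hypothetical, the table holds only the examples above, and the final "add s" fallback is a placeholder assumption, not a real Portuguese rule set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Humanizer.
public static class PortuguesePluralSketch
{
    // Exceptions copied from the examples above; a real table would be much larger.
    private static readonly Dictionary<string, string> Irregulars =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["órgão"] = "órgãos",
            ["alemão"] = "alemães",
            ["cão"] = "cães",
            ["mão"] = "mãos",
            ["artesão"] = "artesãos",
        };

    public static string Pluralize(string word)
    {
        // The dictionary wins over any rule.
        if (Irregulars.TryGetValue(word, out var plural)) return plural;

        // General rule: "ão" becomes "ões" (coração -> corações).
        if (word.EndsWith("ão", StringComparison.Ordinal))
            return word.Substring(0, word.Length - 2) + "ões";

        // Placeholder fallback, assumed for the sketch only.
        return word + "s";
    }
}
```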

thunsaker commented 10 years ago

Spanish rules are similar; I tried to explain some of them with regard to ordinals in #212.

kblok commented 10 years ago

My concern with dictionaries is their impact on the "weight" of the library (which I think could be solved with resources) and on performance (though I should also be worried about the performance of the many regexes the lib evaluates right now).

Another issue with a dictionary is maintenance: where would we get a list of singulars and plurals? I don't know if one is easy to find, at least for Spanish.

kblok commented 10 years ago

BTW @thunsaker, I have this link with the rules for Spanish plurals: http://es.m.wikibooks.org/wiki/Espa%C3%B1ol/Morfolog%C3%ADa/Sustantivo

mexx commented 10 years ago

Russian has an extra grammatical number. In the current implementation it is named Paucal, though it is really a kind of Dual. For now I have no elegant solution for supporting this distinction in the Inflector scenario.

In German it would be possible to go with rule injection, since German, like English, has only two grammatical numbers.

hazzik commented 10 years ago

@mexx, paucal is usually not a number, but a genitive case in Russian.

hazzik commented 10 years ago

I think we need to properly implement GrammaticalNumberDetector for all languages and widely use it.
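
For illustration, a detector for Russian could be sketched as below, using the standard agreement rule for Russian cardinals (last digit 1 except 11; last digits 2 to 4 except 12 to 14; everything else). The enum and class names are hypothetical and not claimed to match the existing Humanizer code.

```csharp
using System;

// Hypothetical names for illustration only.
public enum GrammaticalNumberSketch { Singular, Paucal, Plural }

public static class RussianNumberDetectorSketch
{
    public static GrammaticalNumberSketch Detect(int number)
    {
        // Only the last two digits matter; take the remainder first so Abs cannot overflow.
        var lastTwo = Math.Abs(number % 100);
        var lastDigit = lastTwo % 10;

        // 11 to 14 always take the plural form.
        if (lastTwo >= 11 && lastTwo <= 14) return GrammaticalNumberSketch.Plural;
        if (lastDigit == 1) return GrammaticalNumberSketch.Singular;
        if (lastDigit >= 2 && lastDigit <= 4) return GrammaticalNumberSketch.Paucal;
        return GrammaticalNumberSketch.Plural;
    }
}
```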

hazzik commented 10 years ago

I'm thinking about an interface IQuantifiable { ToQuantity(int number); } or IWord, which can implement the language-specific logic of quantification. What do you think? A similar concept was already used in DutchNumberToWordsConverter.
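
One possible reading of that idea, as a sketch: the word object carries its own forms plus the rule for choosing among them. The string return type, the Word class and the form-selector delegate are all assumptions made for the example; the comment above does not specify them.

```csharp
using System;

// Return type assumed to be string; the proposal does not state one.
public interface IQuantifiable
{
    string ToQuantity(int number);
}

// Hypothetical word type: it owns its forms and the language rule that picks one.
public sealed class Word : IQuantifiable
{
    private readonly string[] _forms;
    private readonly Func<int, int> _selectForm;

    // selectForm maps a number to an index into forms.
    public Word(Func<int, int> selectForm, params string[] forms)
    {
        _selectForm = selectForm;
        _forms = forms;
    }

    public string ToQuantity(int number) => $"{number} {_forms[_selectForm(number)]}";
}
```

With a two-form English selector, `new Word(n => n == 1 ? 0 : 1, "apple", "apples").ToQuantity(3)` gives "3 apples". A language with three or more grammatical numbers would pass more forms and a selector that returns the matching index.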

Borzoo commented 10 years ago

@hazzik, this idea was implemented in #285, but we need a better design for converting singulars to plurals and duals and vice versa. I'm trying to come up with an elegant solution that supports singulars, duals, paucals (if needed) and plurals.

hazzik commented 10 years ago

@Borzoo, what was implemented in #285 is something different. That is IQuantifier, which can quantify any word, whereas I propose that the word itself can have different representations.

5cover commented 1 year ago

Has any progress been made on this issue?