Open kblok opened 10 years ago
I think by inflector methods you only mean Pluralize and Singularize here, right?
This is a great idea. We can extract the localizable logic out of the class, implement a default pluralizer/singularizer/inflector class with the current logic (excluding the rules) and provide hooks for injecting the rules etc; kinda like how NumberToWordsConverter is implemented.
What does the localisation committee think? /cc @harouny, @JonasJensen, @mexx, @mnowacki, @hazzik, @thunsaker, @henriksen, @ekblom, @akamud, @ignorkulman, @Borzoo, @onovotny
Yes, I'm talking about the Pluralize and Singularize feature. This is what I have in mind:
I think the Spanish language has rules similar to those of English regarding singular and plural forms (it has uncountable, singular-only, plural-only and irregular words), so the behavior could be the same. Another language could choose between inheriting from DefaultInflector and implementing the IInflector interface directly.
I have my doubts about whether IInflector, DefaultInflector, EnglishInflector, etc. should have some suffix (Provider? Engine?)
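To make the proposal above concrete, here is a minimal sketch of how an IInflector / DefaultInflector / EnglishInflector split could look. Only the three type names come from this thread; the member signatures, the AddPlural/AddSingular/AddUncountable hooks and the handful of English rules are assumptions for illustration, not Humanizer's actual API or rule set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical contract: the name is from the discussion, the members are assumed.
public interface IInflector
{
    string Pluralize(string word);
    string Singularize(string word);
}

// Holds the rule-application logic but no language-specific rules.
public class DefaultInflector : IInflector
{
    private readonly List<(Regex Pattern, string Replacement)> _plurals = new();
    private readonly List<(Regex Pattern, string Replacement)> _singulars = new();
    private readonly HashSet<string> _uncountables = new(StringComparer.OrdinalIgnoreCase);

    // Hooks that subclasses use to inject their rules.
    protected void AddPlural(string pattern, string replacement) =>
        _plurals.Add((new Regex(pattern, RegexOptions.IgnoreCase), replacement));

    protected void AddSingular(string pattern, string replacement) =>
        _singulars.Add((new Regex(pattern, RegexOptions.IgnoreCase), replacement));

    protected void AddUncountable(string word) => _uncountables.Add(word);

    public virtual string Pluralize(string word) => Apply(_plurals, word);
    public virtual string Singularize(string word) => Apply(_singulars, word);

    private string Apply(List<(Regex Pattern, string Replacement)> rules, string word)
    {
        if (_uncountables.Contains(word)) return word;

        // Rules added later are treated as more specific, so they are tried first.
        for (int i = rules.Count - 1; i >= 0; i--)
        {
            if (rules[i].Pattern.IsMatch(word))
                return rules[i].Pattern.Replace(word, rules[i].Replacement);
        }
        return word;
    }
}

// A language only supplies rules; this tiny rule set is illustrative, not complete.
public class EnglishInflector : DefaultInflector
{
    public EnglishInflector()
    {
        AddPlural("$", "s");
        AddPlural("(ch|sh|s|x|z)$", "$1es");
        AddSingular("s$", "");
        AddSingular("(ch|sh|ss|x|z)es$", "$1");
        AddUncountable("information");
    }
}
```

With this shape, a Spanish inflector could either derive from DefaultInflector and register its own rules, or implement IInflector from scratch if the rule model does not fit.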
Before going too far, I'd like to confirm that it is actually possible to implement this logic in other languages too, either through changing the rules or implementation from scratch. Depending on the complexities of other languages we may have to choose a different design or think harder about this. Sometimes language rules get way too complex (#64)! I have considered creating a new Humanizer.Dictionary package that deals with this and other language specific word manipulations, and I still think that's a viable solution.
FWIW the English implementation is relatively buggy too. See #142 for more details.
After looking at the InflectorExtension implementation I can say this implementation would work with Portuguese. The "normal" rules aren't too complex.
The problem with plurals in Portuguese is that, although they may look simple, the exception rules depend on etymology or on the word's stress, which makes it impossible to predict the correct plural form.
For example, there's a rule that says that words that end with "ão" will have "ões" in their plural form:
coração -> corações
cordão -> cordões
But there are some words that don't follow that rule:
órgão -> órgãos
alemão -> alemães
cão -> cães
In some cases this rule changes because the stressed syllable is not the last one. But some words don't even follow that pattern (and as far as I know, there is no rule for these kinds of words):
mão -> mãos
artesão -> artesãos
To ensure a more accurate translation we will indeed need a dictionary. Probably something similar happens in English and Spanish.
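The dictionary idea can be sketched as a lookup that runs before any suffix rule. The class name and the fallback of appending "s" are assumptions made for illustration; the irregular entries are just the examples listed above, and a real list would be far longer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Humanizer.
public static class PortuguesePluralSketch
{
    // Irregular plurals taken from the examples in this comment.
    private static readonly Dictionary<string, string> Exceptions =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["órgão"] = "órgãos",
            ["alemão"] = "alemães",
            ["cão"] = "cães",
            ["mão"] = "mãos",
            ["artesão"] = "artesãos",
        };

    public static string Pluralize(string word)
    {
        // 1. The dictionary wins over every rule.
        if (Exceptions.TryGetValue(word, out var irregular)) return irregular;

        // 2. The regular "ão" -> "ões" rule.
        if (word.EndsWith("ão", StringComparison.OrdinalIgnoreCase))
            return word.Substring(0, word.Length - 2) + "ões";

        // 3. Simplified fallback (assumption): append "s".
        return word + "s";
    }
}
```

For example, `PortuguesePluralSketch.Pluralize("coração")` yields "corações" through the rule, while `Pluralize("mão")` yields "mãos" through the dictionary.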
Spanish rules are similar; I tried to explain some of them with regard to ordinals in #212.
My concern with dictionaries is the impact they could have on the "weight" of the library (I think that could be solved with resources) and on performance (though I should also be worried about the performance of the many regexes the lib is evaluating right now).
Another thing with a dictionary is maintenance: where would we get a list of singulars and plurals? I don't know if one is easy to get, at least for the Spanish language.
BTW @thunsaker, I have this link with rules for plurals (Spanish): http://es.m.wikibooks.org/wiki/Espa%C3%B1ol/Morfolog%C3%ADa/Sustantivo
For Russian there is an extra grammatical number present. In the current implementation it is named Paucal, although it is actually a kind of Dual. For now I have no elegant solution to support this distinction in the Inflector scenario.
In German it would be possible to go with the injection of the rules, as German, like English, has only two grammatical numbers.
@mexx, paucal is usually not a number, but a genitive case in Russian.
I think we need to properly implement GrammaticalNumberDetector for all languages and widely use it.
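As an illustration of what a per-language number detector has to decide, here is a sketch for Russian. The enum and class names are hypothetical (the thread does not show the actual GrammaticalNumberDetector code); the rule itself is the standard one: 1 takes the singular form, 2 to 4 take the form called Paucal above, and everything else, including 11 to 14, takes the plural.

```csharp
using System;

// Hypothetical names; "Paucal" follows the terminology used in this thread.
public enum GrammaticalNumber { Singular, Paucal, Plural }

public static class RussianNumberSketch
{
    public static GrammaticalNumber Detect(int number)
    {
        // Only the last two digits matter; the remainder is in -99..99, so Abs is safe.
        int n = Math.Abs(number % 100);

        // 11-14 are always plural, regardless of the last digit.
        if (n >= 11 && n <= 14) return GrammaticalNumber.Plural;

        switch (n % 10)
        {
            case 1:
                return GrammaticalNumber.Singular;   // 1 год, 21 год
            case 2:
            case 3:
            case 4:
                return GrammaticalNumber.Paucal;     // 2 года, 23 года
            default:
                return GrammaticalNumber.Plural;     // 5 лет, 20 лет
        }
    }
}
```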
I'm thinking about interface IQuantifiable { ToQuantity(int number); } or IWord, which can implement language specific logic of quantification. What do you think? A similar concept was already used in DutchNumberToWordsConverter.
@hazzik, this idea was implemented in #285 but we need a better design to convert singulars to plurals and duals and vice versa. I'm trying to come up with an elegant solution that supports singulars, duals, paucals (if needed) and plurals.
@Borzoo, the thing implemented in #285 is something different. There is IQuantifier, which can quantify any word, but I propose that word itself can have different representations.
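A rough sketch of the "word owns its representations" idea, to contrast it with an external quantifier. IQuantifiable is the name proposed above; the string return type, the Word class and its form-selector delegate are assumptions, since the proposal did not spell them out.

```csharp
using System;

// Name from the proposal above; the return type is assumed.
public interface IQuantifiable
{
    string ToQuantity(int number);
}

// Hypothetical word type that carries all of its own forms.
public sealed class Word : IQuantifiable
{
    private readonly Func<int, int> _selectForm; // maps a number to an index into _forms
    private readonly string[] _forms;

    public Word(Func<int, int> selectForm, params string[] forms)
    {
        _selectForm = selectForm;
        _forms = forms;
    }

    public string ToQuantity(int number) => number + " " + _forms[_selectForm(number)];
}
```

Usage: `new Word(n => n == 1 ? 0 : 1, "child", "children").ToQuantity(3)` returns "3 children". A language with three forms would pass three strings and a selector that returns 0, 1 or 2.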
Has any progress been made on this issue?
I'd like to write a Spanish implementation for the InflectorExtensions. I don't know if there is ongoing work on this topic (issue #132 is quite related to this).
I think we should have a culture-specific provider responsible for filling the rules list and then simply (?) write regex rules for each language.
What do you think, @MehdiK?
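One possible shape for the culture-specific provider idea, as a sketch only. Every type name here (RuleList, IInflectorRuleProvider, SpanishRuleProvider, InflectorRules) is hypothetical, and the three Spanish rules are a deliberate simplification that ignores accents and irregular words.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for the regex rules of one language.
public sealed class RuleList
{
    private readonly List<(Regex Pattern, string Replacement)> _plurals = new();

    public void AddPlural(string pattern, string replacement) =>
        _plurals.Add((new Regex(pattern, RegexOptions.IgnoreCase), replacement));

    public string Pluralize(string word)
    {
        // Rules added later are more specific, so they are tried first.
        for (int i = _plurals.Count - 1; i >= 0; i--)
        {
            if (_plurals[i].Pattern.IsMatch(word))
                return _plurals[i].Pattern.Replace(word, _plurals[i].Replacement);
        }
        return word;
    }
}

// Each culture supplies a provider that fills the rule list.
public interface IInflectorRuleProvider
{
    void Fill(RuleList rules);
}

public sealed class SpanishRuleProvider : IInflectorRuleProvider
{
    public void Fill(RuleList rules)
    {
        rules.AddPlural("$", "es");            // consonant ending: papel -> papeles
        rules.AddPlural("([aeiou])$", "$1s");  // unstressed vowel ending: casa -> casas
        rules.AddPlural("z$", "ces");          // final z: luz -> luces
    }
}

public static class InflectorRules
{
    private static readonly Dictionary<string, IInflectorRuleProvider> Providers =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["es"] = new SpanishRuleProvider(),
        };

    public static RuleList For(CultureInfo culture)
    {
        // Walk from the specific culture up to its neutral parents.
        for (var c = culture; !string.IsNullOrEmpty(c.Name); c = c.Parent)
        {
            if (Providers.TryGetValue(c.Name, out var provider))
            {
                var rules = new RuleList();
                provider.Fill(rules);
                return rules;
            }
        }
        throw new NotSupportedException("No inflector rules for culture " + culture.Name);
    }
}
```

Looking rules up through the culture's parent chain means a single "es" provider can serve regional cultures as well, and adding a language becomes a matter of registering one more provider.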