atteo / evo-inflector

Singular to plural English word converter
Apache License 2.0

don't try to pluralize plural words #6

Open SingingBush opened 9 years ago

SingingBush commented 9 years ago

This library is almost what I need, although I wish it could also return the singular versions of plural words. That shouldn't be too hard. I did find an issue though. When passed a word that is already plural, it tries to pluralize it instead of just returning the same string. This is a problem because I'm handling user input and have no way of knowing whether users typed a singular or a plural to start with.

Ideally I'd want a method that returns a plural if a singular has been input and a singular if a plural has been input.

Btw, on the atteo.org page the XML block for Maven needs editing: groupid should be groupId, with an uppercase 'I'. As written, it will cause Maven to fail.

sentinelt commented 9 years ago

I changed it to an uppercase 'I' on the atteo.org page. Thanks for that.

> When passed a word that is already plural, it tries to pluralize it instead of just returning the same string.

Sometimes the expected behavior is to pluralize the word even if it is already plural. So instead of changing the semantics, I think it would be better to add a more general method, one that lets you check whether a word is plural or singular:

English.isPlural(String)

But this is non-trivial functionality. If you know of any material that describes how to programmatically determine whether a word is plural, or of any libraries in other languages that do this, please let me know.

For plural-to-singular mapping, please open a separate feature request.
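
As a rough illustration of why such a check is non-trivial, here is a minimal suffix-based sketch. It is not part of evo-inflector: the class name PluralGuess, its method, and both word lists are hypothetical and deliberately incomplete, and the result is only a guess.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of evo-inflector. */
final class PluralGuess {

    // Irregular plurals that carry no "s" suffix (assumed sample, incomplete).
    private static final Set<String> IRREGULAR_PLURALS = new HashSet<>(Arrays.asList(
            "men", "women", "children", "people", "mice", "geese", "feet", "teeth", "oxen"));

    // Singular words that merely end in "s" (assumed sample, incomplete).
    private static final Set<String> SINGULAR_S_WORDS = new HashSet<>(Arrays.asList(
            "news", "lens", "gas", "alias", "atlas", "canvas"));

    private PluralGuess() {
    }

    /** Returns true when the word looks plural; a heuristic, not a guarantee. */
    static boolean isPlural(final String word) {
        if (word == null) return false;
        final String w = word.trim().toLowerCase(Locale.ROOT);
        if (w.isEmpty()) return false;
        if (IRREGULAR_PLURALS.contains(w)) return true;
        if (SINGULAR_S_WORDS.contains(w)) return false;
        // "-ss", "-us" and "-is" endings are usually singular: class, status, analysis.
        if (w.endsWith("ss") || w.endsWith("us") || w.endsWith("is")) return false;
        return w.endsWith("s");
    }
}
```

Words such as "species" or "series", which are spelled the same in both forms, cannot be classified by spelling alone, which is one reason a reliable isPlural needs more than suffix rules.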

SingingBush commented 9 years ago

To do what I needed, I used some code from here combined with your own English.plural() method:

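import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringUtils; // assumption: the StringUtils used below is Apache Commons Lang
import org.atteo.evo.inflector.English;

public class WordMagic {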

    private static final List<String> UNCOUNTABLES = Arrays.asList("equipment", "information", "rice", "money", "species", "series", "fish", "sheep");

    private LinkedList<Rule> _singulars = new LinkedList<Rule>();

    public WordMagic() {
        addSingularizeRules();
    }

    /**
     * For a given word, return either the singular or the plural version
     * @param word the term that needs checking
     * @return either the singular or the plural version
     */
    public String calculateSingularOrPlural(final String word) {
        if (word == null) return null;
        // Trim once so the singularize rules and English.plural() see the same input.
        final String trimmed = word.trim();
        if (trimmed.isEmpty() || isUncountable(trimmed)) return trimmed;

        for (final Rule rule : _singulars) {
            final String result = rule.apply(trimmed);
            if (result != null) return result;
        }
        // If no singularize rule matched, assume the word is already singular and can be safely pluralised.
        // English.plural() will always pluralise, even if the word is already plural!
        return English.plural(trimmed);
    }

    private void addSingularize(final String rule, final String replacement) {
        final Rule singularizeRule = new Rule(rule, replacement);
        _singulars.addFirst(singularizeRule);
    }

    private void addSingularizeRules() {
        addSingularize("s$", "");
        addSingularize("(s|si|u)s$", "$1s"); // '-us' and '-ss' are already singular
        addSingularize("(n)ews$", "$1ews");
        addSingularize("([ti])a$", "$1um");
        addSingularize("((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$", "$1sis"); // $1 is already the whole stem; appending $2 would repeat the 'a' for the 'analy' branch
        addSingularize("(^analy)ses$", "$1sis");
        addSingularize("(^analy)sis$", "$1sis"); // already singular, but ends in 's'
        addSingularize("([^f])ves$", "$1fe");
        addSingularize("(hive)s$", "$1");
        addSingularize("(tive)s$", "$1");
        addSingularize("([lr])ves$", "$1f");
        addSingularize("([^aeiouy]|qu)ies$", "$1y");
        addSingularize("(s)eries$", "$1eries");
        addSingularize("(m)ovies$", "$1ovie");
        addSingularize("(x|ch|ss|sh)es$", "$1");
        addSingularize("([ml])ice$", "$1ouse");
        addSingularize("(bus)es$", "$1");
        addSingularize("(o)es$", "$1");
        addSingularize("(shoe)s$", "$1");
        addSingularize("(cris|ax|test)is$", "$1is"); // already singular, but ends in 's'
        addSingularize("(cris|ax|test)es$", "$1is");
        addSingularize("(octop|vir)i$", "$1us");
        addSingularize("(octop|vir)us$", "$1us"); // already singular, but ends in 's'
        addSingularize("(alias|status)es$", "$1");
        addSingularize("(alias|status)$", "$1"); // already singular, but ends in 's'
        addSingularize("^(ox)en", "$1");
        addSingularize("(vert|ind)ices$", "$1ex");
        addSingularize("(matr)ices$", "$1ix");
        addSingularize("(quiz)zes$", "$1");
    }

    private boolean isUncountable(final String word) {
        return !StringUtils.isEmpty(word) && UNCOUNTABLES.contains(word.trim().toLowerCase());
    }

    private static class Rule {
        private final String _expression;
        private final Pattern _expressionPattern;
        private final String _replacement;

        protected Rule(final String expression, final String replacement) {
            _expression = expression;
            _replacement = replacement != null ? replacement : "";
            _expressionPattern = Pattern.compile(_expression, Pattern.CASE_INSENSITIVE);
        }

        /**
         * Apply the rule against the input string, returning the modified string or null if the rule didn't apply (and no
         * modifications were made)
         *
         * @param input the input string
         * @return the modified string if this rule applied, or null if the input was not modified by this rule
         */
        protected String apply(final String input) {
            final Matcher matcher = _expressionPattern.matcher(input);
            return matcher.find() ? matcher.replaceAll(_replacement) : null;
        }
    }
}

Not sure if that's any help. I'm just hacking something together for a prototype.
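
One detail of the code above that is easy to miss: rules are registered with addFirst, so the rule added last is tried first, and the generic "s$" fallback only applies when nothing more specific matches. The following cut-down sketch shows that ordering with three of the rules. The class RuleOrderDemo is hypothetical and exists only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, cut-down rule list; only three of the rules above are registered. */
final class RuleOrderDemo {

    private static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(final String regex, final String replacement) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.replacement = replacement;
        }
    }

    private final Deque<Rule> rules = new ArrayDeque<>();

    RuleOrderDemo() {
        add("s$", "");                     // generic fallback: registered first, tried last
        add("([^aeiouy]|qu)ies$", "$1y");  // cities -> city
        add("(alias|status)es$", "$1");    // statuses -> status: registered last, tried first
    }

    private void add(final String regex, final String replacement) {
        // addFirst puts the newest rule at the head, so iteration sees it first.
        rules.addFirst(new Rule(regex, replacement));
    }

    /** Returns the result of the first matching rule, or null when no rule matches. */
    String singularize(final String word) {
        for (final Rule rule : rules) {
            final Matcher matcher = rule.pattern.matcher(word);
            if (matcher.find()) return matcher.replaceAll(rule.replacement);
        }
        return null;
    }
}
```

If the rules were tried in registration order instead, "statuses" would hit the "s$" fallback first and come back as "statuse".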

eepstein commented 7 years ago

In line with the issue's title (but not its thrust), the library incorrectly "pluralizes" preferences to preferenceses. hmmm
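
One possible guard against double pluralization such as "preferenceses" is a round-trip check: singularize the input, pluralize the result, and if that reproduces the input, treat it as already plural. The sketch below assumes nothing about evo-inflector. PluralizeOnce and its two naive stand-in functions are hypothetical, and the check is only as good as the singularizer and pluralizer passed in.

```java
import java.util.function.UnaryOperator;

/** Hypothetical wrapper; neither stand-in below is an evo-inflector API. */
final class PluralizeOnce {

    // Naive stand-ins, present only to make the sketch runnable.
    static final UnaryOperator<String> NAIVE_SINGULARIZE =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    static final UnaryOperator<String> NAIVE_PLURALIZE = w -> w + "s";

    private PluralizeOnce() {
    }

    static String pluralizeOnce(final String word,
                                final UnaryOperator<String> singularize,
                                final UnaryOperator<String> pluralize) {
        final String singular = singularize.apply(word);
        // Round trip: if singularizing and re-pluralizing reproduces the input,
        // treat the input as already plural and return it unchanged.
        if (!singular.equals(word) && pluralize.apply(singular).equals(word)) {
            return word;
        }
        return pluralize.apply(word);
    }
}
```

With the naive stand-ins, a singular word ending in "s" such as "bus" is misread as plural, so in practice the singularizer would need rules like those in the WordMagic class above, and the pluralizer could be English.plural from this library.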