denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License

feat request: add a pluralize word function #5423

Open lowlighter opened 1 month ago

lowlighter commented 1 month ago

Is your feature request related to a problem? Please describe.

Pluralizing English words is useful for many things.

The pluralize package has around 7-8 million weekly downloads and around 3 million dependents on GitHub.

The mentioned package also supports singularization, which could also be useful for mapping plurals back, although it is probably used less.

Describe the solution you'd like

Something similar to pluralize in std

Describe alternatives you've considered

Using either the npm pluralize package or a custom function (not ideal) like:


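// Naive rule-based pluralizer: no irregular nouns, lowercase input assumed.
function pluralize(word: string): string {
  // bus → buses, box → boxes, church → churches, dish → dishes
  if (/(?:s|x|ch|sh)$/.test(word)) {
    return `${word}es`
  }
  // Consonant + y: city → cities (vowel + y falls through: day → days)
  if (/(?:[^aeiou])y$/.test(word)) {
    return `${word.slice(0, -1)}ies`
  }
  // hero → heroes (but wrongly gives photo → photoes)
  if (/o$/.test(word)) {
    return `${word}es`
  }
  // wolf → wolves (but wrongly gives roof → rooves)
  if (/f$/.test(word)) {
    return `${word.slice(0, -1)}ves`
  }
  // Default: cat → cats
  return `${word}s`
}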

kt3k commented 1 month ago

I think the tricky part of this is how to maintain the list of irregular words. Is there a good reference we can rely on?

lowlighter commented 1 month ago

The noun.exc file from the WordNet 3.1 "database only" files contains an extensive list of irregular plural nouns that could be used as a reference. I think the list could be reduced further, because some entries follow "common" rules (e.g. werewolf → werewolves, where f → ves).

Together with Wiktionary's explanation of pluralization and its list of irregular English plural nouns, I feel like that's a good starting point.
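To make the "reduce the list" idea concrete, here is a minimal sketch, not a proposed std API. It assumes each noun.exc line has the form "<inflected form> <base form> [<base form> ...]", and all function names (pluralizeByRule, parseExceptions, pruneRegular) are hypothetical. It parses the exception list, then drops every entry that the naive rules already get right.

```typescript
// Naive rules, mirroring the custom function earlier in this thread.
function pluralizeByRule(word: string): string {
  if (/(?:s|x|ch|sh)$/.test(word)) return `${word}es`;
  if (/[^aeiou]y$/.test(word)) return `${word.slice(0, -1)}ies`;
  if (/o$/.test(word)) return `${word}es`;
  if (/f$/.test(word)) return `${word.slice(0, -1)}ves`;
  return `${word}s`;
}

// Parse exception-list text into a singular -> plural map.
// Assumed line format: "<inflected form> <base form> [<base form> ...]".
function parseExceptions(text: string): Map<string, string> {
  const irregular = new Map<string, string>();
  for (const line of text.split("\n")) {
    const [plural, ...singulars] = line.trim().split(/\s+/);
    if (!plural) continue; // skip blank lines
    for (const singular of singulars) {
      // Keep the first plural listed for a given singular.
      if (!irregular.has(singular)) irregular.set(singular, plural);
    }
  }
  return irregular;
}

// Keep only entries that the rules above would get wrong.
function pruneRegular(irregular: Map<string, string>): Map<string, string> {
  const pruned = new Map<string, string>();
  irregular.forEach((plural, singular) => {
    if (pluralizeByRule(singular) !== plural) pruned.set(singular, plural);
  });
  return pruned;
}

// Look up irregular forms first, then fall back to the rules.
function pluralize(word: string, irregular: Map<string, string>): string {
  return irregular.get(word) ?? pluralizeByRule(word);
}

// A tiny hand-written sample in the assumed format (not the real file).
const sample = "werewolves werewolf\nchildren child\nmice mouse\n";
const irregular = pruneRegular(parseExceptions(sample));
```

With this sample, the werewolf entry is pruned because the f → ves rule already covers it, while child and mouse stay in the map.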


I also want to add that these kinds of functions (pluralize/singularize) are sometimes used for lemmatization in NLP (though I don't know whether that's still as popular with LLMs).

Text can also be processed with normalization, where all diacritics (accents) are removed. I think the latter would be a nice addition (and not necessarily too difficult to implement), because it's not uncommon for people building a simple search engine to want this (e.g. treating pokémon and pokemon the same, which is useful for filtering).
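As a rough illustration of the diacritics idea, the sketch below decomposes the text with String.prototype.normalize("NFD") and strips the combining marks. The function names removeDiacritics and matches are hypothetical, and the character range only covers the Combining Diacritical Marks block (U+0300 to U+036F), so it is a starting point rather than a complete solution.

```typescript
// Remove accents by decomposing characters and dropping combining marks.
function removeDiacritics(text: string): string {
  // NFD splits "é" into "e" + U+0301 (combining acute accent);
  // the regex then removes code points in U+0300..U+036F.
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Accent-insensitive and case-insensitive substring match, e.g. for filtering.
function matches(query: string, candidate: string): boolean {
  const fold = (s: string): string => removeDiacritics(s).toLowerCase();
  return fold(candidate).includes(fold(query));
}
```

For example, matches("pokemon", "Pokémon Red") is true, because both strings fold to plain lowercase ASCII before the comparison.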

andrewthauer commented 1 month ago

A good reference implementation is the inflector / inflections APIs in Ruby on Rails. It handles the inverse (singularize) and more. I personally haven't dug into the code to see how it works, but it's been around for ages and is widely used for this sort of thing (in Ruby).

There are some JS equivalents, some of which use the term "inflect" as well. They often include a variety of functions to translate strings from one form to another, for example camelCase, snake_case, Titleize, pluralize, singularize, capitalize, etc.

NOTE: I was actually looking for something to do inflections the other day in Deno. It ended up being a simpler case and I used a few functions from lodash. Having inflection-like functions in @std would come in handy imo.

^^ Actually it was there in text, but JSR search is pretty lacking atm and I didn't think to search the GH code in the moment.

lionel-rowe commented 1 month ago

As with https://github.com/denoland/std/issues/5424, I think there are two use cases with minimal overlap here — dev-first and user-first. User-first probably isn't something that should be handled by std as it quickly becomes hugely complex if you want to support more than English — many languages have >2 plural forms — and overlaps with other aspects of NLP (stemming/lemmatization) as people have mentioned. You're much better off using dedicated localization or NLP libraries for that kind of thing.

The dev-first version, i.e. class Item {} => /api/items etc and focusing solely on English, is maybe more feasible, though you'll still need to make some arbitrary decisions (indexes vs indices etc). Similar to capitalization, you could even make it TypeScript friendly with a bit of effort. In fact, I have a strongly typed version I'm currently working on and have used it a bit in my own code:

const x: 'cities' = pluralize('city')
const y: 'city' = singularize(x)

But maybe this kind of thing is a bit hacky for inclusion in std.
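For readers wondering how a strongly typed pluralize could work at all, here is a minimal sketch using template literal types. It is not the implementation mentioned above. The Pluralize type and its rule set are assumptions for illustration and cover only a few regular English patterns.

```typescript
type Vowel = "a" | "e" | "i" | "o" | "u";

// Hypothetical type-level rules, checked in the same order as the runtime code.
type Pluralize<S extends string> =
  S extends `${string}s` | `${string}x` | `${string}ch` | `${string}sh` ? `${S}es`
  : S extends `${string}${Vowel}y` ? `${S}s`
  : S extends `${infer Stem}y` ? `${Stem}ies`
  : `${S}s`;

// Runtime counterpart; the cast is needed because the compiler cannot
// prove that the string operations match the conditional type.
function pluralize<S extends string>(word: S): Pluralize<S> {
  let result: string;
  if (/(?:s|x|ch|sh)$/.test(word)) result = `${word}es`;
  else if (/[aeiou]y$/.test(word)) result = `${word}s`;
  else if (/y$/.test(word)) result = `${word.slice(0, -1)}ies`;
  else result = `${word}s`;
  return result as Pluralize<S>;
}

// The literal return types are computed at compile time.
const cities: "cities" = pluralize("city");
const boxes: "boxes" = pluralize("box");
const days: "days" = pluralize("day");
```

The type-level rules and the runtime rules have to be kept in sync by hand, which is part of why this approach can feel hacky: any irregular word would need an entry in both places.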