denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License

feat request: expose publicly `capitalizeWord` (or similar) #5424

Open lowlighter opened 2 months ago

lowlighter commented 2 months ago

Is your feature request related to a problem? Please describe.

Publicly expose capitalizeWord or something similar (maybe named capitalize / ucfirst / titleCase / ...): https://github.com/denoland/deno_std/blob/22d3bda488f145b725fc1eaeee16922a97d88add/text/_util.ts#L9-L13

Many languages provide a ucfirst helper (e.g. PHP, Perl) that capitalizes the first letter of a string.

While this is trivial, it's tedious to have to redefine the function in every project that needs it.

Describe the solution you'd like

Have the feature offered by the std lib.

Describe alternatives you've considered

Redefining capitalizeWord in my own project.

kt3k commented 2 months ago

Maybe let's add toTitleCase()? That was suggested in the past in #3440 and #4082

timreichen commented 2 months ago

> Maybe let's add toTitleCase()? That was suggested in the past in #3440 and #4082

Implementing toTitleCase() is a slippery slope, because a proper implementation needs grammar analysis (and possibly localization). This has also been discussed as a native API proposal, but it seems no conclusion has been reached (refs: https://github.com/denoland/deno_std/pull/4082#issuecomment-1883017296 and https://es.discourse.group/t/proposal-string-prototype-capitalize/1662).

I would like to have toTitleCase() implemented in std, but I think this will take a huge effort to do right.

lowlighter commented 2 months ago

Maybe I worded the issue poorly: I wanted an alias for ucfirst (i.e. just capitalize the first letter of a string), to avoid having to write ${str.charAt(0).toLocaleUpperCase()}${str.substring(1)} each time.

The suggested String.prototype.capitalize and toTitleCase seem more akin to PHP's ucwords function.

I also just noticed that the function linked in the original post isn't exactly what I wanted: ucfirst isn't supposed to call toLocaleLowerCase() on the rest of the string.
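A minimal sketch of the ucfirst semantics described here, where only the first character changes. The name `ucfirst` and the optional `locale` parameter are hypothetical (not a std API). It takes the first code point instead of `charAt(0)`, an assumption that matters only for letters outside the Basic Multilingual Plane; for ordinary text it behaves like the inline expression above.

```typescript
// Hypothetical `ucfirst` helper (not a std API): upper-cases only the first
// code point and leaves the rest of the string exactly as it was.
function ucfirst(str: string, locale?: string): string {
  const cp = str.codePointAt(0);
  if (cp === undefined) return str; // empty string: nothing to do
  // Use the first code point rather than charAt(0), so a letter outside the
  // Basic Multilingual Plane is not split into a lone surrogate.
  const first = String.fromCodePoint(cp);
  return first.toLocaleUpperCase(locale) + str.slice(first.length);
}

console.log(ucfirst("hELLO wORLD")); // "HELLO wORLD" (rest untouched)
```

The internal capitalizeWord would instead return "Hello world" for the same input, because it also lower-cases the remainder.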

Sorry if the issue wasn't clear

luk3skyw4lker commented 1 month ago

@timreichen @kt3k

I think adding toTitleCase() might be a good idea, but making capitalizeWord public is probably a better fit for this issue, since it covers a real use case (sometimes you just want the first letter of a string capitalized).

kt3k commented 1 month ago

Sounds like capitalizeWord is a good starting point. Let's document that the API only upper-cases the first letter and lower-cases the rest, and that no grammatical analysis is performed.
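A rough sketch of what such documentation could look like on a public helper. The doc comment wording and the optional `locale` parameter are assumptions for illustration; the internal std helper may differ.

```typescript
/**
 * Upper-cases the first character of `word` and lower-cases the rest.
 *
 * No grammatical analysis is performed: the whole input is treated as a
 * single word, so "hello WORLD" becomes "Hello world".
 *
 * @param word The string to capitalize.
 * @param locale Optional locale for case mapping (hypothetical parameter).
 */
function capitalizeWord(word: string, locale?: string): string {
  const cp = word.codePointAt(0);
  if (cp === undefined) return word; // empty string
  const first = String.fromCodePoint(cp);
  return (
    first.toLocaleUpperCase(locale) +
    word.slice(first.length).toLocaleLowerCase(locale)
  );
}
```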

lionel-rowe commented 1 month ago

I think capitalizeWord is a sensible enough addition, but putting more complex letter-casing functions in std risks massive scope creep, unless you want to unduly privilege English over every other language. Still, you could easily build a "naive" title case in userland on top of capitalizeWord + Intl.Segmenter(locale, { granularity: 'word' }).
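A minimal sketch of that userland approach, assuming a runtime with `Intl.Segmenter`. The name `naiveTitleCase` is hypothetical, and the first-letter upper-casing is inlined so the snippet stands alone. It capitalizes every word-like segment and has no stop-word handling, so it does not follow APA, AP, or Chicago rules.

```typescript
// Hypothetical userland helper: upper-case the first character of every
// word-like segment, leaving spaces, punctuation, and the rest of each word as-is.
function naiveTitleCase(text: string, locale = "en-US"): string {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let out = "";
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (isWordLike) {
      const first = String.fromCodePoint(segment.codePointAt(0)!);
      out += first.toLocaleUpperCase(locale) + segment.slice(first.length);
    } else {
      out += segment; // whitespace and punctuation pass through unchanged
    }
  }
  return out;
}

console.log(naiveTitleCase("the lord of the rings")); // "The Lord Of The Rings"
```

The "Of" and "The" in the output are exactly what a real title-casing library would avoid, which is where the per-language word lists come in.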

Even with capitalizeWord alone there are a few non-trivial considerations:

Implementation could look something like this:

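// `locale` drives both grapheme segmentation and case mapping;
// `force` additionally lower-cases everything after the capitalized letter.
type CapitalizeWordOptions = {
    locale: string | Intl.Locale
    force: boolean
}

const defaults: CapitalizeWordOptions = {
    locale: 'en-US',
    force: false,
}

function capitalizeWord(word: string, options?: Partial<CapitalizeWordOptions>): string {
    const { locale, force } = { ...defaults, ...options }
    const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' })

    // Upper-case the first grapheme that contains a letter, so leading
    // non-letters (quotes, digits, etc.) are skipped rather than "capitalized".
    for (const { segment: grapheme, index } of segmenter.segment(word)) {
        if (/\p{L}/u.test(grapheme)) {
            const before = word.slice(0, index)
            const after = word.slice(index + grapheme.length)
            const afterModified = force ? after.toLocaleLowerCase(locale) : after

            return before + grapheme.toLocaleUpperCase(locale) + afterModified
        }
    }

    // No letters found: return the input unchanged.
    return word
}
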
luk3skyw4lker commented 1 month ago

I looked up some references for the ucfirst function the issue talks about, and I think we should go with the simple approach: just capitalize the first letter of the string (if it's in the range a to z), as the PHP docs describe. References:

https://www.php.net/manual/en/function.ucfirst.php https://docs.rs/ucfirst/latest/ucfirst/ https://perldoc.perl.org/functions/ucfirst

I think what @lionel-rowe described falls more under a toTitleCase() function, which, as @timreichen said, would take a great effort to do right at this point. A ucfirst-based implementation would favor the Latin alphabet, but I think that's OK for now.
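A minimal sketch of the simple, PHP-style behavior, assuming "first letter" means only an ASCII "a" to "z" first character. The name `asciiUcfirst` is hypothetical. It is locale-independent, and any other first character (including accented Latin letters) is left unchanged.

```typescript
// Hypothetical helper mirroring the PHP docs' description: only an ASCII
// "a" (0x61) to "z" (0x7a) first character is upper-cased.
function asciiUcfirst(str: string): string {
  const code = str.charCodeAt(0); // NaN for "", so the check below fails
  if (code >= 0x61 && code <= 0x7a) {
    // ASCII upper-case letters sit exactly 0x20 below their lower-case forms.
    return String.fromCharCode(code - 0x20) + str.slice(1);
  }
  return str;
}

console.log(asciiUcfirst("hello")); // "Hello"
console.log(asciiUcfirst("élan")); // "élan" (not in a-z, so unchanged)
```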

lionel-rowe commented 1 month ago

> I think that what @lionel-rowe said falls more on the description of the 'toTitleCase()' function, which I agree with @timreichen that it would be a great effort to do so by now

I'd suggest that title casing should permanently fall outside the scope of std, as dedicated libraries would handle it better. Otherwise std would have to maintain a list of "stop words" that shouldn't be capitalized for every supported language, and that's before even considering the various differing style guides (APA, AP, Chicago, etc.).

The reason I think capitalizing a single word could reasonably fall within the scope of std is that it's relatively simple to do in a reasonably robust, locale-aware way, and it doesn't require any hard-coded word lists.

With that said, I think it's worth distinguishing between a "dev-first" and a "user-first" approach to capitalization:

Generally speaking, dev-first capitalization is only for dev-centric use cases and should be avoided for user-facing text. On the other hand, the user-first approach can be used for both user-facing and dev-facing purposes, but in dev-facing scenarios you wouldn't get the type inference.
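One way to read the type-inference point, assuming a dev-first helper is locale-independent: its result can then be typed with TypeScript's built-in `Capitalize<S>` string manipulation type, which also ignores locale. The name `capitalizeDev` is hypothetical. A user-first, locale-aware helper could not honestly claim that return type, since e.g. Turkish case mapping would disagree with it.

```typescript
// Hypothetical "dev-first" helper: locale-independent toUpperCase(), so the
// return type can be narrowed with the built-in Capitalize<S> type.
function capitalizeDev<S extends string>(str: S): Capitalize<S> {
  // The cast is needed because TypeScript cannot prove the runtime string
  // operation matches the type-level one.
  return (str.charAt(0).toUpperCase() + str.slice(1)) as Capitalize<S>;
}

const key = capitalizeDev("userId"); // inferred type: "UserId"
console.log(key); // "UserId"
```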

lionel-rowe commented 1 month ago

Further to that, looking at where the capitalizeWord util is used, it seems pretty clear that to_[camel/kebab/pascal/snake]_case are dev-first functions, but with the notable drawback of calling toLocale[Upper/Lower]Case without specifying a locale. I think the suitable use cases should be documented for to_capitalized (or whatever the equivalent public function ends up being called), as devs may reasonably expect it to be a general-purpose function suitable for user-facing text (whereas that confusion is unlikely with camel/kebab/etc.).
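A short illustration of why an unspecified locale matters, using standard `String.prototype.toLocaleUpperCase` and assuming a runtime with full ICU data (as in Deno and current Node.js). The `capitalize` arrow function is a hypothetical stand-in for a first-letter helper.

```typescript
// Case mapping depends on the locale: Turkish distinguishes dotted and
// dotless i, so "i" upper-cases to "İ" (U+0130) rather than "I".
const word = "istanbul";

const capitalize = (s: string, locale?: string): string =>
  s.charAt(0).toLocaleUpperCase(locale) + s.slice(1);

console.log(capitalize(word, "en-US")); // "Istanbul"
console.log(capitalize(word, "tr-TR")); // "İstanbul"
// With no locale argument, the result depends on the host's default locale.
console.log(capitalize(word));
```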

guy-borderless commented 1 month ago

> > Maybe let's add toTitleCase()? That was suggested in the past in #3440 and #4082

> Implementing toTitleCase() is a slippery slope, because it needs grammar analysis (and possibly localization) for a proper implementation. This has also been discussed as a native api proposal, but seems like they haven't reached a conclusion (refs: #4082 (comment) and https://es.discourse.group/t/proposal-string-prototype-capitalize/1662).

> I would like to have toTitleCase() implemented in std, but I think this will take a huge effort to do right.

I suspect subtle, context-sensitive grammatical operations like title casing will mostly be done via cheap LLMs where possible.