Open lowlighter opened 2 months ago
Maybe let's add `toTitleCase()`? That was suggested in the past in #3440 and #4082.
Implementing `toTitleCase()` is a slippery slope, because a proper implementation needs grammar analysis (and possibly localization). This has also been discussed as a native API proposal, but it seems no conclusion has been reached (refs: https://github.com/denoland/deno_std/pull/4082#issuecomment-1883017296 and https://es.discourse.group/t/proposal-string-prototype-capitalize/1662).
I would like to have `toTitleCase()` implemented in std, but I think this will take a huge effort to do right.
Maybe I misdirected the issue: I wanted an alias for `ucfirst` (i.e. just capitalize the first letter of a string), to avoid having to write `${str.charAt(0).toLocaleUpperCase()}${str.substring(1)}` each time.
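For reference, a minimal sketch of the alias being asked for. The name `ucfirst` is borrowed from PHP/Perl and is hypothetical here; nothing in std is assumed to export it.

```typescript
// Hypothetical helper (not an existing std export): upper-cases only the
// first UTF-16 code unit and leaves the rest of the string untouched.
function ucfirst(str: string): string {
  return `${str.charAt(0).toLocaleUpperCase()}${str.substring(1)}`;
}

console.log(ucfirst("fooBar")); // "FooBar"
console.log(ucfirst("")); // ""
```

Note that the rest of the string keeps its original casing, which is the behavior requested in this issue.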
The suggested `String.prototype.capitalize` and `toTitleCase` seem more akin to a `ucwords` function.
Also, I just noticed that the function linked in the original post is not exactly what I wanted: `ucfirst` isn't supposed to call `toLocaleLowerCase()` on the rest of the string.
Sorry if the issue wasn't clear
@timreichen @kt3k
I think that adding `toTitleCase()` might be a good idea, but making `capitalizeWord` public would probably be a better fit for this issue, since it's a real use case (sometimes you just want the first letter of a string to be capitalized).
Sounds like `capitalizeWord` is a good starting point? Let's document that the API only upper-cases the first letter and lower-cases the rest, and that it performs no grammatical analysis.
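A rough sketch of what that documented behavior could look like. This follows the behavior described in this thread and is not copied from `text/_util.ts`; the doc comment wording is only a suggestion.

```typescript
/**
 * Upper-cases the first character of `word` and lower-cases the rest.
 *
 * No grammatical analysis is performed: the input is treated as a single
 * word, and no locale is passed to the case-mapping calls (an assumption
 * mirroring the behavior discussed in this thread).
 */
function capitalizeWord(word: string): string {
  if (word.length === 0) return word;
  return word.charAt(0).toLocaleUpperCase() +
    word.slice(1).toLocaleLowerCase();
}

console.log(capitalizeWord("fooBar")); // "Foobar"
```

The lower-casing of the tail is the difference from a plain `ucfirst`, which would return "FooBar" for the same input.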
I think `capitalizeWord` is a sensible enough addition, but putting more complex letter-casing functions inside `std` runs the risk of massive scope creep, unless you want to unduly privilege English over every other language. Still, you could easily build a "naive" title case in userland on top of `capitalizeWord` + `Intl.Segmenter(locale, { granularity: 'word' })`.
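To illustrate, here is one way such a userland helper might look. Both `naiveTitleCase` and the simplified `capitalizeWord` below are hypothetical stand-ins written for this sketch, and it deliberately ignores stop words and style guides.

```typescript
// Simplified stand-in for the helper discussed in this issue (assumption:
// upper-case the first character, lower-case the rest, with explicit locale).
function capitalizeWord(word: string, locale: string): string {
  return word.charAt(0).toLocaleUpperCase(locale) +
    word.slice(1).toLocaleLowerCase(locale);
}

// Hypothetical userland helper: capitalizes every word-like segment and
// copies everything else (spaces, punctuation) through unchanged.
function naiveTitleCase(text: string, locale = "en-US"): string {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let result = "";
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    result += isWordLike ? capitalizeWord(segment, locale) : segment;
  }
  return result;
}

console.log(naiveTitleCase("the lord of the rings")); // "The Lord Of The Rings"
```

The output shows why this is only "naive": a real English title would keep "of" and "the" in lower case, which is exactly the grammar-dependent part that would not belong in std.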
Even with `capitalizeWord` alone there are a few non-trivial considerations:
/\p{L}/u
Implementation could look something like this:
type CapitalizeWordOptions = {
  locale: string | Intl.Locale
  // When true, the rest of the word is lower-cased as well
  force: boolean
}

const defaults: CapitalizeWordOptions = {
  locale: 'en-US',
  force: false,
}

function capitalizeWord(word: string, options?: Partial<CapitalizeWordOptions>): string {
  const { locale, force } = { ...defaults, ...options }

  // Walk the graphemes and capitalize the first one that is a letter
  for (const { segment: grapheme, index } of new Intl.Segmenter(locale, { granularity: 'grapheme' }).segment(word)) {
    if (/\p{L}/u.test(grapheme)) {
      const before = word.slice(0, index)
      const after = word.slice(index + grapheme.length)
      const afterModified = force ? after.toLocaleLowerCase(locale) : after

      return before + grapheme.toLocaleUpperCase(locale) + afterModified
    }
  }

  // No letter found: return the input unchanged
  return word
}
I found some references for the `ucfirst` function that the issue talks about, and I think we should go with the simple approach: just capitalize the first letter of the string (ranging from a to z), as the PHP docs say. I'll leave some references in this comment.
https://www.php.net/manual/en/function.ucfirst.php https://docs.rs/ucfirst/latest/ucfirst/ https://perldoc.perl.org/functions/ucfirst
I think that what @lionel-rowe said falls more under the description of the `toTitleCase()` function, which I agree with @timreichen would take a great effort to do right now. With the `ucfirst`-based implementation we would be favoring the Latin alphabet, but I think that's OK for now.
> I think that what @lionel-rowe said falls more under the description of the `toTitleCase()` function, which I agree with @timreichen would take a great effort to do right now
I'd suggest title casing is something that should permanently fall outside the scope of `std`, as proper dedicated libraries would handle it better. Otherwise it'd involve maintaining a list of "stop words" that shouldn't be capitalized for every supported language, and that's even without considering the various differing style standards that exist (APA, AP, Chicago, etc.).
The reason I think capitalizing a single word could reasonably fall within the scope of `std` is that it's, relatively speaking, very simple to do in a reasonably robust, locale-aware way and doesn't require any hard-coded word lists.
With that said, I think it's worth distinguishing between a "dev-first" and a "user-first" approach to capitalization:
The "dev-first" approach can be relatively simple and can be used for cases such as code generation, dev tooling, etc. The current version in `text/_util.ts` and PHP's `ucfirst` both fall in this category. IMO neither of these implementations is great: `ucfirst` is extremely limited as it only handles ASCII (note that strings such as `переменная` and `μεταβλητός` are perfectly valid identifiers in JS), and `text/_util.ts::capitalizeWord` may give different results on different systems due to calling `toLocale[Upper/Lower]Case` with no locale specified.
One nice DX enhancement you could do with the dev-first approach is to replicate TS's implementation of the `Capitalize` utility type, so you get type inference for free:
function capitalize<T extends string>(str: T): Capitalize<T> {
  return (str.charAt(0).toUpperCase() + str.slice(1)) as Capitalize<T>
}

const capitalized: 'Foo' = capitalize('foo')
The "user-first" approach is for user-facing text and is locale aware. My implementation above is an example of this approach. Runtime implementation-wise it's slightly more complicated, whereas type-wise it's very simple, as there's no easy way to represent the return type in TypeScript other than `string`.
Generally speaking, dev-first capitalization is only for dev-centric use cases and should be avoided for user-facing text. On the other hand, the user-first approach can be used for both user-facing and dev-facing purposes, but in dev-facing scenarios you wouldn't get the type inference.
Further to that, looking at the usage of the `capitalizeWord` util, it seems pretty clear that `to_[camel/kebab/pascal/snake]_case` are dev-first functions, but with the notable drawback of using `toLocale[Upper/Lower]Case` without specifying a locale. I think the suitable use cases should be documented for `to_capitalized`, or whatever the equivalent public function would be called, as devs may reasonably expect it to be a general-purpose function suitable for user-facing text (whereas that confusion is unlikely with camel/kebab/etc.).
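As a small illustration of the locale drawback mentioned above, the sketch below uses a hypothetical `upperFirst` helper (not a std function) to show how the same input capitalizes differently depending on the locale passed to `toLocaleUpperCase`.

```typescript
// Hypothetical helper for demonstration only: upper-cases the first
// character using the given locale, or the runtime default when omitted.
function upperFirst(word: string, locale?: string): string {
  return word.charAt(0).toLocaleUpperCase(locale) + word.slice(1);
}

console.log(upperFirst("istanbul", "en-US")); // "Istanbul"
console.log(upperFirst("istanbul", "tr")); // "İstanbul" (dotted capital I, U+0130)

// With no locale argument the result depends on the runtime's default
// locale, so it can differ between machines.
console.log(upperFirst("istanbul"));
```

For dev-first use cases such as identifier generation, a locale-insensitive `toUpperCase()` or a fixed locale would avoid this variability.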
> Maybe let's add `toTitleCase()`? That was suggested in the past in #3440 and #4082
>
> Implementing `toTitleCase()` is a slippery slope, because it needs grammar analysis (and possibly localization) for a proper implementation. This has also been discussed as a native api proposal, but seems like they haven't reached a conclusion (refs: #4082 (comment) and https://es.discourse.group/t/proposal-string-prototype-capitalize/1662).
>
> I would like to have `toTitleCase()` implemented in std, but I think this will take a huge effort to do right.
I suspect subtle, context-sensitive grammatical operations like title casing will mostly be done via cheap LLMs where possible.
Is your feature request related to a problem? Please describe.
Publicly expose `capitalizeWord` or something similar (maybe `capitalize` / `ucfirst` / `titleCase` / ...): https://github.com/denoland/deno_std/blob/22d3bda488f145b725fc1eaeee16922a97d88add/text/_util.ts#L9-L13
A lot of languages provide a `ucfirst` helper (e.g. PHP, Perl, etc.) that capitalizes the first letter of a string.
While this is trivial enough, it's often tedious to have to redefine this function in every project when needed.
Describe the solution you'd like
Feature offered by std lib
Describe alternatives you've considered
Redefining `capitalizeWord` in each project