golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.06k stars 17.68k forks source link

strings, bytes: deprecate Title #48367

Closed smasher164 closed 3 years ago

smasher164 commented 3 years ago

Packages bytes and strings define the method Title:

func Title(s []byte) []byte
func Title(s string) string

[that] returns a copy with all Unicode letters that begin words mapped to their title case.

A known bug with this function is that it doesn't handle unicode punctuation properly (see https://github.com/golang/go/issues/6801). However, this can't be addressed due to the backwards compatibility guarantee, since it would change the function's output. This means that issues like https://github.com/golang/go/issues/34994 and https://github.com/golang/go/issues/48356 can't be addressed, which want Title to work around punctuation and emoji.

Furthermore, Title doesn't take into account language-specific capitalization rules. For example, the Dutch word "ijsland" should be capitalized as "IJsland", but Title returns "Ijsland".

A more robust alternative is golang.org/x/text/cases. Calling cases.Title(<language>).Bytes(<bytes>) or cases.Title(<language>).String(<string>) will deal with punctuation, different languages, and capitalize phrases.

Therefore, I propose deprecating bytes.Title and strings.Title in favor of golang.org/x/text/cases.

// Title treats s as UTF-8-encoded bytes and returns a copy with all Unicode letters that begin
// words mapped to their title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s []byte) []byte {
// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s string) string {
robpike commented 3 years ago

Not sure deprecation would do much. It's already got a big BUG annotation in the comment, and we can't get rid of it for compatibility reasons. But] pointing people to an actual bug-free replacement would be valuable.

smasher164 commented 3 years ago

@robpike The reason for deprecation is three-fold:

  1. // Deprecated comments are interpreted by tools like staticcheck (and eventually gopls) to discourage a function's use. pkg.go.dev even displays them differently: image
  2. It seems that issues related to Title's output continue to be opened, and we aren't able to fix them due to the backwards compatibility promise. If we can't provide support for its implementation, and in the present of a more robust alternative, deprecation is the clear solution.
  3. Deprecation notices are typically where alternatives are presented. If someone encounters strings.Title through an old example or tutorial, they now have a path forward.
rsc commented 3 years ago

It seems OK to turn the BUG note into something tooling will expose more directly. In retrospect it was a mistake to not have BUG comments be attached to specific functions/types/etc.

rsc commented 3 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

smasher164 commented 3 years ago

It seems OK to turn the BUG note into something tooling will expose more directly.

Is this for tools to be aware that a bug exists, or to provide some suggested action around it? If the suggested action is to replace the code with something else, I think deprecation notices already fulfill that purpose.

rsc commented 3 years ago

By "It seems OK to turn the BUG note into something tooling will expose more directly." I meant "it seems OK to replace the BUG note with a Deprecated: comment in this case."

rsc commented 3 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

andig commented 3 years ago

I‘m perfectly happy with using Title for cases where I know and control the source strings, e.g. for translating member variable visibility. Cases would be overkill for these scenarios. I also wouldn‘t want the linter complain about me doing that.

smasher164 commented 3 years ago

for cases where I know and control the source strings

So in this case, you

  1. Are aware of Title's limitations.
  2. Understand that Title's implementation falls under your use-case.
  3. Have control over the source strings.

In the event that all three of these are true, I don't see the issue with using //lint:ignore to prevent the linter from flagging it.

Even if it weren't marked deprecated, users should still be made aware that there's a more robust/correct alternative for case mapping. And in that case, the situation wouldn't be much better than a linter warning.

for translating member variable visibility

True, but Title is arguably also overkill here, since you only care about changing one rune.

func exported(s string) string {
    r, size := utf8.DecodeRuneInString(s)
    return string(unicode.ToTitle(r)) + s[size:]
}
andig commented 3 years ago

In the event that all three of these are true, I don't see the issue with using //lint:ignore to prevent the linter from flagging it.

My point is that there are perfectly valid uses of Title hence argue to not deprecate it. I'm unhappy with having to sprinkle my code with //lint:ignore for keeping the linter happy.

smasher164 commented 3 years ago

My point is that there are perfectly valid uses of Title hence argue to not deprecate it. I'm unhappy with having to sprinkle my code with //lint:ignore for keeping the linter happy.

I'm curious about the perfectly valid use cases of Title that cases is overkill for, but also require examining more than one rune.

Also, if your linter were to suddenly warn you on BUG comments, would that also make you unhappy? Would you not want to know if a function you were using had bugs? In the case of Title, you are already aware of its limitations, but that may not extend across other functions or even other users.

And if needed, you could scope a lint directive to a function or file, and declare your own

func Title(s string) string {
    //lint:ignore SA1019 I know what I'm doing
    return strings.Title(s)
}

Would this resolve the issue with adding a linter directive all over the place?

rsc commented 3 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

robpike commented 3 years ago

But what is the actual proposal being accepted? There is shift in thinking through the comments.

smasher164 commented 3 years ago

@robpike Correct me if I'm wrong, but I think the accepted proposal is still what is written in the first comment.

@andig's sentiment is to not have the linter flagging cases he knows to be correct. I presented the option to tell the linter to ignore these cases, without sprinkling the comment all over the codebase.

andig commented 3 years ago

I withdraw my objection. Searching around the Go codebase it seems this is the only occurrence of the BUG used in such a way (I was suspecting more).

For my personal preference, BUG is explicit enough to not need more tooling support. If we stick to the compatibility guarantee then I also wouldn't want my legacy code being flagged or my CI broken- toolchain is more than just the compiler. The same would be true though for deprecation notices so it's a weak argument.

That being said if you want to close the discussion due to having been accepted that's fine.

One takeaway though- and that resonates with the new developer survey- It would help to better educate people on the ecosystem and available libraries (however that could be achieved).

smasher164 commented 3 years ago

Reading through the comments again, I think the accepted proposal is actually to replace the BUG comment with Deprecated. I'm guessing that would mean:

Before:

// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.

After:

// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// Deprecated: The rule Title uses for word boundaries does not handle Unicode
// punctuation properly. Use golang.org/x/text/cases instead.

Edit: I sent in CL 359485. I think it can still use some wordsmithing.

gopherbot commented 3 years ago

Change https://golang.org/cl/359485 mentions this issue: strings, bytes: deprecate Title

gopherbot commented 3 years ago

Change https://golang.org/cl/361899 mentions this issue: doc/go1.18: strings,bytes: deprecate Title

frederikhors commented 2 years ago

Although I fully understand the reason for this deprecation, I am not convinced that the message is clear and that the alternative is enough and worth the many extra bytes.

I think we should specify in Go 1.18.1 that strings.Title() is incorrect when using Unicode characters, but that it remains great for ASCII.

And I would like it to be deprecated removed because tools like Jetbrains' Goland keep reporting problems where the problems aren't actually there (using ASCII characters).

Plus, cases.Title() does not work exactly like strings.Title() because all subsequent letters after the first one are replaced with the lower ones.

Example: myBottle becomes Mybottle with cases.Title() and MyBottle with strings.Title().

ianlancetaylor commented 2 years ago

It's true that strings.Title can be used with pure ASCII strings. But I don't think it's worth the documentation complexity to explain that. In retrospect, we would not have added strings.Title in the first place if it were only going to be used with ASCII strings. So I don't think we should let the fact that it exists lead us to try to document it for that case.

smasher164 commented 2 years ago

I think some of the users of strings.Title used to "export" ASCII identifiers, like I mention here: https://github.com/golang/go/issues/48367#issuecomment-948115920. While it's true that cases is an extra dependency, the alternatives of defining your own exported function or wrapping strings.Title in a function with //lint:ignore aren't too costly to bear.

Plus, cases.Title() does not work exactly like strings.Title() because all subsequent letters after the first one are replaced with the lower ones.

As mentioned in https://github.com/golang/go/issues/51956#issuecomment-1079549573, pass the cases.NoLower option to the Title function to achieve the behavior you want.

Edit: I can see the argument that we clarify in the deprecated message how specifically people can use the cases package as a replacement for Title.

frederikhors commented 2 years ago

Thanks for the answers. But I am still of the opinion (like @robpike after all, if I'm not wrong) that the deprecation must be removed and it should be specified that it is not good for Unicode.

ianlancetaylor commented 2 years ago

@frederikhors Let me put it this way. We made a decision here: add a deprecation notice to strings.Title. In general we should only revisit a decision if there is new information. Otherwise we just keep going back and forth. What is the new information?

I do think that we should add an example for Title showing how to call the cases code, if we can. (Unfortunately I"m not sure whether the example code will let us import a package outside of the standard library.)