golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.98k stars 17.67k forks source link

cmp: add Or #60204

Closed earthboundkid closed 1 year ago

earthboundkid commented 1 year ago

An extremely common string operation is testing if a string is blank and if so replacing it with a default value. I propose adding First(...strings) string to package strings (and probably an equivalent to bytes for parity, although it is less useful).

// First returns the first non-blank string from its arguments.
func First(ss ...string) string {
    for _, s := range ss {
        if s != "" {
            return s
        }
    }
    return ""
}

Here are three example simplifications from just archive/tar because it shows up first alphabetically when I searched the standard library:

archive/tar diff ```diff diff --git a/src/archive/tar/reader.go b/src/archive/tar/reader.go index cfa50446ed..bc3489227f 100644 --- a/src/archive/tar/reader.go +++ b/src/archive/tar/reader.go @@ -136,12 +136,8 @@ func (tr *Reader) next() (*Header, error) { if err := mergePAX(hdr, paxHdrs); err != nil { return nil, err } - if gnuLongName != "" { - hdr.Name = gnuLongName - } - if gnuLongLink != "" { - hdr.Linkname = gnuLongLink - } + hdr.Name = strings.First(gnuLongName, hdr.Name) + hdr.Linkname = strings.First(gnuLongLink, hdr.Linkname) if hdr.Typeflag == TypeRegA { if strings.HasSuffix(hdr.Name, "/") { hdr.Typeflag = TypeDir // Legacy archives use trailing slash for directories @@ -235,13 +231,8 @@ func (tr *Reader) readGNUSparsePAXHeaders(hdr *Header) (sparseDatas, error) { hdr.Format.mayOnlyBe(FormatPAX) // Update hdr from GNU sparse PAX headers. - if name := hdr.PAXRecords[paxGNUSparseName]; name != "" { - hdr.Name = name - } - size := hdr.PAXRecords[paxGNUSparseSize] - if size == "" { - size = hdr.PAXRecords[paxGNUSparseRealSize] - } + hdr.Name = strings.First(hdr.PAXRecords[paxGNUSparseName], hdr.Name) + size := strings.First(hdr.PAXRecords[paxGNUSparseSize], hdr.PAXRecords[paxGNUSparseRealSize]) if size != "" { n, err := strconv.ParseInt(size, 10, 64) if err != nil { diff --git a/src/archive/tar/writer.go b/src/archive/tar/writer.go index 1c95f0738a..e9c635a02e 100644 --- a/src/archive/tar/writer.go +++ b/src/archive/tar/writer.go @@ -188,10 +188,7 @@ func (tw *Writer) writePAXHeader(hdr *Header, paxHdrs map[string]string) error { var name string var flag byte if isGlobal { - name = realName - if name == "" { - name = "GlobalHead.0.0" - } + name = strings.First(realName, "GlobalHead.0.0") flag = TypeXGlobalHeader } else { dir, file := path.Split(realName) ```
ianlancetaylor commented 1 year ago

Just a note that an IsZero predicate is trivial for a comparable type. Of course that is not enough for all types that can be compared to nil.

func IsZero[T comparable](v T) bool {
    var zero T
    return v == zero
}
jimmyfrasche commented 1 year ago

With #35966 and #26842 the general version would just be

func Zero[T any](v T) bool {
  return v == zero
}

(or however zero is spelled in the end).

jimmyfrasche commented 1 year ago

Maybe later isZero can be written in a simpler form. It can be written now and could be kept unexported. It would allow writing the more general cmp.Or[T any].

Say instead we go with Or[T comparable]. If afterwards there's a language change that allows the Or[T any] version, it's stuck because of compat and everyone will have to write their own version if they want to use it with incomparable types.

Writing the Or[T any] version now at worst gets the more complicated bits replaced with a single == down the road.

The other option is to put cmp.Or on hold until #35966 and #26842 or something along those lines get worked out. That wouldn't be the end of the world but it would be shame, as this would make a nice quality of life improvement.

earthboundkid commented 1 year ago

Say instead we go with Or[T comparable]. If afterwards there's a language change that allows the Or[T any] version, it's stuck because of compat and everyone will have to write their own version if they want to use it with incomparable types.

Is that true? ISTM, that because you can't say var myOr = cmp.Or, there's no API compatibility reason that prevents you from loosening a generic type constraint. I think it's safe to do [T comparable] now and loosen to [T any] if #26842 ever gets approved.

earthboundkid commented 1 year ago

At that point, aren't you really asking for a ternary conditional operator? This is just a restricted form where you want to retain part of the predicate in the consequent case.

The ternary is famously confusing. I think having a restricted form makes it significantly less confusing without sacrificing too much of its convenience.

seh commented 1 year ago

The ternary is famously confusing.

Yes, I've followed the discussions here over the years, so I'm familiar with the detractors' arguments. I was reacting to @gazerro's example in https://github.com/golang/go/issues/60204#issuecomment-1582994526.

rsc commented 1 year ago

Re slices and funcs, if at some point in the future we find a way for generics to allow cmp.Or's constraint to expand to allow them, that would be a non-breaking change. So we don't have to hold up the current cmp.Or to figure that out. If at some point we figure out universal zero value and change the "any" constraint to allow comparison against zero, then cmp.Or could be:

func Or[T any](list ...T) T {
    for _, x := range list {
        if x != zero {
            return x
        }
    }
    return zero
}

But for today it can be

func Or[T comparable](list ...T) T {
    var zero T
    for _, x := range list {
        if x != zero {
            return x
        }
    }
    return zero
}

Again, if at some point in the future we can do the top version, that's a non-breaking change.

jimmyfrasche commented 1 year ago

If it's a nonbreaking change, then I'm fine with the current limitation.

earthboundkid commented 1 year ago

Going back to the initial example of archive/tar, it's not clear to me that cmp.Or is a readability improvement the ways that strings.First is, but that said, I'm happy either way:

diff --git a/src/archive/tar/reader.go b/src/archive/tar/reader.go
index cfa50446ed..bc3489227f 100644
--- a/src/archive/tar/reader.go
+++ b/src/archive/tar/reader.go
@@ -136,12 +136,8 @@ func (tr *Reader) next() (*Header, error) {
            if err := mergePAX(hdr, paxHdrs); err != nil {
                return nil, err
            }
-           if gnuLongName != "" {
-               hdr.Name = gnuLongName
-           }
-           if gnuLongLink != "" {
-               hdr.Linkname = gnuLongLink
-           }
+           hdr.Name = cmp.Or(gnuLongName, hdr.Name)
+           hdr.Linkname = cmp.Or(gnuLongLink, hdr.Linkname)
            if hdr.Typeflag == TypeRegA {
                if strings.HasSuffix(hdr.Name, "/") {
                    hdr.Typeflag = TypeDir // Legacy archives use trailing slash for directories
@@ -235,13 +231,8 @@ func (tr *Reader) readGNUSparsePAXHeaders(hdr *Header) (sparseDatas, error) {
    hdr.Format.mayOnlyBe(FormatPAX)

    // Update hdr from GNU sparse PAX headers.
-   if name := hdr.PAXRecords[paxGNUSparseName]; name != "" {
-       hdr.Name = name
-   }
-   size := hdr.PAXRecords[paxGNUSparseSize]
-   if size == "" {
-       size = hdr.PAXRecords[paxGNUSparseRealSize]
-   }
+   hdr.Name = cmp.Or(hdr.PAXRecords[paxGNUSparseName], hdr.Name)
+   size := cmp.Or(hdr.PAXRecords[paxGNUSparseSize], hdr.PAXRecords[paxGNUSparseRealSize])
    if size != "" {
        n, err := strconv.ParseInt(size, 10, 64)
        if err != nil {
diff --git a/src/archive/tar/writer.go b/src/archive/tar/writer.go
index 1c95f0738a..e9c635a02e 100644
--- a/src/archive/tar/writer.go
+++ b/src/archive/tar/writer.go
@@ -188,10 +188,7 @@ func (tw *Writer) writePAXHeader(hdr *Header, paxHdrs map[string]string) error {
        var name string
        var flag byte
        if isGlobal {
-           name = realName
-           if name == "" {
-               name = "GlobalHead.0.0"
-           }
+           name = cmp.Or(realName, "GlobalHead.0.0")
            flag = TypeXGlobalHeader
        } else {
            dir, file := path.Split(realName)
benhoyt commented 1 year ago

I kind of like the idea, but I find the name Or unclear. The second example Russ gave with cmp.Or seems less readable than the first:

// Original
GO386 = envOr("GO386", buildcfg.GO386)

// This seems less readable to me: "Or" doesn't convey the intent very well
GO386 = cmp.Or(os.Getenv("GO386"), buildcfg.GO386)

Maybe that would grow on me, and people would get familiar with it, over time. But to me it seems that cmp.Or(a, b) just doesn't convey the intent very well. Ian's suggestion of Default on issue 37165 seems better.

// Reads nicely, but requires a significant change to ||
GO386 = os.Getenv("GO386") || buildcfg.GO386

// Reads better than "Or"
GO386 = cmp.Default(os.Getenv("GO386"), buildcfg.GO386)

// Reads most clearly, but obviously isn't general
GO386 = EnvOrDefault("GO386", buildcfg.GO386)
rsc commented 1 year ago

Re "or" being unclear, a name is a name - we will get used to it. The same could have been said of strings.Cut. It is also worth noting that Python and Lisp and Scheme all use "or" and people understand it there.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

earthboundkid commented 1 year ago

It's a little bit extreme, but I think if there were strings.Or in addition to cmp.Or, it would solve the readability problem. Obviously, it's redundant though, so probably not worth it.

benhoyt commented 1 year ago

Re "or" being unclear, a name is a name - we will get used to it. The same could have been said of strings.Cut. It is also worth noting that Python and Lisp and Scheme all use "or" and people understand it there.

I guess my concern with the name is actually less about the word "or" and more that a prefix function call with package name like cmp.Or(Getenv("X"), "y") isn't nearly as readable as Python's Getenv("X") or "y" or say Getenv("X") || "y". The "Or" part of the name kind of gets lost because of the cmp package prefix and the fact that it's a "prefix operation" rather than infix like Python's or.

That said, you're probably right that we'll get used to it.

tdakkota commented 1 year ago

I am worried about potential bugs caused by using Or with new type inference.

The following snippet does not compile on Go 1.20, but does on Go tip.

    var outputFile *os.File
    {
                // Implictly instanitated with *os.File.
        output := Or(outputFile, os.Stdout)
        fmt.Println(output == outputFile, output == os.Stdout) // on tip: false true
    }
    {
                // Implictly instanitated with io.Writer.
        output := Or(outputFile, io.Discard)
        fmt.Println(output == outputFile, output == io.Discard) // on tip: true false
    }

https://go.dev/play/p/YWJoWZuS_iN?v=gotip

DeedleFake commented 1 year ago

Ooh, an even more subtle way to run into the nil interface problem. Great.

Maybe Or() should check for true nility? Is that even possible in a feasible way with how generics currently work? I can't really think of a way to do it off the top of my head that doesn't involve reflect.

gopherbot commented 1 year ago

Change https://go.dev/cl/504883 mentions this issue: cmp: add Or

AndrewHarrisSPU commented 1 year ago

@tdakkota, that's very interesting.

A constraint based on underlying types ~string, ~int, etc. for cmp.Or also seems like a tractable implementation.

ianlancetaylor commented 1 year ago

@tdakkota Thanks. I filed #60933.

hherman1 commented 1 year ago

Reading through the comments in #60933, I’m concerned that despite finding a way to prevent implicitly converting to a non zero interface we still won’t have a way to make something like cmp.Or(os.Stdout, io.Discard) work correctly. Even with explicit types, it seems busted.

It makes me wonder if we really want cmp.Or to be generic or if it might be safer to go with something concretely typed.

earthboundkid commented 1 year ago

os.Stdout is always going to be non-nil, so it’s hard to judge based on that as an example. For code like this, it should be okay:

type appEnv struct {
  out io.Writer
}

…

if verboseFlag {
  app.out = os.Stdout
}

…

_, err := io.Copy(cmp.Or(app.out, io.Discard), logSrc)
hherman1 commented 1 year ago

Ok, but I don’t think that addresses the issue. If you use a field with an actually nillable *os.File, cmp.Or fails again.

ianlancetaylor commented 1 year ago

@hherman1 There is a long-standing tripping point in Go regarding storing a typed nil in an interface value. There is a FAQ about it: https://go.dev/doc/faq#nil_error. That does mean that instantiating cmp.Or with an interface type can have subtle effects. However, I don't think we should let that fact about the language stand in the way of adopting cmp.Or.

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

mibk commented 1 year ago

Recently, I encountered a piece of code where I thought a logical XOR operation could be helpful.

if found && wanted || !found && !wanted {
    // Nothing to do.
    return
}

With the introduction of the package cmp, I imagined that there might be an Xor function:

if cmp.Xor(found, !wanted) {
    // Nothing to do.
    return
}

However, I recalled that cmp.Or will soon be added and realized that the purpose of package cmp doesn't align with this function:

Package cmp provides types and functions related to comparing ordered values.

Since cmp.Xor (which I'm not proposing) would seem to me like a perfect fit for a package named cmp, I began to question whether "cmp" is then a good name for a package "related to comparing ordered values", and/or whether Or is a good name for a function "returning the first non-zero value of its arguments".

I just wanted to bring it up for discussion.

earthboundkid commented 1 year ago

I actually had truthy.Xor in an old version of my truthiness testing package, but I removed it because I wasn’t actually using it.

jub0bs commented 1 year ago

I salute Carl's effort to collect usage data. However, I remain unconvinced that such a functionality warrants an addition to the standard library. Projects that need such an Or/First function can easily (and already do) re-implement it when the need arises.

The semantics are different, but I would keep this function out of the standard library for the same reason I wouldn't want to see the addition of a First (or Last) function to the slices package.

A little copying is better than a little dependency.

meyermarcel commented 1 year ago

os.Stdout is always going to be non-nil, so it’s hard to judge based on that as an example. For code like this, it should be okay:

type appEnv struct {
  out io.Writer
}

…

if verboseFlag { // here is logic (1)
  app.out = os.Stdout
}

…
//                here is also logic inlined (2)
_, err := io.Copy(cmp.Or(app.out, io.Discard), logSrc)

Sorry for commenting very late this issue. I did not react to this proposal because emoji-meter (👍👎) in the description suggested no acceptance.

I understand the use cases found in the collected data. Thanks for finding and summarize them 🙏🏼

My concern is readability and I do not know if cmp.Or changes how are if statements are written.

Can this example also be written like this?

type appEnv struct {
  out io.Writer
}

…

if verboseFlag {
  app.out = os.Stdout // logic is only here (1)
} else {
  app.out = os.Discard // and here (2) and written only with existing if and else
}
…

_, err := io.Copy(app.out, logSrc)

If the previous example can also be written this way, then there is a good example of why this example could fragment the writing of logic and ultimately make it difficult to read because of inconsistency. There are now multiple ways to write this code.

However, if this is very rarely the case, then please ignore my comment.

In the future, should you read cmp.Or in many places where you expect an if or else, that would be a backdoor ternary operator and lead to inconsistent code.

The implications for consistent writeability are unclear to me, and I hope this is considered in this proposal.

earthboundkid commented 1 year ago

The app.out example seems to me more like it should be written with hypothetical func Cond(bool, ifVal, elseVal T) T:

type appEnv struct {
  out io.Writer
}

…

app.out = Cond(verboseFlag, os.Stdout, io.Discard)

…

_, err := io.Copy(app.out, logSrc)

Cond would be basically the ?: ternary but as a function and without short circuiting. I agree that having Cond would lead to changes in the Go ecosystem that would affect the experience of reading it. I don't think it's a good fit for Go as it exists. The most clear straightforward to write the code ought to be something like:

type appEnv struct {
  out io.Writer
}

…

app.out = io.Discard
if verboseFlag {
    app.out = os.Stdout
}

…

_, err := io.Copy(app.out, logSrc)

Yes, you could use cmp.Or here, but that seems like more work and less clarity than just using an assignment, so I don't think it will come up too much in practice. We'll see!

meyermarcel commented 1 year ago

Yes, you could use cmp.Or here, but that seems like more work and less clarity than just using an assignment, so I don't think it will come up too much in practice. We'll see!

New code constructions will be introduced slowly and their use will be determined by the developer who does not have so much clarity. If cmp.Or is used incorrectly, we will notice it late and there will be no or only a rocky way back via vet tools. We might be left with worse readability and more inconsistent code. "We'll see!" is too risky for me in the context of long-term changed readability.

But on the other hand I have no concrete examples and if it remains only with the use as with the code examples found thanks to carlmjohnson then cmp.Or is an improvement.