golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmp: add Or #60204

Closed earthboundkid closed 1 year ago

earthboundkid commented 1 year ago

An extremely common string operation is testing if a string is blank and if so replacing it with a default value. I propose adding First(...string) string to package strings (and probably an equivalent to bytes for parity, although it is less useful).

// First returns the first non-blank string from its arguments.
func First(ss ...string) string {
    for _, s := range ss {
        if s != "" {
            return s
        }
    }
    return ""
}

Here are three example simplifications from just archive/tar because it shows up first alphabetically when I searched the standard library:

archive/tar diff

```diff
diff --git a/src/archive/tar/reader.go b/src/archive/tar/reader.go
index cfa50446ed..bc3489227f 100644
--- a/src/archive/tar/reader.go
+++ b/src/archive/tar/reader.go
@@ -136,12 +136,8 @@ func (tr *Reader) next() (*Header, error) {
 			if err := mergePAX(hdr, paxHdrs); err != nil {
 				return nil, err
 			}
-			if gnuLongName != "" {
-				hdr.Name = gnuLongName
-			}
-			if gnuLongLink != "" {
-				hdr.Linkname = gnuLongLink
-			}
+			hdr.Name = strings.First(gnuLongName, hdr.Name)
+			hdr.Linkname = strings.First(gnuLongLink, hdr.Linkname)
 			if hdr.Typeflag == TypeRegA {
 				if strings.HasSuffix(hdr.Name, "/") {
 					hdr.Typeflag = TypeDir // Legacy archives use trailing slash for directories
@@ -235,13 +231,8 @@ func (tr *Reader) readGNUSparsePAXHeaders(hdr *Header) (sparseDatas, error) {
 	hdr.Format.mayOnlyBe(FormatPAX)
 
 	// Update hdr from GNU sparse PAX headers.
-	if name := hdr.PAXRecords[paxGNUSparseName]; name != "" {
-		hdr.Name = name
-	}
-	size := hdr.PAXRecords[paxGNUSparseSize]
-	if size == "" {
-		size = hdr.PAXRecords[paxGNUSparseRealSize]
-	}
+	hdr.Name = strings.First(hdr.PAXRecords[paxGNUSparseName], hdr.Name)
+	size := strings.First(hdr.PAXRecords[paxGNUSparseSize], hdr.PAXRecords[paxGNUSparseRealSize])
 	if size != "" {
 		n, err := strconv.ParseInt(size, 10, 64)
 		if err != nil {
diff --git a/src/archive/tar/writer.go b/src/archive/tar/writer.go
index 1c95f0738a..e9c635a02e 100644
--- a/src/archive/tar/writer.go
+++ b/src/archive/tar/writer.go
@@ -188,10 +188,7 @@ func (tw *Writer) writePAXHeader(hdr *Header, paxHdrs map[string]string) error {
 	var name string
 	var flag byte
 	if isGlobal {
-		name = realName
-		if name == "" {
-			name = "GlobalHead.0.0"
-		}
+		name = strings.First(realName, "GlobalHead.0.0")
 		flag = TypeXGlobalHeader
 	} else {
 		dir, file := path.Split(realName)
```

seankhliao commented 1 year ago

Duplicate of #14423

earthboundkid commented 1 year ago

@seankhliao, good find, but that issue was frozen before the modern proposal process existed. Either that issue should be reopened and put through the proposal process or this one should be unclosed, but I don't think it's fair to call it a duplicate when the old one was never actually evaluated.

seankhliao commented 1 year ago

The idea was clearly evaluated in the previous issue and declined. The decisions we made before the proposal process are just as valid.

earthboundkid commented 1 year ago

I don't think it's fair to call 5 comments "clearly evaluated." The reception was mixed. Abbgrade was for it. Minux was against it. Bradfitz was neutral to positive on the idea if there was more data.

It ends with @griesemer saying,

This is not an issue, this is a feature request. Please discuss this first on one of the popular Go forums (mailing list, etc.).

I don't think he would have said "go discuss it somewhere else" if that discussion was precluded from having an effect because the issue was permanently closed once and for all. I think the idea was "go discuss it more and if it comes up again we can take another look." Now we have a formal process, so it's time to take a look. :-)

seankhliao commented 1 year ago

It was quite clear it doesn't belong in strings, and the natural place for it now, slices, has also had a similar idea declined in #52006.

earthboundkid commented 1 year ago

It doesn't work in slices because it would need a T comparable, which is confusing, or to be a find func, which as you note was already declined. Just because it could be a generic doesn't mean it should be. :-) I've been using my personal stringutils.First for years and for me it's above the bar to get it into the standard library. Maybe I'm wrong, but I think it's worth having a discussion.

ianlancetaylor commented 1 year ago

I agree that the earlier issue didn't get a full proposal review. We can do it again.

That said, finding some more examples would help justify adding this.

And, in general the strings and bytes packages are parallel. What would this look like in bytes, and would anybody use that variant?

earthboundkid commented 1 year ago

That said, finding some more examples would help justify adding this.

I've been using a version of First for at least three years that I can recall, and I'm up to 19 uses in a 13,000 line project. It's pretty routinely useful for me. (It might go back further, and I've just forgotten the history of it.)

Going back to the examples from archive/tar above, I think there's a readability gain in hdr.Name = strings.First(gnuLongName, hdr.Name) and name = strings.First(realName, "GlobalHead.0.0"), because you can quickly tell what the preferred value is and what the fallback default is, whereas in the old code the first example was set with if gnuLongName != "" and the second was set with if realName == "".

What would this look like in bytes, and would anybody use that variant?

I suppose it should be First(...[]byte) []byte, but I agree that it is unlikely to be used much, since the main use is to set a default for a string value.
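
For illustration, a bytes variant along these lines might look like the following. This is only a sketch: the name firstBytes is hypothetical, and treating an empty non-nil slice the same as nil is an assumption chosen to mirror the s != "" test in the string version.

```go
package main

import "fmt"

// firstBytes is a hypothetical bytes counterpart of the proposed strings.First.
// It returns the first argument with non-zero length, or nil if there is none.
// Assumption: an empty but non-nil slice counts as "blank", like "" does for strings.
func firstBytes(bs ...[]byte) []byte {
	for _, b := range bs {
		if len(b) > 0 {
			return b
		}
	}
	return nil
}

func main() {
	var unset []byte
	// Both unset (nil) and []byte{} are skipped, so the fallback is chosen.
	fmt.Printf("%s\n", firstBytes(unset, []byte{}, []byte("fallback")))
}
```

Because []byte is not comparable with !=, the check has to be on length, which is one way the two variants would not be exact mirrors of each other.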

earthboundkid commented 1 year ago

I’m doing some very basic searching on SourceGraph to find versions of this in the wild.

Okay, that’s as much looking at search results as I feel like doing now. If anyone can do a more semantic search over a larger corpus, I would be interested to see the results. One thing that surprised me was how often a repo would have multiple versions of it. Also the env var default thing comes up a lot.

Edit: Couldn't help myself, and I found another one in Istio 😆 Gotta force myself to close the tab before I go crazy.

ianlancetaylor commented 1 year ago

That's great data, thanks.

jimmyfrasche commented 1 year ago

This can be written for any comparable type using generics today

func First[T comparable](vs ...T) T {
  var zero T
  for _, v := range vs {
    if v != zero {
      return v
    }
  }
  return zero
}

The type constraint could be loosened to any if #26842 gets accepted.

While often for strings, I've written similar for all kinds of types, though I don't think I've ever needed anything other than 2 values at a time.

It's quite common in dealing with configuration where the zero value models an absence to be replaced by a default.
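
As a rough sketch of that configuration pattern, the generic function above can fill in defaults for several field types at once. The config type, its fields, and the default values here are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// first is the generic helper from the comment above, renamed to lower case.
func first[T comparable](vs ...T) T {
	var zero T
	for _, v := range vs {
		if v != zero {
			return v
		}
	}
	return zero
}

// config is a hypothetical settings struct where a zero field means "not set".
type config struct {
	Host    string
	Port    int
	Timeout time.Duration
}

// withDefaults replaces each unset (zero) field with an assumed default.
func withDefaults(c config) config {
	return config{
		Host:    first(c.Host, "localhost"),
		Port:    first(c.Port, 8080),
		Timeout: first(c.Timeout, 30*time.Second),
	}
}

func main() {
	// Only Port is set by the caller; Host and Timeout fall back to defaults.
	fmt.Printf("%+v\n", withDefaults(config{Port: 9000}))
}
```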

earthboundkid commented 1 year ago

I’ve had a toy repo with generic First for several years, but I’ve found that in practice I only ever use strings.

As for variadic vs. a pair, most instances are just pairs, but I think the Go optimizer now optimizes the slice away, so you may as well have a variadic version for the occasional times when you need more than two.

AndrewHarrisSPU commented 1 year ago

It's quite common in dealing with configuration where the zero value models an absence to be replaced by a default.

Neither comparable nor any tightly constrain to types where var zero T is a robust sentinel value for inferring absence. I think this is a problem for a generic First, it's not foolproof enough.
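
A small sketch of the concern, under the assumption that 0 is a meaningful setting (here a hypothetical retry count): with a plain int the explicit zero is indistinguishable from "unset", whereas a pointer keeps the two apart.

```go
package main

import "fmt"

// first returns the first non-zero argument (same shape as the generic First above).
func first[T comparable](vs ...T) T {
	var zero T
	for _, v := range vs {
		if v != zero {
			return v
		}
	}
	return zero
}

// retries treats 0 as "unset", so a caller cannot ask for zero retries.
func retries(configured int) int {
	return first(configured, 3)
}

// retriesPtr uses nil as the "unset" sentinel, so an explicit 0 survives.
func retriesPtr(configured *int) int {
	def := 3
	return *first(configured, &def)
}

func main() {
	zero := 0
	fmt.Println(retries(zero))     // 3: the explicit 0 was silently replaced
	fmt.Println(retriesPtr(&zero)) // 0: the explicit 0 is kept
	fmt.Println(retriesPtr(nil))   // 3: genuinely unset, default applies
}
```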

rsc commented 1 year ago

This is clearly a useful operation, perhaps even useful enough to have in the standard library.

But is First the right name? Is it the name used anywhere else with this meaning?

#53510 proposed slices.First(x) that returns x[0].

If I saw strings.First(x, y, z) I'd probably expect that it returned x (and wonder what the point was).

In text/template (and also in Lisp and Scheme, where I took it from), the name for this operation is or.

jimmyfrasche commented 1 year ago

It's also similar to the min/max builtins proposed in #59488 except that the item is selected in a less mathematical and more Go-specific way

cespare commented 1 year ago

Even if it's mostly used for strings, it really feels not string-related to me and not a good fit for the strings package.

However, I think I would use it quite a bit for not-strings if it existed. There's a certain kind of operation I write regularly in Go which would be written using a ternary in another language. Something like (this is grabbed from some real code):

port := h.GRPCPort
if port == 0 {
    port = 8500
}

With a ternary expression, you might write something like

port := h.GRPCPort == 0 ? 8500 : h.GRPCPort

With a slices-based function, you could do

port := slices.Or(h.GRPCPort, 8500)

I think or works well in Lisp and text/template but slices.Or seems a bit mysterious. But maybe it could work.

A longer, but more self-evident name is slices.FirstNonZero.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

earthboundkid commented 1 year ago

However, I think I would use it quite a bit for not-strings if it existed. There's a certain kind of operation I write regularly in Go which would be written using a ternary in another language. Something like (this is grabbed from some real code):

port := h.GRPCPort
if port == 0 {
  port = 8500
}

That's #37165, which also uses a default port as an example. :-)

earthboundkid commented 1 year ago

slices.FirstNonZero would presumably take an actual slice instead of a variadic argument, which is less ergonomic.

I'm fine with the name strings.FirstNonZero though.

earthboundkid commented 1 year ago

I don't think this is necessarily a great idea, but just to consider it, you could have package bools with Or[T comparable](...T) T and Cond[T any](cond bool, ifVal, elseVal T) T. I think that having Cond probably changes the feel of the language too much though because you'd be using a function call in a lot of places that use x = a; if cond { x = b} now.
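
To make the trade-off concrete, here is a sketch of what such a Cond might look like. It is hypothetical and not an existing API. One difference from the if statement it would replace is that both value arguments are always evaluated before the call.

```go
package main

import "fmt"

// cond is a sketch of the Cond helper floated above (hypothetical name and package).
func cond[T any](c bool, ifVal, elseVal T) T {
	if c {
		return ifVal
	}
	return elseVal
}

// calls counts how often expensive runs, to make eager evaluation visible.
var calls int

// expensive stands in for any argument expression with a cost or side effect.
func expensive(v int) int {
	calls++
	return v
}

func main() {
	// Only the first value is selected, but both arguments were evaluated.
	x := cond(true, expensive(1), expensive(2))
	fmt.Println(x, calls) // 1 2
}
```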

AndrewHarrisSPU commented 1 year ago

Even if it's mostly used for strings, it really feels not string-related to me and not a good fit for the strings package.

I guess the follow-up is, what package would it fit in? Looking through https://github.com/golang/go/issues/60204#issuecomment-1550320945 I think it is pretty surprising that this is so frequently about environment variables (or at least configuration variables with similar usage patterns).

I wonder if there might be a more Glasgow (not New Jersey)-style package for aggregating configuration from program constants, env variables, json/yaml/toml etc., and I think this functionality would probably be natural in that style.

OTOH, applying the New Jersey philosophy, maybe it's not objectionable that people are re-implementing this functionality ad-hoc - it doesn't seem error-prone, and people can be as narrow or as abstract as they want.

jimmyfrasche commented 1 year ago

One place where comparable is insufficient for this use case is callbacks.

I've written a lot of code like this:

func New(cfg *Cfg) *Thing {
  t := &Thing{
    foo: cfg.Foo,
  }
  if t.foo == nil {
    t.foo = fooDefault
  }
  return t
}

If there were an or that handled zero-comparable types (either a builtin or language change that allows it to be written with generics) that would just be:

func New(cfg *Cfg) *Thing {
  return &Thing{
    foo: or(cfg.Foo, fooDefault),
  }
}

leaxoy commented 1 year ago

A more general scenario: use an Nth-like function instead of First:

func NthOrZero[T any](elems []T, n int) T {
    if n < 0 || n >= len(elems) {
        var empty T
        return empty
    }
    return elems[n]
}

and for ok check:

func Nth[T any](elems []T, n int) (T, bool) {
    if n < 0 || n >= len(elems) {
        var empty T
        return empty, false
    }
    return elems[n], true
}

or, if we had a Maybe or Optional type:

func Nth[T any](elems []T, n int) Optional[T] {
    if n < 0 || n >= len(elems) {
        return None()
    }
    return Some(elems[n])
}

mibk commented 1 year ago

But is First the right name? Is it the name used anywhere else with this meaning? […]

In text/template (and also in Lisp and Scheme, where I took it from), the name for this operation is or.

I'm reminded of the SQL function COALESCE.

Returns the first non-NULL value in the list, or NULL if there are no non-NULL values. At least one parameter must be passed.

mpx commented 1 year ago

I've found First is quite a common operation beyond strings. In particular, selecting the first error between operation(s) and cleanup (if any). I have used n-ary versions for different types (errors, strings), but 2 is most common. Similar issue for default fallback for ints and other types.

Writing this now, I'd use a generic version similar to @jimmyfrasche's example above.

earthboundkid commented 1 year ago

I use errors.Join for that purpose. It’s a little different because it returns a multierror when necessary, but in the basic case you can treat it like “first error or nil”.

mpx commented 1 year ago

I use errors.Join for that purpose.

Typically the situations I encounter result in any errors beyond the First being irrelevant or a distraction - hence I don't use Join. Depending on the circumstance, either could be appropriate.
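
The difference between the two approaches can be sketched as follows. The error values are made up for the example, and first is the generic helper discussed earlier in the thread. Using it with error relies on interface types satisfying comparable, which is the case as of Go 1.20, the same release that added errors.Join.

```go
package main

import (
	"errors"
	"fmt"
)

// first returns the first non-zero argument; for error values that means the first non-nil one.
func first[T comparable](vs ...T) T {
	var zero T
	for _, v := range vs {
		if v != zero {
			return v
		}
	}
	return zero
}

// Hypothetical errors from an operation and its cleanup step.
var (
	errWrite = errors.New("write failed")
	errClose = errors.New("close failed")
)

func main() {
	// first keeps only the earliest error and drops the rest.
	fmt.Println(first(errWrite, errClose))

	// errors.Join keeps both; errors.Is still matches each one.
	joined := errors.Join(errWrite, errClose)
	fmt.Println(errors.Is(joined, errWrite), errors.Is(joined, errClose))

	// With no failures, both approaches yield nil.
	fmt.Println(first[error](nil, nil) == nil, errors.Join(nil, nil) == nil)
}
```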

golightlyb commented 1 year ago

Throwing a vote in for @ianlancetaylor's Default. Possibly a variadic version if you want but preferably not.

Especially given errors.Join, or the option of ignoring secondary errors as in the previous reply, I think Default with two arguments is a fine name and signature, thinking of 90% of use cases.

Occasionally I do want the first non-nil, or first non-zero, value of a variadic list of inputs, but far less often, and if I did I'd want it called FirstNonZero or FirstNonNil, even if it did the exact same thing as Default, as it better expresses intent, especially if the former were variadic and the latter was only Default(a, b).

Galaxy brain: if I really want a variadic version I'd call some generic Reduce function and pass Default as the reducer function.

cespare commented 1 year ago

I'm liking slices.Coalesce[T any](...T) T.

slices.FirstNonZero would presumably take an actual slice instead of a variadic argument, which is less ergonomic.

I think you really want this function to take a variadic argument, though I guess that would make it slightly unusual among the other members of slices.

jimmyfrasche commented 1 year ago

@cespare without #26842 it would have to be slices.Coalesce[T comparable](...T) T so you couldn't use it for functions

earthboundkid commented 1 year ago

On the strings vs. generics question, we have evidence from code search that people are writing and using the strings version of this. The generic version might also be worth having around, but we don't really have evidence for that yet. I also think people aren't necessarily going to think to look in slices for a "first non zero" function or whatever it's called. To me the preferable thing would be to just add strings.Coalesce for now and circle back to the broader question later, maybe after #26842.

jarrodhroberson commented 1 year ago

In Java this was called FirstNonNull() and that is a semantically useful name. First() is a terrible name because the semantics are incorrect. To be semantically informative and fulfill the principle of least surprise, it would be called FirstNonEmpty() if it was for string only, FirstNonZeroValue() for comparable (though its usefulness there is suspect), and FirstNonNil() for pointer types. Like others have said, a generic version is fraught with subtle issues as well.

This is trivial code that does not need to be cluttering up the standard library. There are lots of other more useful things to spend time on like proper Enums or even a proper set slice implementation that enforced uniqueness.

earthboundkid commented 1 year ago

This is trivial code that does not need to be cluttering up the standard library.

I think that people use the strings version of this often enough that it meets the bar for just having in the standard library instead of having three or however many copies in Istio.

The generic version is a harder sell because it doesn’t really fit into slices and it definitely doesn’t deserve its own package.

jimmyfrasche commented 1 year ago

It would certainly be nice to do something here.

strings.Coalesce would be handy in some situations but you'd still need it for other types (though probably not the implied bytes.Coalesce).

An operator, like ?? in the related #37165, would cover all the cases but adding an operator is a large change.

slices.Coalesce written today would be limited to comparable types and func is one of the times where this comes up. With #26842, it could be written in a fully general manner but it would indeed be an odd one out in slices.

A coalesce builtin, defined like the new min/max, wouldn't need to worry about fitting in any package, would have a lower bar to clear than ?? (though still a high bar), and could be fully generic without having to wait on other language changes.

ianlancetaylor commented 1 year ago

Just a note that if we add an operator, it seems to me that the operator should be ||. Which we already have. Not that I'm arguing in favor of an operator, I just think that if we go down the operator path there's no reason to introduce a new one. The meaning here is just a slight extension to what || already does.

rsc commented 1 year ago

Let's see how it feels to call it cmp.Or. Right now we have code like this in the go command:

GO386    = envOr("GO386", buildcfg.GO386)
GOAMD64  = envOr("GOAMD64", fmt.Sprintf("%s%d", "v", buildcfg.GOAMD64))
GOMIPS   = envOr("GOMIPS", buildcfg.GOMIPS)

These would become:

GO386    = cmp.Or(Getenv("GO386"), buildcfg.GO386)
GOAMD64  = cmp.Or(Getenv("GOAMD64"), fmt.Sprintf("%s%d", "v", buildcfg.GOAMD64))
GOMIPS   = cmp.Or(Getenv("GOMIPS"), buildcfg.GOMIPS)

The signature would be

// Or returns the first non-zero element of list, or else returns the zero T.
func Or[T comparable](list ...T) T

It's worth noting that the operation is called "or" in Lisp, Python, and many other languages, and conceptually it is returning one or the other of these values. cmp.Or() won't type-check but cmp.Or[Foo]() returns a zero Foo.
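
A runnable sketch of this shape, for experimentation. The Or below is a local stand-in written from the signature in this comment, and envOr and the environment variable name are hypothetical illustrations rather than the go command's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// Or returns the first non-zero element of list, or else returns the zero T.
// Local stand-in for the proposed cmp.Or, written from the signature above.
func Or[T comparable](list ...T) T {
	var zero T
	for _, v := range list {
		if v != zero {
			return v
		}
	}
	return zero
}

// envOr is a hypothetical helper: the environment value if set and non-empty, else def.
func envOr(key, def string) string {
	return Or(os.Getenv(key), def)
}

func main() {
	// EXAMPLE_UNSET_VARIABLE is assumed not to be set in the environment.
	fmt.Println(envOr("EXAMPLE_UNSET_VARIABLE", "default"))

	// With no arguments T cannot be inferred, so it must be given explicitly.
	fmt.Printf("%q\n", Or[string]())
}
```

Note that an environment variable set to the empty string is treated the same as an unset one, which matches the behavior of comparing os.Getenv's result against "".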

Thoughts?

jimmyfrasche commented 1 year ago

The limitation to comparable is too unfortunate. I'd be fine with it in cmp if it worked over non-comparable types as well.

jimmyfrasche commented 1 year ago

The Getenv case seems like it's more common than others. Maybe there should be an os.GetenvOr even in the face of any other changes?

ianlancetaylor commented 1 year ago

@jimmyfrasche With what the language supports today I don't see a way to write a generic version that supports slices, maps, functions, or channels. Do you have any thoughts on how that could work? We could add maps.Or and slices.Or if that seems useful.

earthboundkid commented 1 year ago

cmp.Or works for me.

I have had a version of it that uses reflection in my toolbox for a while, so it can skip zero length slices and maps, but it's much slower than a normal comparison and I can never bring myself to use it.

earthboundkid commented 1 year ago

Proposal renamed to be about cmp.Or.

jimmyfrasche commented 1 year ago

The major use for non-comparable types is funcs which can't really be handled generically.

If there's a runtime "is this all bits zero" predicate you can hook into to work around the lack of a general way to test zero-ness that would be fine by me since this function is 90% of the reason I'd want such a thing. If that's the route, exposing it as a cmp.Zero[T any](v T) bool would get the other 10%.

zephyrtronium commented 1 year ago

Zero-coalescing for functions seems to be widely applicable in net/http* packages and crypto/tls, which have a variety of configurable functions with default behaviors. That said, my own primary use would be for maps, especially maps of maps, where it would simplify creating the map for the first insert. I can think of three times I've done this in the last month, though unfortunately not in public code.
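
For the maps-of-maps case, a map-specific helper can be written today, since a type parameter constrained to map types can be compared with nil. The sketch below uses hypothetical names (orMap, add). Unlike an operator, it cannot short-circuit, so the fallback map literal is allocated on every call even when it is not needed.

```go
package main

import "fmt"

// orMap returns the first non-nil map among its arguments, or nil if all are nil.
// Hypothetical helper, roughly what a maps.Or might look like.
func orMap[M ~map[K]V, K comparable, V any](ms ...M) M {
	for _, m := range ms {
		if m != nil {
			return m
		}
	}
	return nil
}

// add increments index[outer][inner], creating the inner map on first insert.
func add(index map[string]map[string]int, outer, inner string) {
	index[outer] = orMap(index[outer], map[string]int{})
	index[outer][inner]++
}

func main() {
	index := map[string]map[string]int{}
	add(index, "a", "x")
	add(index, "a", "x")
	add(index, "b", "y")
	fmt.Println(index["a"]["x"], index["b"]["y"]) // 2 1
}
```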

#37165/https://github.com/golang/go/issues/60204#issuecomment-1567578853 solves the same problem more compactly and without the comparable limitation, and short-circuiting is uncommonly but still occasionally useful. Given a zero-coalescing operator, I can't think of any reason to use cmp.Or. Is choosing the latter a reason to reject the former?

gazerro commented 1 year ago

Here's some actual code I came across today

if event.Location.City != "" {
        p["city"] = event.Location.City
} else {
        p["city"] = nil
}

where p has type map[string]any.

By using the || operator, it could have been written as

p["city"] = event.Location.City || nil

jimmyfrasche commented 1 year ago

@gazerro that would not work with any of the proposals as the types do not match.

gazerro commented 1 year ago

@jimmyfrasche the || operator has been proposed but its semantics have not been explicitly defined. It has certainly been implied that the operands should have the same type, and the expression has the type of the operands.

However, to allow for the case where an expression with the || operator is assigned to a variable of type any or passed as an argument to a parameter of type any, as in:

var x any = a || b

only in this case, it could be specified that the types of a and b may not be the same, but they must be assignable to the type of x.

jimmyfrasche commented 1 year ago

@gazerro that would be very different from how other binary operators—including today's ||—work. https://go.dev/play/p/zXDM4a0KYsl

gazerro commented 1 year ago

@jimmyfrasche absolutely, but there are many special cases in the spec. I think it's just a matter of considering whether it's worth it for this particular use case

seh commented 1 year ago

At that point, aren't you really asking for a ternary conditional operator? This is just a restricted form where you want to retain part of the predicate in the consequent case.

jimmyfrasche commented 1 year ago

@ianlancetaylor There are at least two ways to write a generic is-zero predicate in the language currently.

The simple way is reflect.ValueOf(v).IsZero()

The less simple way is

func Zero[T any](v T) bool {
    bp := (*byte)(unsafe.Pointer(&v))
    sz := unsafe.Sizeof(v)
    for _, v := range unsafe.Slice(bp, sz) {
        if v != 0 {
            return false
        }
    }
    return true
}

I did not benchmark but I'm sure that's faster than reflect and could be made faster still by special casing common sizes, preferring to check a word at a time, etc. And a magic runtime function with custom assembly per arch and treating it as a compiler intrinsic would go even further, surely.

Exporting Zero in cmp or elsewhere is perhaps a discussion for another thread but I think it's feasible to use it to implement the more general cmp.Or[T any].
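
As a sketch of the reflect route, a fully general Or could look like the following. The names isZero and orAny are hypothetical, and this says nothing about how cmp.Or would actually be implemented. Note that under reflect's definition a non-nil empty slice or map is not zero, so only nil ones are skipped.

```go
package main

import (
	"fmt"
	"reflect"
)

// isZero reports whether v is the zero value of T, using reflection.
// Going through &v keeps a nil interface-typed v from producing an invalid
// reflect.Value, on which IsZero would panic.
func isZero[T any](v T) bool {
	return reflect.ValueOf(&v).Elem().IsZero()
}

// orAny returns the first non-zero element of list, or the zero T.
// It accepts non-comparable types such as funcs, slices, and maps.
func orAny[T any](list ...T) T {
	for _, v := range list {
		if !isZero(v) {
			return v
		}
	}
	var zero T
	return zero
}

func main() {
	var hook func() string // unset callback
	def := func() string { return "default hook" }
	fmt.Println(orAny(hook, def)())

	var tags []string // nil slice is zero and is skipped
	fmt.Println(orAny(tags, []string{"fallback"}))
}
```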