golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.17k stars 17.56k forks source link

spec: define that structs are compared field-by-field as listed in source code #8606

Closed mdempsky closed 1 year ago

mdempsky commented 10 years ago
gc handles struct/array comparisons by short-circuiting if it finds any unequal
fields/elements, and this behavior is noticeable because the Go spec requires
comparisons to panic in some cases; e.g., see http://play.golang.org/p/5jqSUAT1xC

However, unlike short-circuiting for evaluating "a && b", it doesn't
seem that short-circuiting of field/element comparisons is specified by the spec. 
Arguably, the spec currently requires that instead both comparisons in the above program
should panic.

Not a major issue, but thought I'd file an issue to note it.  A couple possible ways to
address it:

1. Ignore it since it probably doesn't matter in practice.
2. Specify gc's behavior since it's intuitive and easy to explain.
3. Specify a set of allowable behaviors (e.g., allow short-circuiting or not; and/or
allow any particular ordering for element/field comparisons).
4. Change gc to not (visibly) short-circuit comparisons that involve comparing interface
types; e.g., comparing two [512]int arrays can still short-circuit, but comparing two
struct{a int; b, c interface{}; d int} structs would need to always compare the b and c
fields, and a and d could be compared conditionally.
ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-none.

griesemer commented 10 years ago

Comment 2:

Labels changed: added documentation.

Owner changed to @griesemer.

Status changed to Accepted.

bradfitz commented 4 years ago

@griesemer, @ianlancetaylor, I closed #38676 as a duplicate of this one but it'd be nice to get a decision here for https://go-review.googlesource.com/c/go/+/230207 ("cmd/compile: improve generated eq algs for structs containing interfaces").

/cc @josharian @mdempsky @randall77

mdempsky commented 4 years ago

I'm inclined to say the spec should explicitly say the order of field/element comparison is unspecified, and a struct/array with both unequal and incomparable fields/elements may either compare false or panic.

I'd say go ahead with CL 230207.

ianlancetaylor commented 4 years ago

I agree with @mdempsky.

If we ever decide to implement more struct order of evaluation, this would be another place where we ought to specify the order.

josharian commented 4 years ago

If we go that route (which I support), and we wanted to prevent people from accidentally relying on short-circuiting behavior, we could generate always-panic implementations in -race mode (or under a new flag).

griesemer commented 4 years ago

I'd also agree with @mdempsky.

More generally, a principle I like is to assume that anything that is not explicitly defined by the language spec is in is in fact unspecified even if we don't say so explicitly, and thus should no be relied upon.

zigo101 commented 4 years ago

Isn't one of the goals of Go is to reduce unspecified cases as few as possible?

griesemer commented 4 years ago

We want to reduce dependence on unspecified behavior as much as possible. That is not quite the same as saying something is unspecified in the spec.

(For instance, map iteration order is not specified, and by making it random, people cannot rely upon some implementation-dependent iteration order.)

ianlancetaylor commented 4 years ago

I would not say that it is a goal of Go to reduce unspecified cases as much as possible. I don't think we're opposed to reducing unspecified cases, but it's not a goal.

If reducing unspecified cases were a goal we would have exact rules of order of evaluation, exact rules for map iteration, etc.

josharian commented 4 years ago

If we go that route (which I support), and we wanted to prevent people from accidentally relying on short-circuiting behavior, we could generate always-panic implementations in -race mode (or under a new flag).

I started implementing this and it got complicated and ugly, so I stopped. If folks actively want it, I can try again. Otherwise, I'm going to throw in the hat.

zigo101 commented 4 years ago

@griesemer @ianlancetaylor I understand there are some good reasons of why map iteration orders and expression evaluation orders in some statements are unspecified. But are there any obstacles and disadvantages to keep the current implementation (by field and element orders)?

mdempsky commented 4 years ago

But are there any obstacles and disadvantages to keep the current implementation (by field and element orders)?

The immediate one is it would prevent optimization CLs like 230207, which users seem more likely to benefit from than to be harmed by it.

More generally, we reserve the right to make changes in behavior as long as they still conform to the language spec.

zigo101 commented 4 years ago

OK, get it. But I think, from general user point view, predicable behaviors would be better.

ianlancetaylor commented 4 years ago

Some users want predictable behavior, some users want faster code. We have to make our best guess as to which is most important for each specific case.

rsc commented 4 years ago

The spec says that structs are equal if all the fields are equal. It also says "A comparison of two interface values with identical dynamic types causes a run-time panic if values of that type are not comparable." This doesn't sound ambiguous to me: it sounds like the panic is required.

We could relax the rules and say the panic may or may not happen depending on other struct field values, but if we did that, different implementations would behave differently. Moving code from one system to another might mean a panic appears where it never did before, making people think the new system has a bug.

The spec doesn't say what order fields are compared, but it implies that they are all compared. If there are two different interface fields that cause panics in a particular comparison, it doesn't matter much which one happens, but one has to happen (not zero).

A compiler that wants to short-circuit field comparisons just has to do all the interfaces before it starts being clever.

Moving into proposal process to make sure discussion terminates.

josharian commented 4 years ago

Changing implementations to match the spec could cause code that has been working correctly for years to start panicking. Though there indeed downsides to laxness in the spec, I think in this case changing the spec to be more flexible—to allow the existing implementations and further optimizations—is preferable.

mdempsky commented 4 years ago

Just as another data point, it looks like gccgo also implements short-circuiting behavior like cmd/compile.

I'm open to making the user-visible behavior be "all struct/array elements are compared", but I think we should first demonstrate we can actually implement that without negative impact on user programs (both performance and breaking existing programs) before committing to that.

I'm reasonably confident it's doable too, but I think implementing invisible short-circuiting might actually be somewhat subtle.

For example, here:

var x1, x2 [10]struct{ i interface{}; x [100]int }
x1[0].i = true
println(x1 == x2)

we should only need to compare the 10 interface values, but naively implemented I think we'd end up comparing 900 ints as well.

josharian commented 4 years ago

Having just spent a week mucking around in the alg generation code, I can confirm @mdempsky’s suspicion. I would like to make it less difficult—it’d allow us to generate better code in many circumstances—but it’ll involve a large, intricate refactoring.

mdempsky commented 4 years ago

I think this could work:

  1. Change the signature of the eq alg functions from func(*T, *T) bool to func(*T, *T, bool) bool.

  2. Change all callers from eq(p, q) to eq(p, q, true).

  3. Within cmd/compile, we split ASPECIAL into two cases: ASPECIAL_GENERAL (for comparisons containing interface-typed sub-elements, which cannot be short-circuited) and ASPECIAL_COMMON (without interface-typed sub-elements, which can be safely short-circuited).

The eq algorithm pattern would be:

func T_eq(p, q *T, x bool) bool {
    // Recursively and unconditionally call any `ASPECIAL_GENERAL` functions.
    x = T1_eq(&p.general1, &q.general1, x)
    x = T2_eq(&p.general2, &q.general2, x)

    // Afterwards, fallback to today's comparison logic,
    // except that we first short-circuit on "x", and we explicitly pass
    // "true" to any `ASPECIAL_COMMON` functions.
    return x && p.int1 == q.int1 && T3_eq(&p.common1, &q.common1, true)
}
josharian commented 4 years ago

@mdempsky you also need to handle the interaction with inlined comparisons—and currently that interaction is distant. In alg generation we just insert and OEQ and much later during walk we decide whether or not to turn that into a function call.

randall77 commented 4 years ago

Maps, of course, use equality under the hood:

The comparison operators == and != must be fully defined for operands of the key type; thus the key type must not be a function, map, or slice. If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic.

This statement is very vague about when it will cause a runtime panic. Currently, we will panic on the first hash of such a key. Currently also, that will be on the first insert, lookup, or delete of that key. But I could imagine other implementations that do other things (e.g. going from 0 to 1 element, we really don't need to hash).

Fortunately, we always panic on the hash of a key that would panic on a comparison. So everything that makes it into the map is good, so any subsequent == calls will never panic.

rsc commented 4 years ago

To confirm, it sounds like both cmd/compile and gccgo both currently walk the fields in source code order doing comparisons and stop early. I don't think I realized that when I commented two weeks ago. I am still concerned about making sure that all implementations agree and not giving any latitude in the spec that could produce portability problems between Go toolchains. It sounds like the conservative thing to do - and possibly the right thing to do - is to say that what cmd/compile and gccgo do today is the standard, and write that into the spec.

Compilers could still optimize as long as you cannot observe the change, meaning a non-interface compare can't be reordered across an interface compare.

What do people think of that resolution?

randall77 commented 4 years ago

That sounds ok to me.

I'm not sure we actually need to add it to the spec, though. It's really an esoteric corner case. Is there any place we record decisions about making consistent implementations, that isn't in the spec? Things like

josharian commented 4 years ago

@rsc to be clear, does that resolution entail rolling back CLs 230207 and 230208?

That resolution seems better than changing the implementations to match the spec as written, but I think I am still on the side of more implementation flexibility. I believe bugs in this arena will be rare, the benefits tangible, and the spec simpler.

ianlancetaylor commented 4 years ago

I think we would have to make 230207 and 230208 a bit more complex. Instead of p.type == q.type we would have to generate something like p.type == q.type && (isComparable(p.type) ? true : panic("oh noes")).

I think the question is whether we want to grant more implementation flexibility at the cost of having a program that works under one implementation panic under a different one. Is this a simple difference in order of evaluation, which we normally ignore, or is the difference more significant due to the panic?

rsc commented 4 years ago

I believe bugs in this arena will be rare, the benefits tangible, and the spec simpler.

You are taking the compiler-writer point of view. I am taking the user point of view. I realize those are in tension, but we've been burned almost every time we allow wiggle room in implementations as soon as a new implementation comes along.

I see that @josharian thumbs-up-ed @ianlancetaylor's previous comment.

My comment a week ago was proposing to define that fields are compared in order and that any unequal comparison stops the overall comparison. Does anyone object to this, or do we have consensus?

josharian commented 4 years ago

You are taking the compiler-writer point of view. I am taking the user point of view.

Sorry, Russ, but I find this unfair and a bit patronizing.

I see that @josharian thumbs-up-ed @ianlancetaylor's previous comment.

I thumbed-up Ian's comment because I thought the technical suggestion in it was sound, and the reformulation of the question sensible. Ian's comment did not suggest a particular resolution to the question.

Does anyone object to this, or do we have consensus?

We do not have consensus, but I will obviously accept the proposed outcome.

mdempsky commented 4 years ago

My comment a week ago was proposing to define that fields are compared in order and that any unequal comparison stops the overall comparison. Does anyone object to this, or do we have consensus?

My first preference is still to leave it explicitly unspecified, and allow comparisons of structs/arrays with both non-equal and non-comparable elements to either evaluate false or panic. But my second preference would be to codify the existing (at least up to Go 1.14) practice of implementing in-order comparison with short-circuiting, as you suggest.

I think if we want to start constraining order of evaluation within the spec, there are more fruitful areas for us to start than here. It's been 5+ years since my initial report on the subject, and I'm not aware of any user reports touching on the same. (Granted it probably helps that cmd/compile and gccgo both happen to implement the same non-specified behavior here.)

josharian commented 4 years ago

We should probably revert CLs 230207 and 230208, and re-attempt them (or a modified version of them) if appropriate for 1.16. I am AFK and will be for a few days, and the beta is soon (I hope). Any chance someone would do the reverts for me? Thanks.

randall77 commented 4 years ago

On it.

gopherbot commented 4 years ago

Change https://golang.org/cl/236146 mentions this issue: Revert "cmd/compile: improve generated eq algs for structs containing interfaces"

gopherbot commented 4 years ago

Change https://golang.org/cl/236277 mentions this issue: Revert "cmd/compile: improve equality algs for arrays of interfaces"

gopherbot commented 4 years ago

Change https://golang.org/cl/236278 mentions this issue: Revert "cmd/compile: improve equality algs for arrays of interfaces"

rsc commented 4 years ago

@josharian

You are taking the compiler-writer point of view. I am taking the user point of view.

Sorry, Russ, but I find this unfair and a bit patronizing.

Sorry it came across that way. I was only trying to point out that there are two different sides to this that are always in tension and we were prioritizing the two differently. Apologies.

rsc commented 4 years ago

No change in consensus here, so accepting. Edit: Got confused again, sorry.

rsc commented 4 years ago

Sorry, got confused about where we were.

Based on the discussion above, this sounds like a likely accept.

gopherbot commented 4 years ago

Change https://golang.org/cl/236147 mentions this issue: cmd/compile: test that equality is evaluated in order

randall77 commented 4 years ago

Acutally, Josh, I don't think CL 230207 needed to be reverted. Only 230208 introduces observable changes in interface comparison behavior.

gopherbot commented 4 years ago

Change https://golang.org/cl/236418 mentions this issue: cmd/compile: add interface equality tests

josharian commented 4 years ago

Thanks, Keith.

Acutally, Josh, I don't think CL 230207 needed to be reverted.

I’m still AFK, but I think they make equivalent changes to algs for

type T struct { a, b, c, d, e interface{} }

as to

type T [5]interface{}

randall77 commented 4 years ago

I can't get any ordering violations with CL 230207 reapplied. No hurry, we can reapply if it is safe for 1.16.

josharian commented 4 years ago

OK. Are you using structs with > 4 fields? That might be necessary, since we inline “small” comparisons, and that goes through a different code generation path. (I recently filed an issue to unify them, but can’t find it on my phone.)

martisch commented 4 years ago

(I recently filed an issue to unify them, but can’t find it on my phone.)

cmd/compile: unify and improve struct/array equality comparisons #38674

rsc commented 4 years ago

Retitled. No change in consensus, so accepted.

mdempsky commented 4 years ago

@rsc The previous title mentioned arrays. Did you intentionally omit that from retitling? I think if we're specifying that structs are compared field-by-field in source order, then arrays should also be compared element-by-element in index order.

gopherbot commented 4 years ago

Change https://golang.org/cl/237919 mentions this issue: cmd/compile: fix ordering problems in struct equality

gopherbot commented 4 years ago

Change https://golang.org/cl/237921 mentions this issue: cmd/compile: clean up equality generation

ianlancetaylor commented 1 year ago

Just a note that I think that we made a decision here: we compare structs for equality by comparing field by field, and exiting early when two fields are unequal. But I don't think that decision was ever added to the spec. (@griesemer)

mdempsky commented 1 year ago

To nag again, this issue affects array comparison order too, and it remains unconfirmed whether arrays were intentionally dropped from the accepted retitling on June 10, 2020.