Closed dsnet closed 1 year ago
I wonder if the package path testing/cmp
is appropriate. It's true that many uses will be in tests, but many others will not. And there's nothing about the API that ties it to testing
or test packages. Perhaps reflect/cmp
?
I generally agree with this proposal; the testing
package has been borrowing bits and bops from https://github.com/frankban/quicktest recently such as its Mkdir and Setenv, and I'd love for us to incorporate more of its good ideas. The elephant in the room is its powerful and easy to use comparisons, powered by go-cmp.
Also, have you thought about size? In terms of API and packages (there's also cmpopts), but also when weighing the size of Go in its source and binary forms. A more powerful reflect.DeepEqual
doesn't sound particularly heavy at first glance, but the repository contains 50 files containing ~400KiB in total.
Also cc @rogpeppe @frankban @myitcv
As a big fan of the cmp package, this seems like a good idea to me. The API is stable and well tested.
One thought: with generics only a year or so away, I wonder if we should hold on until we can make the signatures generic:
func Diff[T any](x, y T, opts ...Option) string
func Equal[T any](x, y T, opts ...Option) bool
Although the implementation would still be using reflection under the hood, using a type parameter would make it clear that we're expecting to compare values of the same type, and avoid common mistakes where values of incompatible types are compared, only for the comparison to fail at runtime.
The current behaviour would still be available by using Equal[interface{}]
or Diff[interface{}]
.
I'd also support folding the cmpopts package into the root cmp
package (I always have difficulty remembering which functionality is in which place).
I wonder if the package path
testing/cmp
is appropriate. It's true that many uses will be in tests, but many others will not. And there's nothing about the API that ties it to testing or test packages. Perhapsreflect/cmp
?
When cmp
was first created, making it a general purpose package was one of the sub-goals. However, since cmp
is predominantly used for testing purposes, it chose to use panics for situations where two types just could not be compared. The package's propensity to panicking (and it's relatively slow speed compared to reflect.DeepEqual
) makes it unsuitable for non-testing usages.
Also, have you thought about size? In terms of API and packages (there's also cmpopts), but also when weighing the size of Go in its source and binary forms. A more powerful reflect.DeepEqual doesn't sound particularly heavy at first glance, but the repository contains 50 files containing ~400KiB in total.
Nope. Thanks for raising up the thought. There's some files and functionality that can be removed I believe:
erorrs
package and google.golang.org/x/errors
depending on version.reflect.Value.IsZero
.internal/fmtsort
package.github.com/pkg/diff
, which is good idea, however, there may need to be some extensions to the API in myers
to support a "modify" operation. Currently, that package only reports insertions and removals, but cmp
relies on insertions, removals, and modifications. Furthermore, cmp
's diffing algorithm is guaranteed to run in O(n) and produce a possibly sub-optimal edit script, while myers
may run in O(n^2). Perhaps myers
could take an option to limit how much work it performs before returning a sub-optimal solution, but one with a bounded guarantee on run time? \cc @josharianOne thought: with generics only a year or so away, I wonder if we should hold on until we can make the signatures generic:
Interesting thought. I should also note that generics would help with:
Comparer
and FilterValues
, which must be given a func(T, T) bool
Transformer
, which must be given a func(T) R
I'd also support folding the cmpopts package into the root cmp package (I always have difficulty remembering which functionality is in which place).
Yea, that's been one of my longer term goals. The main reason I haven't done so yet is because adding them in causes the godoc page to be less readable, which is another motivating reason for #44447.
I'd be happy for github.com/pkg/diff to morph into a useful diff package for std. I'd rather it be in golang.org/x, so that it can be used by all and sundry. (I'm frustrated by various gems locked up in std internal.) I cannot dedicate much time to it in the near future, but would be happy to be involved in API design (and contribute the existing code as a starting point). For that, it's probably time for someone (not me) to file a proposal where we can start discussing what we want out of it.
@josharian I've been doing various diff investigations off and on for a few years (sic) and had some time last week to hook a bunch of stuff together. I have rather a lot of thoughts, and an implementation of the the O(N)-space Myers algorithm, among others. I completely agree that we should have a standard diff package separate from testing/cmp. I need to take care of a few higher priority things but I intend to write more next month. I've retitled #23113 to make that clear.
@dsnet My main concern with adding testing/cmp is that it is a very large amount of code and API, especially compared to package testing itself. Just the sheer number of internal packages you listed in https://github.com/golang/go/issues/45200#issuecomment-806082863 scares me.
Is it possible to take this opportunity to simplify at all?
@rsc works for me. I've been thinking about diffs on and off too. :) I look forward to chatting about it when you are ready.
Is it possible to take this opportunity to simplify at all?
@rsc, my main point from https://github.com/golang/go/issues/45200#issuecomment-806082863 was that most of those internal packages can be eliminated since the needed functionality has either been added to the standard library already (e.g., reflect.Value.IsZero
) or already exists as an internal package (e.g., internal/fmtsort
). The most significant internal package is internal/diff
and ideally cmp
does not need to implement its own but make use of another general purpose diffing package. I'll continue the discussion on #23113 on how to generalize the diffing algorithm there to be suitable for cmp
's needs.
My main concern with adding testing/cmp is that it is a very large amount of code
@rsc. If it's any consolation, the logic that's concerned with the semantics of whether two values are equal is around ~900 LOC. The remaining complexity comes from the logic to pretty print the difference, which accounts for ~1800 LOC. Most of that complexity is because the the reporter is a relatively large set of heuristics for how to present the difference in a way that is easiest for humans to interpret. The reporter logic was fine tuned over the past years based on user feedback and has been fairly stable for the past year or so.
I'd also like to mention another possibility: the cmp package has excellent functionality for deep-printing
values, but that's buried in the code and only accessible via cmp.Diff
. Printing a value recursively
is often of great value, but neither of the widely used implementations I'm aware of (spew and pretty)
are great, because it's common to need some customisation to make the output readable (for example
by omitting some fields or truncating certain byte slices).
I wonder if it might be a good idea to consider including some deep-printing functionality in the standard library. The style of the cmp API potentially sets a nice precedent for what a deep printing API might look like.
It would probably be best if it was testing-oriented: like cmp, such a package would probably need to use unsafe to allow it to call methods on values from unexported fields.
Aside: like @dsnet, I also think that cmp would most happily live at testing/cmp
, partly for the above reason.
but neither of the widely used implementations I'm aware of (spew and pretty) are great, because it's common to need some customisation to make the output readable (for example by omitting some fields or truncating certain byte slices).
@rogpeppe I believe sanity-io/litter can do that, if I'm not mistaken.
I completely agree that we should have a standard diff package separate from testing/cmp
@rsc There's also; https://github.com/golang/go/issues/41980 which I think would be related.
I wonder if it might be a good idea to consider including some deep-printing functionality in the standard library. The style of the cmp API potentially sets a nice precedent for what a deep printing API might look like.
I brought this up on the mailing list at one point: https://groups.google.com/g/golang-nuts/c/Tn0QeDv6fU8/m/ukcrSF6BCwAJ
Deep printing was seen as requiring possibly too much customization, leading to a complex API. And a general mechanism to traverse data structures could live outside the standard library.
Edit: related https://github.com/golang/go/issues/28141
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
I use cmp in many of my projects, and would love to see its capabilities become part of the stdlib. But the current package API is unnecessarily complex and esoteric; IMO it would need to be substantially simplified (read: reduced) before it could be realistically considered for inclusion.
But the current package API is unnecessarily complex and esoteric; IMO it would need to be substantially simplified (read: reduced) before it could be realistically considered for inclusion.
I'm interested in what aspects of the API you see as possibilities for simplification. Personally I've see it as an example of a nice composable API and can't see how it could be simplified much without removing a lot of its usefulness.
As I said above, generics could make the API more obvious. Another example, Transformer
could have a generic signature rather than using interface{}
:
func Transformer[T, R any](name string, f func(T) R) Option
I'm interested in what aspects of the API you see as possibilities for simplification.
With the huge caveat that this is all IMO —
I think what you're left with is cmp.Equal, cmp.Diff, some way to toggle cmp.AllowUnexported, and a hook to provide your own comparison function. That feels like about the right size and shape for a package like this, to me. I also feel like that's about 90% of the current package's usefulness, anyway.
edit: It might be worth stating explicitly the problem that a package testing/cmp would be solving. I don't think it's providing a powerful and expressive toolkit for arbitrary value comparisons. I think it's just being a nicer reflect.DeepEqual.
Here's a histogram of all cmp
and cmpopt
usages according to the latest version of all public modules:
36273 cmp.Diff
11808 cmp.Equal
2516 cmp.Comparer
2314 cmp.Option
1929 cmp.AllowUnexported
1680 cmpopts.IgnoreUnexported
1654 cmpopts.IgnoreFields
1509 cmpopts.EquateEmpty
1334 cmp.Options
607 cmp.Transformer
523 cmpopts.SortSlices
369 cmp.Path
256 cmp.Result
243 cmp.FilterPath
242 cmp.Ignore
187 cmpopts.IgnoreTypes
152 cmp.Transform
130 cmp.FilterValues
125 cmpopts.EquateNaNs
92 cmpopts.EquateApprox
91 cmp.Reporter
90 cmpopts.EquateErrors
69 cmpopts.AnyError
66 cmpopts.IgnoreInterfaces
54 cmpopts.AcyclicTransformer
39 cmpopts.EquateApproxTime
39 cmp.MapIndex
38 cmp.PathStep
36 cmpopts.IgnoreMapEntries
31 cmpopts.IgnoreSliceElements
28 cmp.Exporter
26 cmpopts.SortMaps
23 cmp.StructField
13 cmp.SliceIndex
9 cmp.Indirect
8 cmp.TypeAssertion
Some observations:
cmp.{Equal,Diff}
are unsurprisingly the most widely used.cmp.Transformer
is less widely used than cmp.Comparer
, but it's still a non-trivial amount of usages (4:1 ratio).cmpopts.{IgnoreUnexported,IgnoreFields,EquateEmpty}
together account for ~5k usages, which is twice the number of cmp.Comparer
usages.cmp.{Path,PathStep,StructField,SliceIndex,MapIndex,Indirect,TypeAssertion,Transform}
may not be widely used, but they're critical to the operation of cmpopts.{IgnoreUnexported,IgnoreFields,EquateEmpty}
, the other cmpopt
options, and any custom options that a user may write. We can't just remove them, but we could move them to some other package to hide up-front visibility to them. These types are similar to the reflect
package. It's not widely used, but is critical to the operation of widely used packages like encoding/json
or text/template
.It might be worth stating explicitly the problem that a package testing/cmp would be solving. I don't think it's providing a powerful and expressive toolkit for arbitrary value comparisons. I think it's just being a nicer reflect.DeepEqual.
In the initial post, I mentioned that one of the shortcomings of reflect.DeepEqual
was that "it provides no ability to customize the comparison, which is increasingly more useful for more complex types." Certainly there can be a difference in opinion in how much customization should be permitted, but I'm of the (biased) opinion that at least what's currently in cmp
is mostly the right balance. Perhaps, the lesser used functionality (e.g., cmp.Path
, cmp.PathStep
, and the concrete path step types) can be moved elsewhere out of immediate visibility, but not removed.
Certainly there can be a difference in opinion in how much customization should be permitted, but I'm of the (biased) opinion that at least what's currently in cmp is mostly the right balance.
I appreciate that you get value out of what's currently there, but delivering value is just one part of the calculus when deciding whether or not to add a new feature — necessary, but insufficient. The other part is balancing that value with the costs it brings, often to areas like API size and coherence, emergent complexity with other features, and so on.
The (demonstrated!) value of the package is disproportionally weighted in a small part of its API. And the costs it carries are quite high: beyond the very large SLoC, already discussed above, its API is undeniably big, and much of that API is inscrutable without careful study. From where I sit, the calculus just doesn't work — especially not for the stdlib, where the bar is as high as it gets.
(All IMO, of course.)
one of the shortcomings of reflect.DeepEqual was that "it provides no ability to customize the comparison, which is increasingly more useful for more complex types."
Does that customization need to be performed by code that lives in the stdlib? Why is that a better place for it than in x/ somewhere? Or, perhaps better yet, attached to the types that need the customization? Is the need for this kind of customization actually strong and widespread enough that it makes sense to generalize in this way?
much of that API is inscrutable without careful study
From that, I take away something I've felt before too: that the docs could use a bunch more examples, particularly in the less-used parts of the API.
That said, I found that the API generally rewards a bit of study; the underlying model is elegant, which counts for a lot IMHO.
For myself, I can think of two things that have made me ponder parts of the existing API in the past:
I use go-cmp to do the comparison of objects and it's pretty handy for that, but I've found the diffing capabilities lacking, especially for multiline strings, so I'm not sure it is in a state that it should be merged into the stdlib.
Some test frameworks use github.com/pmezard/go-difflib
as it provides much better multiline text diffs. I use it in my own test framework for this reason, and it's also used in the popular stretchr/testify.
If imported, maybe the cmp package should focus on equality only, and not include diffing functionality?
I've found the diffing capabilities lacking, especially for multiline strings
@leighmcculloch, what version of cmp
are you using? Higher fidelity printing of multiline strings has been added since v0.5.0. See this example output.
Does that customization need to be performed by code that lives in the stdlib?
FWIW, as it currently stands, the Option interface is opaque. So any 3rd party implementations must be implemented based on the Options that are already there.
Personally, I think this is a bit unfortunate, but I'm not sure how to fix it. In any case, one obvious answer to "the API surface is too big" would be to make it possible/easy to factor out the lesser used options into x/
or 3rd party packages, as @peterbourgon suggests. I think for that, we'd either need to make Option
a functionally complete interface that allows 3rd party implementations, or find a minimum orthogonal set of Option
s that can serve as a basis (the existing ones are obviously a basis, but they don't seem minimal - especially if you include cmpopts
).
Personally I'm not convinced cmp
should be in the stdlib either. IMO it works just fine as a 3rd party module and I generally agree that the API surface it provides seems a bit unwieldy. I think it might live well in x/
and I think something like it, but a lot smaller could fit into the stdlib. At that point, we're not really talking about "putting go-cmp into the stdlib", but we are talking about "provide a package for better generic comparisons, inspired by go-cmp" and I think we'd need to actually see how that would look, before talking about if it "works" or not.
Oh, also, FWIW: When I used go-cmp in the past, but needed more than cmpopts.SortSlices
or something of that sort, I found it easier to implement the comparison function directly and use that, instead of trying to fit it into cmp.Equal
, which tended to cause a bunch of confusing dynamic type-errors for me. cmp.Diff
provides more bang for your buck (as implementing diff is harder), of course. But I found it telling that the API seemingly makes some things more difficult to do with the package, than without.
In addition to the other issues, I'd note: I frequently want to compare slices such that i consider a nil slice to be equivalent to a non-nil slice of len 0, and reflect.DeepEqual has bitten me on that several times, and worse, done so such that a naive printf of the slices (using %d
if they're []int) doesn't show any difference...
Writing a function that compares two things and tells me whether or how they differ has usually been a great return on time spent in improving test quality and outputs, but I've written that function for []int far more often than I should.
When I used go-cmp in the past, but needed more than cmpopts.SortSlices or something of that sort, I found it easier to implement the comparison function directly
@Merovius. Yea, while cmpopts.SortSlices
allows the ability to sort any slice type, I agree that it's not always easy to use. In the case of a simple type (e.g., int
), it's unfortunate needing to provide an explicit less function when it's trivial. In the case of a more complex type, the comparison can get complicated and may not be aware of any ignored fields. There's been some discussion about having a generic less function, but there are some non-trivial problems to figure out. In most cases, the user doesn't care about the order, but wants to functionally treat a slice as a multi-set.
https://github.com/golang/go/issues/45200#issuecomment-823518866
@seebs, it's not clear to me whether your comment is in support or opposition towards the proposal. Since cmp.Equal
has it's heritage in reflect.DeepEqual
it treats []T(nil)
and []T{}
as inequal. That said, cmp.Diff
does clearly show when the difference is due to nil-ness. Also, the cmpopts.EquateEmpty
option can be used to treat such cases as equal.
@rsc any update on this since https://github.com/golang/go/issues/45200#issuecomment-806171109? Asking in order to avoid duplication of effort with respect to https://github.com/pkg/diff/issues/26. Thanks
This proposal issue has been lingering, in large part because we don't really know how to move forward on it.
go-cmp is widely used but also very complex. I agree with the comments above that if we were to move forward with something in the standard library, we'd want a significantly trimmed down version. I have read the go-cmp package docs a few times over the past few years and each time, I'm left feeling like I don't fully understand what the package does. Obviously the basic functionality is easy to understand, but all the complications are not. And then you also have to read the cmpopts package. At first glance it's not even obvious that cmpopts can be written in terms of cmp. I initially thought that cmpopts was using some internal tricks to return cmp.Options that cmp knew about but weren't exposed in cmp's API itself. Eventually I figured out that cmpopts is entirely layered on top of cmp.Transformer, cmp.FilterValues, and cmp.Comparer, but it's weird that you have to reach for a different package to get the "simple" helpers. Of course, if it was all one package that's even more API to digest.
I believe I met with Joe or Damien or both at one point long ago to discuss potential ways to make cmp fit better into the standard library, but my memory is that there were interface reasons that make it essentially impossible to change any details of cmp without breaking existing uses. I forget the exact details, and maybe I am misremembering.
There may well be some reduced form of google/go-cmp that should become testing/cmp. I don't think an unmodified google/go-cmp is that form. And I don't see any clear path forward as far as what exactly to remove.
It seems like maybe we should move this proposal toward a decline, which would give clarity to the issue and perhaps open a space for other proposals of simpler APIs that might better fit the standard library. Of course, google/go-cmp will remain for anyone who wants it, same as always.
Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
TL;DR, I propose adding
github.com/google/go-cmp/cmp
to the standard library astesting/cmp
.Determining equality of two values is one of the most common operations needed for a unit test where the test wants to know whether some computed value matches some pre-determined expected value.
Using the Go language and standard library alone, there are two primary options:
==
operator from the language itself, andreflect.DeepEqual
function.For simple cases, these work fine, but are insufficient for more complex cases:
==
operator only works on comparable types, which means that it doesn't work for Go slices and maps, which are two common kinds that tests want to compare.reflect.DeepEqual
function is a "recursive relaxation of Go's == operator", but has several short-comings:For many users, the standard library is insufficient, so they turn to third-party packages to perform equality. According to the module proxy, the most common comparison module used is
github.com/google/go-cmp
with 7264 module dependents (and 25th most imported module). I propose includingcmp
in the standard library itself.Why include
cmp
in the standard library?The most widely used comparison function in Go is currently
reflect.DeepEqual
(with ~34k usages ofreflect.DeepEqual
compared to ~6k usages ofcmp.Equal
). Inclusion ofcmp
to the standard library would provide better visibility to it and allow more tests to be written that would have been better off usingcmp.Equal
rather thanreflect.DeepEqual
.A problem with
reflect.DeepEqual
is that it hampers module authors from making changes to their types that otherwise should be safe to perform. Sincereflect.DeepEqual
blindly compares unexported fields, it causes a test to have an implicit dependency on the value of unexported fields in types that come from a module's dependency. When the author of one those types changes an unexported field, it surprisingly breaks many brittle tests. Examples of this occurring was when Go1.9 added monotonic timestamp support and the change to the internal representation oftime.Time
broke hundreds of test at Google. Similar problems occurred with adding/modifying unexported fields to protocol buffer messages.reflect.DeepEqual
is a comparison function that looks like it works wells, but may cause the test to be brittle. Furthermore,reflect.DeepEqual
does not tell you why two values are different, making it even more challenging for users to diagnose such brittle tests.As a contributor to the standard library, there are a number of times that I would have liked to use
cmp
when writing tests in the standard library itself.How is
cmp.Equal
similar or different thanreflect.DeepEqual
?cmp.Equal
is designed to be more expressive thanreflect.DeepEqual
. It accepts any number of user-specified options that may affect how the comparison is performed. Whilereflect.DeepEqual
is more performant,cmp.Equal
is more flexible. Also,cmp.Diff
provides a humanly readable report for why two values differ, rather than simply providing a boolean result for whether they differ.One significant difference from
reflect.DeepEqual
is thatcmp.Equal
panics by default when trying to compare unexported fields unless the user explicitly permits it with ancmp.Exporter
option. This design decision was to avoid the problems observed with usingreflect.DeepEqual
at scale mentioned earlier.Without any options specified,
cmp.Equal
is identical toreflect.DeepEqual
except:cmp.Equal
panics when trying to comparing unexported fieldscmp.Equal
uses a type'sEqual
method if it has onecmp.Equal
has an arguably more correct comparison for graphsreflect.DeepEqual
's cycle detection primarily aims to avoid infinite recursion, but does not necessarily verify whether the two graphs have the same topology, whilecmp.Equal
checks that the graph topology are the same.(Fun fact) Package
reflect
is the 15th most imported Go package, but ~77% of the imports (for test files) only do so to useDeepEqual
. In 2011, Rob Pike explained that "[reflection] is a powerful tool that should be used with care and avoided unless strictly necessary." I find it ironic that most usages ofreflect
is to access a function that is arguably not even about Go reflection (i.e., it doesn't provide functionality that mirrors the Go language itself).How does
cmp
compare to other comparison packages?The only other comparison module (within the top 250 most widely used modules) is
github.com/go-test/deep
with 845 dependents (compared to 7264 dependents forcmp
). Packagedeep
is not as flexible ascmp
and relies on globals to configure how comparisons are performed.