golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.65k stars 17.42k forks source link

cmd/compile: intrinsify strings.Compare #61725

Closed dsnet closed 3 months ago

dsnet commented 12 months ago

A comment by @rsc in strings.Compare says:

This function does NOT call the runtime cmpstring function, because we do not want to provide any performance justification for using strings.Compare. Basically no one should use strings.Compare. As the comment above says, it is here only for symmetry with package bytes. If performance is important, the compiler should be changed to recognize the pattern so that all code doing three-way comparisons, not just code using strings.Compare, can benefit.

I'm not sure this comment has aged well with the recent change to slices.SortFunc, where it now takes in an func(T, T) int instead of func(T, T) bool. This means that the utility of strings.Compare suddenly shot up.

We should either intrinsify strings.Compare as runtime.cmpstring OR (even better) recognize this pattern. That way, cmp.Compare on string types will benefit as well.

zigo101 commented 12 months ago

https://github.com/golang/go/issues/50167, https://github.com/golang/go/issues/58142, https://go101.org/blog/2022-10-01-three-way-string-comparison.html

dr2chase commented 12 months ago

@golang/compiler

yuryu commented 10 months ago

Can we bring back this removed optimization? I'm hitting the same issue. slices.SortFunc needs a three-way comparison and there's no good way to do it right now.

https://go-review.googlesource.com/c/go/+/3012

ncruces commented 10 months ago

Another vote for something that recognizes cmp.Compare and/or sequences of cmp.Less.

They seem to have been written in such a way that the optimizer can throw away all the NaN handling, yay! πŸŽ‰

But cmp.Compare gets particularly bad for strings: it doesn't even prioritize ==, like strings.Compare does.

Or should we also document cmp.Compare to say no one should call it? 🫀

gopherbot commented 10 months ago

Change https://go.dev/cl/532195 mentions this issue: strings: intrinsify and optimize Compare

ncruces commented 10 months ago

Good. Yet, I'm a bit sad that this means giving up on Ordered having a decent implementation that's any good for strings.

It's a bit stunning to, for years, take the stance that you don't really need three-way compares, then build an entire 2nd search/sort library around three-way compares, and then fail to optimize the handful Ordered types for it, in effect pushing non-generic callbacks over operators for performance.

There are cases where three-way compares are not something to avoid, and leaving Ordered out of those is… sad.

ncruces commented 10 months ago

If we're going to give up and not optimize string comparisons, and if we can't intrinsify/specialize cmp.Compare, I'd at least change the implementation to this:

func Compare[T Ordered](x, y T) int {
    if x == y {
        return 0
    }
    if x < y {
        return -1
    }
    if isNaN(x) {
        if isNaN(y) {
            return 0
        }
        return -1
    }
    return +1
}

It's slightly worse for ints, slightly better for floats (both hard to measure), and better for strings (mostly equivalent to the previous strings.Compare).

This is sorting a 10 million random element slice, with slices.SortFunc:

Benchmark_int_old-12            10000000               134.6 ns/op
Benchmark_int_new-12            10000000               138.5 ns/op
Benchmark_float_old-12          10000000               146.5 ns/op
Benchmark_float_new-12          10000000               143.4 ns/op
Benchmark_string_old-12         10000000               680.4 ns/op
Benchmark_string_new-12         10000000               596.4 ns/op
Benchmark_string_strings-12     10000000               588.2 ns/op

Edit: what would be a good benchmark so I could send this CL in?

gopherbot commented 10 months ago

Change https://go.dev/cl/531997 mentions this issue: cmp: optimize Compare

earthboundkid commented 9 months ago

With the existing slices.SortFunc and the coming cmp.Or, strings.Compare is more useful than it was pre-generics.

type Order struct {
    Product  string
    Customer string
    Price    float64
}
orders := []Order{ /* ... */ }
// ORDER BY Customer ASC, Product ASC, Price DESC
slices.SortFunc(orders, func(a, b Order) int {
    return cmp.Or(
        strings.Compare(a.Customer, b.Customer),  // could also be cmp.Compare if it can be equally optimized
        strings.Compare(a.Product, b.Product),
        cmp.Compare(b.Price, a.Price),
    )
})
yuryu commented 9 months ago

I've updated examples to use cmp.Or as well as strings.Compare in http://go.dev/cl/532195. It's waiting for reviews.

ncruces commented 9 months ago

I imagine you guys are fond of cmp.Or, but if we're discussing performance, I guess it should be noted that cmp.Or is not a short circuiting operator.

It's, curious that http://go.dev/cl/532195 changes ExampleSortFunc_multiField to use cmp.Or, but uses an if chain in BenchmarkSortFuncStruct to then claim performance.

PS: someone should go change package cmp's description, given cmp.Or has little to do with "comparing ordered values".

yuryu commented 9 months ago

I just forgot to update the benchmark and didn't mean anything else. I can revert my change to not use cmp.Or later if cmp.Or is not short circuiting. It's a bit disappointing tbh because it looks very clean.

ncruces commented 9 months ago

I'm all for optimizing strings.Compare and cmp.Compare, both as best as possible. And if strings.Compare can be made faster today, and cmp.Compare can't, so be it.

But I disagree with having package cmp push users to use strings.Compare for performance over clarity.

We go from one extreme (dissuading users from using strings.Compare by making it slow almost on purpose), to the other (recommending they go change every usage of cmp.Compare).

earthboundkid commented 9 months ago

I agree that since cmp.Or isn’t short circuiting, it is a little weird to use it with strings.Compare to try to eke out a little performance that you just threw away. I still think it’s a pretty elegant way to write a struct comparison function though.

aclements commented 9 months ago

@dr2chase , this may be an interesting thing to think about with the pure function analysis you've been experimenting with. If we knew or could tell that strings.Compare/cmp.Compare are pure, could we effectively compile a call to cmp.Or with a bunch of comparison arguments as a short circuited operation? Or maybe that whole approach is too implicit.

aead commented 9 months ago

To summarize the situation:

Go does not compare strings in an optimal way. This is especially visible when sorting. As of now, the documentation in the sort packages incorrectly refers to the slices package to be more performant. In general, there are multiple ways to do the same thing and it is not clear/obvious what the performance differences are and which one to choose.

sort vs. slices

Consider the following benchmark:

func BenchmarkSort_10000(b *testing.B) {
    const N = 10000
    items := make([]string, N)
    for i := range items {
        items[i] = strings.Repeat("a", 100) + strconv.Itoa(i)
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // slices.Sort(items)
        sort.Strings(items)
    }
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
             β”‚ sort_slices.txt β”‚          sort_strings.txt          β”‚
             β”‚     sec/op      β”‚   sec/op     vs base               β”‚
Sort_10000-8       95.26Β΅ Β± 3%   66.23Β΅ Β± 1%  -30.48% (p=0.002 n=6)

Using sort.Strings is ~30% faster than slices.Sort even though sort.Strings says:

// Strings sorts a slice of strings in increasing order.
//
// Note: consider using the newer slices.Sort function, which runs faster.

The perf. difference is caused by how slices.Sort compares strings. It uses a three-way comparison (x == y, x < y and y > x) while sort.Strings just uses a single less than (x < y).


slices.SortFunc

The same performance problem appears when using slices.SortFunc. Consider the following benchmark:

func BenchmarkSortFunc_10000(b *testing.B) {
    type Item struct {
        Value string
    }

    const N = 10000
    items := make([]*Item, N)
    for i := range items {
        items[i] = &Item{
            Value: strings.Repeat("a", 100) + strconv.Itoa(i),
        }
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        slices.SortFunc(items, func(a, b *Item) int {
            return strings.Compare(a.Value, b.Value)
        })
    }
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 β”‚ sort_func_strings.txt β”‚
                 β”‚        sec/op         β”‚
SortFunc_10000-8             94.39Β΅ Β± 1%

The combination of slices.SortFunc and strings.Compare is also ~30% slower than sort.Strings for the same reason as above. See strings.Compare

    // NOTE(rsc): This function does NOT call the runtime cmpstring function,
    // because we do not want to provide any performance justification for
    // using strings.Compare. Basically no one should use strings.Compare.
    // As the comment above says, it is here only for symmetry with package bytes.
    // If performance is important, the compiler should be changed to recognize
    // the pattern so that all code doing three-way comparisons, not just code
    // using strings.Compare, can benefit.

Further, representing strings as []byte slices and sorting them via bytes.Compare is faster than slices.Sort and slices.SortFunc` with strings:

func BenchmarkSortFunc_10000(b *testing.B) {
    type Item struct {
        Value []byte
    }

    const N = 10000
    items := make([]*Item, N)
    for i := range items {
        items[i] = &Item{
            Value: []byte(strings.Repeat("a", 100) + strconv.Itoa(i)),
        }
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        slices.SortFunc(items, func(a, b *Item) int {
            return bytes.Compare(a.Value, b.Value)
        })
    }
}
goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 β”‚ sort_func_strings.txt β”‚        sort_func_bytes.txt         β”‚
                 β”‚        sec/op         β”‚   sec/op     vs base               β”‚
SortFunc_10000-8             94.39Β΅ Β± 1%   67.97Β΅ Β± 4%  -27.99% (p=0.002 n=6)

However, this is only the case when sorting existing []byte slices. Converting the strings to slices within the comparison function is significantly slower:

goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 β”‚ sort_func_strings.txt β”‚       sort_func_castbytes.txt        β”‚
                 β”‚        sec/op         β”‚    sec/op     vs base                β”‚
SortFunc_10000-8             94.39Β΅ Β± 1%   555.39Β΅ Β± 1%  +488.38% (p=0.002 n=6)

bytes.Compare vs. strings.Compare vs. cmp.Compare

The generic cmp.Compare function is even slower than strings.Compare, and therefore, also slower than bytes.Compare.

goos: darwin
goarch: arm64
pkg: aead.dev/compare
                 β”‚ sort_func_bytes.txt β”‚       sort_func_strings.txt        β”‚          sort_func_cmp.txt           β”‚
                 β”‚       sec/op        β”‚   sec/op     vs base               β”‚    sec/op     vs base                β”‚
SortFunc_10000-8           67.97Β΅ Β± 4%   94.39Β΅ Β± 1%  +38.87% (p=0.002 n=6)   176.96Β΅ Β± 2%  +160.35% (p=0.002 n=6)

Possible solutions

While I agree with @rsc that the "best solution" would be the compiler recognizing the three-way comparison pattern for strings and emitting an memcmp/stringcmp intrinsic, I do not agree that strings.Compare should not be used and should be intentionally slower. A common use case for strings.Compare is sorting structs based on one of their inner fields:

type Person struct {
   Name string
   Age uint
}

The current situation is kind of unfortunate since:

As of now, the fastest way to sort a string slice is to use sort.Strings and there is no good solution for sorting structs based on one of their inner string fields using slices.SortFunc or sort.Slice. Both >= ~30% slower than sort.Strings.

ncruces commented 9 months ago

We should make both as fast as possible, and stop recommending people use either because of performance-over-clarity.

eliben commented 8 months ago

To summarize the situation:

Go does not compare strings in an optimal way. This is especially visible when sorting. As of now, the documentation in the sort packages incorrectly refers to the slices package to be more performant. In general, there are multiple ways to do the same thing and it is not clear/obvious what the performance differences are and which one to choose.

sort vs. slices

[...]

I also want to add that I see some platform-dependent performance here; taking the correct string sorting benchmark from stdlib, I get better performance for slices.Sort on amd64 Linux (about 15% faster), but worse performance on arm64 darwin (about 50% slower).

aead commented 8 months ago

I also want to add that I see some platform-dependent performance here

I suspect that this happens because the sorting implementation in slices is, in general, faster than sort. With random strings, the difference between a single comparison and the three way comparison becomes blurry as well. The string comparison short-circuits on the first non-matching character. Hence, comparing random strings may be a good benchmark for sorting in general but it makes the performance penalty of comparing strings twice less visible.

Directly comparing strings.Compare and bytes.Compare using slices.SortFunc with the linked benchmark on arm64 darwin:

goos: darwin
goarch: arm64
pkg: aead.dev/sort
                    β”‚ /tmp/bytes.txt β”‚
                    β”‚     sec/op     β”‚
SlicesSortBytes-8        15.98m Β± 1%
SlicesSortStrings-8      16.50m Β± 0%
geomean                  16.24m

Sorting strings is slightly slower then sorting strings represented as []byte.

Adding a common prefix to all strings / byte slices increases the perf. difference:

goos: darwin
goarch: arm64
pkg: aead.dev/sort
                    β”‚ /tmp/bytes.txt β”‚
                    β”‚     sec/op     β”‚
SlicesSortBytes-8        28.20m Β± 2%
SlicesSortStrings-8      33.96m Β± 1%
geomean                  30.95m
aead commented 8 months ago

A better benchmark for this issue may be:

func BenchmarkIsSorted_1000(b *testing.B) {
    const N = 1000
    items := make([]string, N)
    for i := range items {
        items[i] = strings.Repeat("a", 100) + strconv.Itoa(i)
    }
    slices.Sort(items)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // slices.IsSorted(items)
        sort.StringsAreSorted(items)
    }
}

The slices package uses cmp.Less while sort uses the Less function. The former compares two strings three times while the later compares them only once.

goos: darwin
goarch: arm64
pkg: aead.dev/sort
            β”‚ /tmp/sort.txt β”‚          /tmp/slices.txt           β”‚
            β”‚    sec/op     β”‚   sec/op     vs base               β”‚
Sort_1000-8     6.909Β΅ Β± 8%   9.573Β΅ Β± 2%  +38.56% (p=0.002 n=6)
randall77 commented 8 months ago

The former compares two strings three times while the later compares them only once.

This should no longer be true after https://go-review.googlesource.com/c/go/+/503795

ncruces commented 8 months ago

It still compares them twice, and none of those 2 takes advantage of length being different to short circuit.

I tried to address that in https://go.dev/cl/531997 but results are inconsistent.

randall77 commented 8 months ago

It still compares them twice, and none of those 2 takes advantage of length being different to short circuit.

My point is that slices.Sort[string], which uses cmp.Less[string] under the hood, looks at each string once, not three times. It is still true that slices.SortFunc[string], using cmp.Compare[string] as the comparison function, looks at each string twice (and not, as it was pre 503795, four times).

zigo101 commented 8 months ago

Note that, since toolchain v1.19, for unknown reasons, slice related benchmarks are often slice length sensitive. So it is best to use several different slices lengths in benchmarks.

gopherbot commented 3 months ago

Change https://go.dev/cl/578835 mentions this issue: cmd/compile: remove redundant calls to cmpstring