golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: Append consecutive slices, same underlying array #17530

Closed. npat-efault closed this issue 8 years ago.

npat-efault commented 8 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.7.3 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64" GOBIN="" GOEXE="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOOS="linux" GOPATH="/home/npat" GORACE="" GOROOT="/usr/local/go" GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64" CC="gcc" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build296609692=/tmp/go-build -gno-record-gcc-switches" CXX="g++" CGO_ENABLED="1"

What did you do?

It seems that the built-in append does not check for a situation like the following (appending consecutive slices backed by the same underlying array), and needlessly copies data when it could avoid it.

  func appendBytes(a, b []byte) []byte {
          if len(b) == 0 {
                  return a
          }
          // If b starts exactly where a ends in the same backing array,
          // extending a by re-slicing is enough; no copy is needed.
          // Shouldn't append() check for something like this?
          if cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] {
                  return a[:len(a)+len(b)]
          }
          return append(a, b...)
  }

If I'm following the breadcrumbs correctly, they lead to the fact that runtime·memmove does not short-circuit when the source and destination addresses are the same (I may be wrong, though).

Running the simple benchmarks at

https://gist.github.com/npat-efault/f055654633f43d0ef771d38657fd499e

demonstrates this for any value of N.

What did you expect to see?

The built-in append detecting the trivial case and behaving (performance-wise) like appendBytes above, with running time independent of the size of slice b.

What did you see instead?

Running time increases with the size of slice b.

cznic commented 8 years ago

I think this is a rarely occurring optimization opportunity. That said, the cost of detecting it, paid in all the other, far more common cases, is possibly high.

Any real world examples which would be helped by this?

npat-efault commented 8 years ago

I believe that the cost of runtime·memmove detecting that the source and destination addresses are the same is minimal (one or two instructions): barely noticeable, and only for extremely small slices.

Then again, you are right, it is rarely occurring.

cznic commented 8 years ago

I doubt cap(a) >= len(a)+len(b) && &a[:len(a)+1][len(a)] == &b[0] can compile to one or two instructions.

randall77 commented 8 years ago

I agree that this seems super rare and not worth the code to check for it. If you know you're in this situation (or might be), you can check yourself and reslice to avoid the copy. Closing.

npat-efault commented 8 years ago

@cznic: I don't think you'd have to do exactly this. The code above is purely for demonstration purposes (the best I could do with "user code"). All you'd have to do, in fact, is add a CMPQ DI, SI; JEQ move_0 (or something similar; I'm not familiar with the Go assembler) to the runtime·memmove implementation.

If I'm not mistaken.

randall77 commented 8 years ago

Right, memmove could do a pretty simple check. But we're not going to add that absent some real code which cares.

npat-efault commented 8 years ago

I was thinking of code along the lines of this:

    parsed := src[0:0]
    for len(src) > 0 {
        n := keep(src)
        parsed = append(parsed, src[:n]...)
        src = src[n:]
        n = drop(src)
        src = src[n:]
    }

Arguably this may not be the best way to write such a loop, but it is not totally unnatural. In this case, if drop() returns 0, you shouldn't have to pay for the extra copy...

Anyway, as you said, it can be avoided.

I just thought I'd point out the issue (and the fact that it can easily be fixed). Whether something should be done about it is a judgement call, and is up to you...

npat-efault commented 8 years ago

Fixed a couple of typos in the example code of my previous comment.

nigeltao commented 8 years ago

You could write it like this:

    // r and w are read and write cursors for src.
    r, w := 0, 0
    for r < len(src) {
        n := keep(src[r:])
        if r != w {
            copy(src[w:], src[r:r+n])
        }
        r, w = r+n, w+n
        r += drop(src[r:])
    }
    parsed := src[:w]

npat-efault commented 8 years ago

@nigeltao Yes, of course I could, and there are probably other ways as well. The thing is, the code I posted was the first thing that came to mind (instinctively), and I immediately thought: "I guess append() will be clever enough to avoid the copy and just re-slice (extend)". I checked, and it turns out it isn't, which was a bit surprising. Especially considering that the cost of detecting the (uncommon) special case turns out to be negligible. I think (and this is a judgement call) that doing "the right thing" in this case justifies the almost negligible cost.

npat-efault commented 8 years ago

@nigeltao Also, in my original case, parsed was passed as an argument, and it could, or could not, "coincide" with src. Yes, you could also handle this with some extra code, but that is beside the point... (it's the "asymmetry" that matters)

sprstnd commented 8 years ago

so redundant compiler/runtime checks are fine, but this extra check is a concern? poppycock!

minux commented 8 years ago

Whether the cost is negligible or not is not the deciding factor for adding a special case.

If the case only happens one in a million times, then no matter how cheap the optimization is, we shouldn't add it.

And the optimization cost is non-negligible anyway as it will at least cost one more entry in the branch predictor tables.

That being said, I instrumented the memmove routine and found that runtime.typedmemmove actually does trigger runtime.memmove with src == dest. Perhaps it's worthwhile to investigate whether we can eliminate such unnecessary memmove calls.

minux commented 8 years ago

Self-memmove is not rare during all.bash, but the size is almost never larger than 0x20 (32) bytes.

Most of the cases come from math/big, where the API always takes a result parameter, and in a lot of cases, the final step is to copy the result into the result parameter.

We will need a more extensive benchmark analysis to see whether memmove should special case this.

npat-efault commented 8 years ago

Ah, I see! The same memmove is used in many cases other than slice copies, slice appends, and the like. A lot of them may be very small moves, where the cost of adding the check could be considerable (or maybe not). We could consider adding the check only for slice appends, which is more complicated than adding a simple compare to memmove. In that case, though, it would mean adding a couple of extra instructions at every append call site... not an obvious gain.

npat-efault commented 8 years ago

What if we add the test to memmove for all but the smallest sizes? Just a thought.

randall77 commented 8 years ago

Adding the test above a size threshold would certainly bound the overhead of the check.
