Open MichaelTJones opened 6 years ago
Since you have the code handy, can you provide a working example that includes a benchmark?
i can't share that particular code...but i can and will build a test to share
On Sun, Mar 4, 2018 at 12:24 PM, Giovanni Bajo notifications@github.com wrote:
Since you have the code handy, can you provide a working example that includes a benchmark?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/24240#issuecomment-370260710, or mute the thread https://github.com/notifications/unsubscribe-auth/AHgypbiFJkzX-7c-COnX95uFPgAafJqXks5tbE1jgaJpZM4SbYI5 .
-- Michael T. Jones michael.jones@gmail.com
When n is negated, if that's the style chosen to re-write the loop, then look out for n starting off as the smallest possible value as two's complement representation is asymmetrical.
Isn't the decrement of i followed by a jz (or jnz) another way to write this. for i := n; i != 0; i-- { }
Writing assembly for various architectures, a pre-decrement test against zero is the normal way I'm used to seeing, yes. But I wanted to warn against a flaw in the OP's suggestion.
thank you for the advice.
On Sun, Mar 4, 2018 at 2:41 PM, Ralph Corderoy notifications@github.com wrote:
Writing assembly for various architectures, a pre-decrement test against zero is the normal way I'm used to seeing, yes. But I wanted to warn against a flaw in the OP's suggestion.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/24240#issuecomment-370271099, or mute the thread https://github.com/notifications/unsubscribe-auth/AHgypXq7fJVfP__QGo8lu4smKk9MLp6Fks5tbG18gaJpZM4SbYI5 .
-- Michael T. Jones michael.jones@gmail.com
Something else to ponder in implementing this... I hear some folk use a debugger; might they try and inspect the loop's variable and be confused that it's changed sign, or running in the opposite direction? Or perhaps DWARF can represent the transformation for the debugger to reverse? If other compilers have implemented this for other languages then they might give ideas.
OTOH, stepping through C code that's been built withgcc -O3
is pretty painful. It's fine as long as you're reading the assembly and not trying to match it up to the C code that produced it.
Ok, here is a simplified and extracted example. I called it "loop_test.go"
https://play.golang.org/p/uRdymTPtTCi
Alas, i don't see such a big difference in this case. The code is not the same, not so much register pressure.
Summary: maybe this is not so important.
I think it's still worth trying to measure gains across more tests, especially as some platforms have more register pressure than others. But I've also a niggling recollection that this has been suggested before and turned down because of debugger confusion, probably by @rsc. :-)
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?master (1.11 dev)
Does this issue reproduce with the latest release?
this is a compilation strategy proposal
What operating system and processor architecture are you using (
go env
)?amd64 / Darwin (macbook pro)
What did you do?
I have code that iterates without referencing the loop variable. Changing that from:
for i := 0; i < n; i++ { // no access to i }
...to...
for i := -n; i < 0; i++ { // no access to i }
saves me 4%-5% in my test cases.
Looking at the .S files, I see that there are two gains (as expected): one is that the loop limit needs only be addressable and referenced at the start of the loop--which frees a register over the body of the loop; and also, because the test is against zero and there are dedicated instructions for that rather than loading immediate literals or otherwise.
It seems (from afar) that there could be a rule along the lines of:
if loop variable not referenced in body and assignment has block structure (i:=) and type of loop variable is signed and incr RHS does not depend on additive offset to variable (i++ ok, i*=2 not), then transform "for v := low; v OP high; incr" into "for v := low-high; v OP 0; incr"
I don't have any comprehensive data about how common this situation may be, but it does help.
What did you expect to see?
no change because i expected the compiler to be doing this.
What did you see instead?
5% speedup