cmd/compile: switch to a register-based calling convention for Go functions

aclements commented 4 years ago

I propose that we switch the Go internal ABI (used between Go functions) from stack-based to register-based argument and result passing for Go ~1.16~ 1.17.

I lay out the details of our proposal and the work required in this document.

The ABI specification can be found here.

This was previously proposed in #18597, where @dr2chase did some excellent prototyping work that our new proposal builds on. With multiple ABI support, we’re now in a much better position to execute on this without breaking compatibility with the existing body of Go assembly code. I’ve opened this new issue to focus on discussion of our new proposal.

/cc @dr2chase @danscales @thanm @cherrymui @mknyszek @prattmic @randall77

An incomplete and evolving list of tasks:

[X] Define ABI specification (CL, @aclements)
[X] Add GOEXPERIMENT=regabi and asm macro (CL)
[x] cmd/compile: late call lowering (CLs)
[x] cmd/compile: implement register arguments (@dr2chase)
- [x] spill around morestack call (CL)
- [x] add temporary ABI pragma for testing (CL)
- [x] add abstract argument registers (WIP CL)
- [x] implement ABI rules (CL)
- [x] use abstract argument registers
[x] cmd/compile: implement register returns
- [x] late return lowering (CL, @dr2chase)
- [x] implement ABI rules (CL, @thanm)
[x] cmd/compile: ABI wrappers (CL, old CL)
- [x] cmd/link: mangle wrapper names for ELF (CL)
- [x] cmd/link: mangle wrapper names for PE (CL)
- [x] implement zeroing of X15 in ABI0->ABIInternal wrappers (CL)
[X] cmd/asm: add syntax for runtime packages to reference ABIInternal symbols (CL)
- [x] Mark runtime.{goexit,asyncPreempt}, reflect.{makeFuncStub,methodValueCall} definitions ABIInternal (CL)
- [x] Mark reference to runtime.callbackWrap in sys_windows_amd64.s as ABIInternal (CL)
[x] runtime: fix assembly closure calls (mcall) (@aclements, CL)
[x] reflect: support calling ABIInternal (CL)
[x] reflect: support MakeFunc and method value calls (CL)
[x] go and defer statements
- [x] cmd/compile: desugar go/defer to zero-argument (@thanm, CL)
- [x] cmd/compile: fix defer recover() (@cherrymui, CL)
- [x] cmd/compile: desugar go/defer with results (@cherrymui, CL)
[X] cmd/cgo: decouple cgo callbacks from ABI (CL)
[x] runtime: finalizer support (CL)
[x] runtime: Windows callback support (CL)
[x] cmd/internal/obj: introduce and use RegEntryTmp registers in prologue (@mknyszek, CL)
[x] cmd/compile: generate correct ABI0 argument maps for body-less functions (see WriteFuncMap) (@dr2chase, CL)
[x] cmd/compile: make cgo_unsafe_args generate an ABI0 function (or an ABIInternal-with-no-registers function) (@cherrymui, CL)
[x] runtime: update debug call protocol (CL)
[x] Fix ABI0/ABIInternal confusion in cgo (CL)
[x] Run signature fuzzer and fix bugs (everyone)
[x] Get all.bash to pass on Linux, Windows, and Darwin (everyone)

High-priority non-critical path

[x] Export ABI information across package boundaries to eliminate more wrappers (@aclements, CL)
[x] Make funcPC always return the "native" PC of a function (maybe also introduce ABIOther) (@cherrymui, CL)
[x] Strip unnecessary ABIInternal annotations once funcPC is in
[x] Tracebacks
- [x] Define extended argument metadata format (in progress, @cherrymui)
- [x] cmd/compile: produce extended argument metadata (@cherrymui, CL)
- [x] runtime: consume extended argument metadata (@cherrymui, CL)
- [x] cmd/compile,runtime: traceback metadata for spill slot liveness
[x] DWARF
- [x] cmd/compile: ensure arguments have appropriate DWARF locations
- [x] Ensure GDB and Delve can find register arguments
- ~[ ] cmd/compile: add a DWARF vendor attribute with function argument frame size (context)~
[x] Wrapper mangling for non-critical arches
- [x] cmd/link: mangle wrapper names for Mach-O (CL)
- [x] cmd/link: mangle wrapper names for xcoff
- [x] cmd/link: mangle wrapper names for Plan 9
[x] runtime: simplify go/defer call paths
- [x] assert that go/defer frames are zero sized (CL)
- [x] replace defer reflectcall with a direct closure call (CL)
[x] cmd/internal/obj: reject splittable ABIInternal functions without morestack spill info (e.g., asm functions) because we can't generate a correct morestack path (CL)
[x] cmd/asm: don't reference args_stackmap in ABIInternal functions (because it's for ABI0) (CL)
[x] Port performance-critical assembly to ABIInternal (see CL 296372 for valuable functions) (CL 308931, CL 310184)
[x] math: fix overhead of assembly->Go calls CL 310331
[x] Release notes
- Mention undocumented behavior we've observed applications depending on that may have changed.
- Using reflect.ValueOf(fn).Pointer() to get the PC of an assembly function will now return the PC of the ABI wrapper
- Performance surprise: calling from assembly to Go is slightly more expensive.
- Performance surprise: calling an assembly function via a closure is slightly more expensive.
- Traceback format is much improved

Enabling steps

[x] Enable regabiwrappers by default on amd64 (Linux, Windows, and Darwin)
[x] Enable regabidefer
[x] Enable regabireflect
[x] Enable regabig
[x] Enable regabiargs

Testing

[ ] Function signature fuzzer (ongoing)
- [ ] Static call, closure call, method call, interface call, reflect.{ValueOf(target),MakeFunc(x, target),Method(x)}.{Call,Interface}, called from defer, called from go, called as a finalizer
- [x] {Big,Small} {argument,result} in {memory,registers} and {not addressed,addressed and not leaked to heap,addressed and leaked to heap}
- [x] Use defer in a test function (check arguments, modify results)
- [x] Pass pointers-to-stack and pointers-to-heap
- [x] Cause result to move to heap
- [x] runtime: add MoveStackOnNextCall, assertions for pointer-to-stack/pointer-to-heap, assertions for live/dead (@aclements, CL)
- [x] Integrate runtime testing hook
- [ ] Clean up fuzzer, and get into tools repo.
[x] cmd/compile: revive clobberdead (CL)
[x] cmd/compile: add clobberdeadreg (CL)

Post-MVP

[x] Move g from TLS to register (CL)
- [x] Use g register on non-Linux OSes (see CL)
[x] Reserve and use X15 as zero register (CL)
- ~[ ] Should this be X0 for more compact instruction encoding?~
[ ] Port to other platforms
- [x] amd64
- [x] arm64
- [ ] mips
- [x] ppc
- [ ] s390x
- [x] risc-v
- ~[ ] porting guide~
[x] Eliminate legacy ABI paths for all platforms, even if they're using ABIInternal with 0 registers
[ ] Eliminate spill slots?
[ ] Pass pointer-shaped method receiver in context register (#44827)
[x] runtime: simplify go/defer call paths
- [x] simplify go/defer paths to assume zero args/results
- [x] simplify special _defer allocator and remove special _defer cases from mallocgc (because _defer records will be fixed size).
[ ] cmd/vet: add syntax and checks for register arguments and results in assembly like x+0(FP)
[ ] cmd/vet: check argument size for ABIInternal functions and fix runtime asm declarations

Cleanup (can be done later)

[ ] runtime: port all assembly -> Go calls to ABIInternal?
[x] Eliminate ABIAlias support from cmd/compile, cmd/link, and object format
[x] remove now-redundant fields from ssa.AuxCall
[x] rationalize use of LocalsOffset/FrameOffset (for registers that use a link register, e.g. arm64, powerppc).
[ ] cmd/compile: revisit abiutils.go and clean up the API (also reduce ABIConfig copying)
[ ] cmd/internal/obj: generate stack maps in obj so we can better compact the morestack/body maps and reduce subtlety?
[x] cmd/compile,runtime: re-enable passing arguments to go in runtime (context)
[ ] cmd/compile: always attach ir.Func to function ir.Names (context)
[x] cmd/compile: support ABIInternal for tail calls (context)
[x] cmd/compile: once wrapping of defers is on for all arch/OS combinations, it should be possible to simplify or remove some of the openDefer processing in ssagen (notablly we could get rid of openDeferSave()
~[ ] runtime: think about an ABI-insensitive debugCall~ Delve uses DWARF information
[ ] Fix up named slots & value association
[ ] fix firstpos in ssa.go – sometimes it's late.

JetSetIlly commented 3 years ago

@mknyszek so does it mean that register-based calling convention is delayed until Go 1.18?

Go 1.17 will enable the register abi for GOARCH=amd64 systems. The 1.18 activity to which @mknyszek is referring is the work to enable the register abi for other architectures (for example, arm64). The new 1.18 milestone for this issue is for that work.

Is there any work that can be done here for the wasm architecture or is that out of scope?

thanm commented 3 years ago

Is there any work that can be done here for the wasm architecture or is that out of scope?

My understanding was that the WASM abstract machine is stack-based... so I don't think think there is any work to do there (the register ABI effort is for architectures that have actual machine registers). Go wasm experts, please correct me if I am wrong.

gopherbot commented 3 years ago

Change https://golang.org/cl/322630 mentions this issue: cmd/compile: place reg spills after OpArg{Int,Float}Reg ops

gopherbot commented 3 years ago

Change https://golang.org/cl/322631 mentions this issue: cmd/compile: improve debug locations partially live in-params

laboger commented 3 years ago

Is there anything that could be done to help with PPC64?

dr2chase commented 3 years ago

I think the plan is to do ARM64 "cleanly" (i.e., without all the fumbling that accompanied AMD64) and then we end up with a bunch of CLs that you can use for a model. Cherry's started on the compiler-end earlier this week; we have tests that can be enabled step-by-step.

We figured y'all would be interested. 😀

gopherbot commented 3 years ago

Change https://golang.org/cl/323289 mentions this issue: doc/go1.17: add release notes for register ABI

gopherbot commented 3 years ago

Change https://golang.org/cl/325109 mentions this issue: [dev.typeparams] cmd/compile: allow go'd closure to escape when compiling runtime

gopherbot commented 3 years ago

Change https://golang.org/cl/325110 mentions this issue: [dev.typeparams] runtime: undo go'd closure argument workaround

laboger commented 3 years ago

I think the plan is to do ARM64 "cleanly" (i.e., without all the fumbling that accompanied AMD64) and then we end up with a bunch of CLs that you can use for a model. Cherry's started on the compiler-end earlier this week; we have tests that can be enabled step-by-step.

What are the plans for enabling other targets? I have started some work to get this enabled for ppc64le/ppc64 and some packages work but others do not. I tried to follow the list of CLs for ARM64 but must not have them all.

mknyszek commented 3 years ago

I'm planning on writing up a porting guide based on the ARM64 CLs that landed. @laboger What makes you say that you don't have them all?

laboger commented 3 years ago

I'm planning on writing up a porting guide based on the ARM64 CLs that landed. @laboger What makes you say that you don't have them all?

I started working on this before there was a hashtag so there was some guesswork on my part. I was missing a few but have added them and I am still getting failures. Right now I'm trying to debug the problems...

laboger commented 3 years ago

@mknyszek I have the regabi patches for ppc64 working except for one remaining issue that is related to the use of float32 arguments when using reflection. It seems to be related to the memmoves in reflect/value.go when moving from arguments of type float32 to the regabi structure containing only float64 types, which are then later loaded as float64, resulting in wrong values for a few tests. I tried to do some float conversions but that caused other failures.

mknyszek commented 3 years ago

@laboger Based on the ppc64 FP docs I could find, it looks like every float loaded into a register is promoted, which is different from how both amd64 and arm64 work (AFAICT), which leaves single-precision floats in the lower parts of the register. If I recall correctly, the reflect code is written assuming that.

How does ppc64 deal with storage of 32-bit floats in that case? I assume there's some kind of instruction for the conversion into a GP register, or a conversion+store instruction?

In theory, promoting 32-bit floats before placing them in RegArgs should be sufficient. RegArgs is intended to be treated as exactly the data that would go into a register.

I'm happy to help if you have instructions to reproduce.

laboger commented 3 years ago

I have started to submit the changes to enable this for ppc64/ppc64le. Would you be willing to help review? The changes would still be under control of GOEXPERIMENT until they are all merged and this one problem gets fixed. In the meantime I will try to reproduce the issue on a smaller testcase.

When a float32 is loaded from storage then it becomes a float64 in the register. But it seems like what is happening here is that the memove is copying 4 bytes from a float32 in storage to 4 bytes of a float64 in storage and then tries to load the storage as if it is a float64 but I believe the formats of float32 and float64 are different when they are in storage. With a small reproducer I can verify that theory.

On Thu, Aug 26, 2021 at 3:28 PM Michael Knyszek @.***> wrote:

@laboger https://github.com/laboger Based on the ppc64 FP docs I could find, it looks like every float loaded into a register is promoted, which is different from how both amd64 and arm64 work (AFAICT), which leaves single-precision floats in the lower parts of the register. If I recall correctly, the reflect code is written assuming that.

How does ppc64 deal with storage of 32-bit floats in that case? I assume there's some kind of instruction for the conversion into a GP register, or a conversion+store instruction?

In theory, promoting 32-bit floats before placing them in RegArgs should be sufficient. RegArgs is intended to be treated as exactly the data that would go into a register.

I'm happy to help if you have instructions to reproduce.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/40724#issuecomment-906720733, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACH7BDGEFFDD2XWK7Y3BI6TT62PXNANCNFSM4P4XNYFQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

gopherbot commented 3 years ago

Change https://golang.org/cl/347390 mentions this issue: dashboard: remove noregabi builders

laboger commented 3 years ago

Based on the ppc64 FP docs I could find, it looks like every float loaded into a register is promoted, which is different from how both amd64 and arm64 work (AFAICT), which leaves single-precision floats in the lower parts of the register. If I recall correctly, the reflect code is written assuming that.

That doesn't seem very portable across targets, and inconsistent with the way Float values are currently handled in reflect. Right now there are explicit conversions between float32 and float64 through the Float() and setFloat() functions, not just a copy of half of the float64.

How does ppc64 deal with storage of 32-bit floats in that case? I assume there's some kind of instruction for the conversion into a GP register, or a conversion+store instruction?

There are instructions to load float doubles and singles. The lfs and stfs instructions will load and store 4 bytes assuming they are float32 then promote them to float64 when they are in the register.

I have made some changes to my patches to try to make it work by converting float32 to float64 before moving them to the Floats field in the regAbi.RegArgs structure. That change appears to pass the correct values for the input arguments but a similar change is needed for the return values which I haven't tried yet but it seems a bit more complicated.

cherrymui commented 3 years ago

Yeah, the difference in how 32-bit floats are handled on PPC64 is unfortunate. The spilling/unspilling code is generic, so it always operates on full register width. Yeah, for a FloatReg, we probably need conversions from/to a reflect.Value and always load/store 64-bit in RegArgs structure.

mknyszek commented 3 years ago

That doesn't seem very portable across targets, and inconsistent with the way Float values are currently handled in reflect. Right now there are explicit conversions between float32 and float64 through the Float() and setFloat() functions, not just a copy of half of the float64.

For the current implementation, I agree with what @cherrymui said.

Speaking more broadly, I don't know if there are very many clean cross-platform alternatives. In the end I think some set of platforms have to be the "different" one. Though, we're open to design suggestions! I'd love to make this simpler if possible...

This also brings to my attention that we're going to have a similar issue on riscv64. While riscv64 allows 32-bit floats to live in the bottom part of the register, it expects them to be NaN-boxed. So, we're going to have to make sure it's boxed before storage into a RegArgs. Maybe the right abstraction here is a platform-specific conversion hook for loading/storing floats from a RegArgs. If it would help, I can work on that.

CC @mengzhuo who mentioned looking into the riscv64 regabi.

I have made some changes to my patches to try to make it work by converting float32 to float64 before moving them to the Floats field in the regAbi.RegArgs structure. That change appears to pass the correct values for the input arguments but a similar change is needed for the return values which I haven't tried yet but it seems a bit more complicated.

I don't believe converting return values should be any harder: at that point call knows that what it's loading must always be a 64-bit float (since that's what they always are in the register), and if the target reflect.Value has type float32, then a conversion is necessary.

Unless you're talking about the reflect method value or MakeFunc cases, that may be more complicated and/or need something similar, unfortunately.

mknyszek commented 3 years ago

@laboger Please take a look at https://golang.org/cl/347566. I tried to abstract away the platform-specific operations need to be done to get the "in-register" representation into a RegArgs. Implementing an archFloat32ToReg and archFloat32FromReg function should be a lot more straightforward than having to add special cases to reflect.

Let me know if this is useful to you!

aclements commented 3 years ago

I think I haven't fully followed the discussion of floats. @mknyszek, @laboger, is this something that makes sense to call out in the internal ABI specification as one of the pieces that each architecture must "plug in" to the specification?

laboger commented 3 years ago

I think I haven't fully followed the discussion of floats. @mknyszek, @laboger, is this something that makes sense to call out in the internal ABI specification as one of the pieces that each architecture must "plug in" to the specification?

Right now in reflect/value.go, 4 bytes of a float32 are copied to the 8 byte FloatReg fields of the RegArgs structure and then later loaded as a float64. This does not seem portable and is inconsistent with other code in Golang where there are there are always explicit conversions between float32 and float64 and not just 4 byte copies. IMO if the float32 was explicitly converted to float64 and then copied into the FloatReg fields then this should be portable and work and doesn't need to be documented or called out specially.

@mknyszek I will try out your CL. That seems like what is needed.

mknyszek commented 3 years ago

Right now in reflect/value.go, 4 bytes of a float32 are copied to the 8 byte FloatReg fields of the RegArgs structure and then later loaded as a float64. This does not seem portable and is inconsistent with other code in Golang where there are there are always explicit conversions between float32 and float64 and not just 4 byte copies. IMO if the float32 was explicitly converted to float64 and then copied into the FloatReg fields then this should be portable and work and doesn't need to be documented or called out specially.

If you're suggesting to always convert (and feel free to ignore if you're not), I'm not sure the compiler+runtime can reasonably do that.

Today, the code on the other side of reflectcall for arm64 and amd64 expects that the float32 values are at the bottom 32 bits of the register, and subsequently use single-precision FP computation instructions. As I alluded to earlier, this means something has to end up being a special case, unless we change the compiler so all float32 computations in registers actually happen as 64-bit computations (maybe what you're saying?). I don't think that's going to happen, though, because there's an added cost on those platforms: promotion and demotion of the narrower-width float on each load and store.

@mknyszek I will try out your CL. That seems like what is needed.

Thanks! Let me know how it goes.

laboger commented 3 years ago

Today, the code on the other side of reflectcall for arm64 and amd64 expects that the float32 values are at the bottom 32 bits of the register, and subsequently use single-precision FP computation instructions. As I alluded to earlier, this means something has to end up being a special case, unless we change the compiler so all float32 computations in registers actually happen as 64-bit computations (maybe what you're saying?). I don't think that's going to happen, though, because there's an added cost on those platforms: promotion and demotion of the narrower-width float on each load and store.

Ah OK I didn't realize that the code on the other side is different for amd64 and arm64. When I saw the memmoves I assumed it was to avoid some kind of overhead.

I wrote this test to try and reproduce what I think is happening:


import (
        "fmt"
        "math"
        "unsafe"
)

type f32s struct {
        f1 float32
        f2 float32
}

type f64s struct {
        f1 float64
        f2 float64
}

func main() {

        var fb1 [8]byte
        var fb2 [8]byte
        var fb3 [8]byte

        f32p := (*float32)(unsafe.Pointer(&fb1))
        *f32p = float32(0.1234)
        copy(fb2[0:4], fb1[0:4])
        f64p := (*float64)(unsafe.Pointer(&fb2))

        fmt.Printf("### float32: %g %x copied to float64: %g %x as float32: %f converted to float64: %g %x as float32: %f ###\n", *f32p, math.Float32bits(*f32p), *f64p, math.Float64bits(*f64p), float32(*f64p), (float64(*f32p)), math.Float64bits(float64(*f32p)), float32(float64(*f32p)))
        copy(fb3[4:], fb1[0:4])
        f64p = (*float64)(unsafe.Pointer(&fb3))
        fmt.Printf("### float64: %g %x converted to float32: %f ###\n", *f64p, math.Float64bits(*f64p), float32(*f64p))

}

But I get the same result on and x86 and ppc64le:

x86: 
 ./test-float32new 
### float32: 0.1234 3dfcb924 copied to float64: 5.13814756e-315 3dfcb924 as float32: 0.000000 converted to float64: 0.1234000027179718 3fbf972480000000 as float32: 0.123400 ###
### float64: 4.1797765248929863e-10 3dfcb92400000000 converted to float32: 0.000000 ###
ppc64le:
 ./test-float32new 
### float32: 0.1234 3dfcb924 copied to float64: 5.13814756e-315 3dfcb924 as float32: 0.000000 converted to float64: 0.1234000027179718 3fbf972480000000 as float32: 0.123400 ###
### float64: 4.1797765248929863e-10 3dfcb92400000000 converted to float32: 0.000000 ###

cherrymui commented 3 years ago

@laboger I'm not sure what your code is trying to achieve.

Right now in reflect/value.go, 4 bytes of a float32 are copied to the 8 byte FloatReg fields of the RegArgs structure and then later loaded as a float64.

It is not loaded as a float64. On AMD64 and ARM64, it is loaded as 64 bits, just bits, without interpretation. It is up to the function body to interpret the bits. For a float32, we load 64 bits, but only 32 bits are live and used in the function body, as a float32, not float64.

laboger commented 3 years ago

It is not loaded as a float64. On AMD64 and ARM64, it is loaded as 64 bits, just bits, without interpretation. It is up to the function body to interpret the bits. For a float32, we load 64 bits, but only 32 bits are live and used in the function body, as a float32, not float64.

If reflection is not involved, passing float32s works fine.

The test that is failing is the reflect test for passStruct5. Here is the stack from gdb:

(gdb) bt
#0  reflect_test.passStruct5 (a=..., ~r0=...) at /home/boger/golang/abi.newer/go/src/reflect/abi_test.go:432
#1  0x0000000000077c8c in runtime.call64 () at /home/boger/golang/abi.newer/go/src/runtime/asm_ppc64x.s:480
#2  0x000000000007b76c in runtime.reflectcall (stackArgsType=<optimized out>, fn=<optimized out>, stackArgs=<optimized out>, 
    stackArgsSize=<optimized out>, stackRetOffset=<optimized out>, frameSize=<optimized out>, regArgs=<optimized out>)
    at <autogenerated>:1
#3  0x00000000000d29e8 in reflect.Value.call (v=..., 
    op=<error reading variable: access outside bounds of object referenced via synthetic pointer>, in=...)
    at /home/boger/golang/abi.newer/go/src/reflect/value.go:549
#4  0x00000000000d21ac in reflect.Value.Call (v=..., in=...) at /home/boger/golang/abi.newer/go/src/reflect/value.go:338
#5  0x000000000015b448 in reflect_test.TestReflectCallABI.func1 (t=0xc00028ed00)
    at /home/boger/golang/abi.newer/go/src/reflect/abi_test.go:182

In reflectcall, the float32 values have been copied into the Float fields of the regAbi structure as I've mentioned above. In call64, the argument registers are loaded from this regArgs structure using unspillRegs which loads all the floats as float64.

It's entirely possible I've got something wrong on this path so it doesn't work right, but this is my understanding of this problem.

cherrymui commented 3 years ago

@laboger I understand the problem with PPC64 and why it is failing. But I don't understand what the code in https://github.com/golang/go/issues/40724#issuecomment-914512805 is trying to do.

In call64, the argument registers are loaded from this regArgs structure using unspillRegs which loads all the floats as float64.

As I commented earlier, they are not loaded as float64. They are loaded as 64 bits. It is a bit-by-bit copying from memory to register. In your code in https://github.com/golang/go/issues/40724#issuecomment-914512805 you're trying to do a float64 load, *(*float64)(unsafe.Pointer(&fb2)), which is not reflectcall actually does.

Have you tried @mknyszek 's CL above? You'll need to define archFloat32ToReg and archFloat32FromReg with something like

func archFloat32FromReg(reg uint64) float32 {
    return float32(*(*float64)(unsafe.Pointer(&reg)))
}

func archFloat32ToReg(val float32) uint64 {
    x := float64(val)
    return *(*uint64)(unsafe.Pointer(&x))
}

and disable the generic version.

laboger commented 3 years ago

runtime.call64 calls runtime.unspillArgs which loads all the float32 arguments as float64 from the regArgs that the reflect code initialized before calling the reflected function. And the asm code for unspill looks similar to what's on arm64. I must be missing something obvious.

I have not had a chance to try the CL but will tomorrow.

randall77 commented 3 years ago

Not to throw an ugly grenade into this problem, but be sure to be careful about what happens to signaling NaNs. In Go, if you do float32(float64(s)), and s is a float32 signaling NaN, the result is a quiet NaN. So implementing float32 argument passing with an internal float64 cast risks dropping the signaling bit.

Check out reflect/value.go:cvtFloat and reflect/all_test.go:TestConvertNaNs for details.

I think we'd need a test that calls a function with a float32 argument using reflect, passing in a float32 signaling NaN, and making sure in the body of the function that the argument is still a signaling NaN.

gopherbot commented 3 years ago

Change https://golang.org/cl/348017 mentions this issue: reflect: add test for passing float32 signaling NaNs

mknyszek commented 3 years ago

Perhaps the fact that cvtFloat already exists in reflect is another good reason to move the hooks this CL provides to the reflect package...

gopherbot commented 3 years ago

Change https://golang.org/cl/347566 mentions this issue: reflect: add hooks for dealing with narrow width floats

laboger commented 3 years ago

It appears the CL does fix my problem but now I get the signalling NaN error:

./reflect.test 
--- FAIL: TestSignalingNaNReturn (0.00s)
    abi_test.go:904: signaling NaN not correct: 7fc00001
FAIL

cherrymui commented 3 years ago

What does the 32-bit FP load machine instruction do with signaling NaN? Does it convert to a 64-bit signaling NaN in register, or a quiet NaN? Does it preserve it when store back as 32-bit float?

If the machine instructions preserve signaling NaN, maybe we could write archFloat32FromReg and archFloat32ToReg in assembly. Something like

TEXT    ·archFloat32ToReg(SB),4,$0-16
    FMOVS   val+0(FP), F1
    FMOVD   F1, ret+8(FP)
    RET

randall77 commented 3 years ago

It looks like a signaling NaN should round-trip from memory->reg->memory ok. At least, there's a test for that, which I recently fixed.

laboger commented 3 years ago

What does the 32-bit FP load machine instruction do with signaling NaN? Does it convert to a 64-bit signaling NaN in register, or a quiet NaN? Does it preserve it when store back as 32-bit float?

It stays as a signaling NaN for loads and stores, but it looks like when storing a float64 in a register as a float32, it does an frsp to round it to a float32 and that changes it to a quiet NaN. So that one could be implemented in assembler to avoid the conversion.

laboger commented 3 years ago

I tried various assembler instructions to do this conversion but they all changed the NaNs from signaling to quiet. I found that this type of conversion is a common problem discussed in other Go issues, so just added Go code to the basic conversion to check if the quiet bit is not set in the float64 before converting, and if it then remove it from the converted float32.

func archFloat32FromReg(reg uint64) float32 {
        f64 := *(*float64)(unsafe.Pointer(&reg))
        if isSignalingNaN(f64) {
        // Convert float64 to float32 but if it is a signaling NaN don't convert to quiet.
                return math.Float32frombits(math.Float32bits(float32(*(*float64)(unsafe.Pointer(&reg)))) & f32snanbit)
        }
        return float32(f64)
}

This seems to work for reflect tests and pass the signaling NaN tests.

randall77 commented 3 years ago

You should also make sure the NaN payload doesn't get squashed. Or convert -0 -> +0. We really want the identity function on every 32-bit pattern.

cherrymui commented 3 years ago

@laboger I wonder why these don't work

TEXT    ·archFloat32ToReg(SB),4,$0-16
    FMOVS   x+0(FP), F1
    FMOVD   F1, ret+8(FP)
    RET

TEXT    ·archFloat32FromReg(SB),4,$0-12
    FMOVD   x+0(FP), F1
    FMOVS   F1, ret+8(FP)
    RET

Tried with a simple program and it does seem to preserve signaling NaN.

package main

import "math"
import "fmt"

const snan uint32 = 0x7f800001

func archFloat32ToReg(float32) uint64
func archFloat32FromReg(uint64) float32

func main() {
    x := math.Float32frombits(snan)
    r := archFloat32ToReg(x)
    y := archFloat32FromReg(r)
    fmt.Printf("%x %x\n", snan, math.Float32bits(y))
}

laboger commented 3 years ago

@cherrymui Yeah that does work. I had a concern about the float64->float32 conversion being done this way since the generated code for float32(float64) uses the frsp to round it down before storing it. I didn't know if there could be cases where this doesn't get the original float32 answer. But if the float64 is always from a float32->float64 conversion then it shouldn't need rounding.

On a side note, why is the float32 argument 8 bytes but the float32 result 4 bytes?

cherrymui commented 3 years ago

@laboger thanks. Good to know it works. Yeah, for the float32 case it doesn't need rounding because it is always from a float32.

why is the float32 argument 8 bytes but the float32 result 4 bytes?

A float32 is always 4 bytes. But we align to pointer size between arguments and results. So one is 8 + 4; the other is 4, aligned to 8, then add 8.

laboger commented 3 years ago

I made some changes to the patches to match the registers described in the PPC64 ABI doc section. I hit a few issues that could affect the doc.

I didn't notice this before but F0 is not in the assignable set of F registers in PPC64Ops.go so I'm just going to switch to start with F1. Otherwise I get errors at build time saying it can't assign an F reg. I'd rather not switch this in PPC64Ops.go since there are plenty of F regs anyway the last 4 are not even used now.
With the number of Int and Float regs that we decided on, I get split stack overflow errors in some of the race tests. I'm assuming this is due to the extra space allocated for spilling. I trimmed back the number of registers for arguments so there are 12 ints and 12 floats and that avoids the split stack overflow error. Any concerns if I switch the doc back to use 12 ints 12 floats and start the floats at F1? Or is there another way to avoid the split stack overflow.
I am getting a failure in fixedbugs/issue45344.go, same as what was in the original issue. I briefly looked at that but don't see why that would fail with my ABI patches but not without. The fix for the issue appears to be there.

thanm commented 3 years ago

@laboger one of the tools we used for bringup of the reg abi on amd64 was a function signature fuzzer. If you think it might be helpful to run it for ppc64 bringup, I can send pointers/instructions-- let me know.

cherrymui commented 3 years ago

@laboger

Yeah, starting at F1 SGTM. (It didn't use F0 probably because in the old backend we used to reserve F0 as a fixed value, probably 0.)
Reducing the number of registers to avoid stack overflow also SGTM. This is the main reason to limit the number, otherwise we could use something like 20 registers each.
The last one is harder to tell for now. Maybe when you have the CLs we could take a look. Thanks.

dr2chase commented 3 years ago

I'd also be a little careful about using too many registers, because our register allocator sometimes gets confused. There's a pretty serious performance regression for one benchmark on AMD64 caused by a return of a bunch of registers that caused it to do a bunch of new spills inside a loop. Longer term that is a bug we need to fix, in the short term, diminishing returns with more and more registers suggests just don't use every register that we "could", so as to avoid these other pathologies (for now, at least).

gopherbot commented 3 years ago

Change https://golang.org/cl/342869 mentions this issue: cmd/compile: document register-based ABI for ppc64

gopherbot commented 3 years ago

Change https://golang.org/cl/351110 mentions this issue: cmd/compile/internal: add ABI register information for ppc64

cherrymui commented 3 years ago

@laboger

For fixedbugs/issue45344.go, in cmd/compile/internal/ssa/gen/PPC64.rules

(Store {t} ptr val mem) && t.Size() == 8 && (is64BitInt(val.Type) || isPtr(val.Type)) => (MOVDstore ptr val mem)

I think we want to change the condition to !is64BitFloat, like all other architectures. Then it builds on PPC64 with GOEXPERIMENT=regabi.

laboger commented 3 years ago

The register ABI is now enabled for PPC64. https://go-review.googlesource.com/c/go/+/353969

ianlancetaylor commented 3 years ago

Nice work!

golang / go

cmd/compile: switch to a register-based calling convention for Go functions #40724