golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.09k stars 17.55k forks source link

cmd/compile: it is not possible to prevent FMA with complex values #36971

Open kortschak opened 4 years ago

kortschak commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.13.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="arm64"
GOBIN="/home/user/bin"
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build515689865=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Examine the assembly generated for the code at https://play.golang.org/p/JuTC-BPAIJN with GOARCH=arm64

What did you expect to see?

No FMA/FMS instructions emitted.

What did you see instead?

    0x00ac 00172 (/home/user/c.go:19)   PCDATA  ZR, $5
    0x00ac 00172 (/home/user/c.go:19)   FMOVD   8(R8), F4
    0x00b0 00176 (/home/user/c.go:19)   FMSUBD  F1, F3, F4, F3
    0x00b4 00180 (/home/user/c.go:19)   SCVTFD  R2, F5
    0x00b8 00184 (/home/user/c.go:19)   FADDD   F5, F3, F3
    0x00bc 00188 (/home/user/c.go:19)   FMOVD   F3, (R0)(R6)
    0x00c0 00192 (/home/user/c.go:19)   FMULD   F4, F0, F0
    0x00c4 00196 (/home/user/c.go:19)   FMADDD  F1, F0, F2, F0
    0x00c8 00200 (/home/user/c.go:19)   FMOVD   ZR, F1
    0x00cc 00204 (/home/user/c.go:19)   FADDD   F0, F1, F0

No amount of wrapping the operands in complex128 prevents this AFAICS.

agnivade commented 4 years ago

Does setting "GODEBUG=cpu.fma=off" help ?

kortschak commented 4 years ago

The output remains identical with that env var setting.

agnivade commented 4 years ago

I think that's a runtime setting. Maybe @martisch can help.

randall77 commented 4 years ago

I think you're right that there's no way to turn this off for arm64. @agnivade, the GODEBUG setting should only be for x86 (and x86 only uses fma instructions for math.FMA).

There's nothing in the spec that specifies how complex arithmetic is calculated, so I don't think we're breaking anything.

I'm curious as to what is breaking as a result of this. It's not like you can control the rounding in a complex multiply regardless (there are 3 roundings in a non-fma complex multiply, and just 2 roundings in a fma-enabled complex multiply).

smasher164 commented 4 years ago

@agnivade cpu.fma is for runtime feature detection that is necessary for math.FMA. In this issue, the explicit cast described in https://github.com/golang/go/issues/17895 to prevent the optimizer from automatically folding xy+z into fma(x,yz), does not work for the complex128 expression `complex128(ev[k]w[k]) + complex(float64(i), 0)`.

Edit: accidentally restated @randall77's comment

kortschak commented 4 years ago

I'm in the process of attempting to get gonum's lapack implementation working on arm64. So far, the failures have been due to FMA/FMS. I have gone through the code to attempt to remove the autogeneration of FMA and FMS to get to a point where I can debug the failures. I'm instead getting to the point where I am not going to bother trying to support arm64.

agnivade commented 4 years ago

I see. Thanks for the clarification Keith and Akhil.

josharian commented 4 years ago

I don't think we're breaking anything.

We might be breaking golden tests that attempt to get the same output (perhaps within some tolerance) across different architectures.

I have gone through the code to attempt to remove the autogeneration of FMA and FMS to get to a point where I can debug the failures.

It'd be pretty straightforward to add a compiler command line flag to disable FMA. And to add a flag that logs where FMA is inserted. This would be in keeping with many other compiler flags. Would that help?

(I still think it is also worth having a conversation about whether using explicit rounding should prevent FMA on a case-by-case basis with complex values.)

kortschak commented 4 years ago

We might be breaking golden tests that attempt to get the same output (perhaps within some tolerance) across different architectures.

I suspect that it's worse than that. We already work around this kind of thing by providing sets of acceptable golden values and arch-dependent golden values, but we are working on problems where the intrinsic behaviour of routines as a whole appear to be broken by automatic FMA/FMS emission (I can't guarantee that this is true, because I can't prevent the compiler from emitting all FMA/FMS instructions - hence this issue). The case that brought this up is Eigen decomposition, which is an inherently numerically unstable operation in some cases, I think that the differential precision by fused v non-fused operations is causing amplification of the instability and so the complete (i.e. not just a matter of increasing tolerance) failure on arm64.

kortschak commented 4 years ago

With help from @josharian I've temporarily made these changes and I think my hypothesis is incorrect.

I do still think though that users should be able to prevent fused operations with complex values.

josharian commented 4 years ago

This case was noted in passing at https://github.com/golang/go/issues/17895#issuecomment-280390934 and the following comment.

randall77 commented 4 years ago

Try computing x*y (as complex128s) using:

func cmul(x, y complex128) complex128 {
    return complex(float64(real(x)*real(y))-float64(imag(x)*imag(y)), float64(real(x)*imag(y))+float64(imag(x)*real(y)))
}

That should inhibit fma computation of floating-point multiply.

kortschak commented 4 years ago

Yes, I'm aware of that. At that point I should just use struct{ re, im float64 }.

martisch commented 4 years ago

I think that's a runtime setting. Maybe @martisch can help.

As @randall77 pointed out FMA setting doesnt exist for arm64 as we assume its always present: https://github.com/golang/go/blob/03ef105daeff4fef1fd66dbffb8e17d1f779b9ea/src/internal/cpu/cpu_arm64.go#L44

There is no condition for arm64 in the compiler currently to not generate FMA or make it conditional on the GODEBUG setting at runtime: https://github.com/golang/go/blob/a50c3ffbd47e3dcfc1b5bd2a2d19d55731481eaa/src/cmd/compile/internal/gc/ssa.go#L3574

kortschak commented 4 years ago

The original issues that prompted this have been resolved, in part by changing the compiler to not emit fused operation instructions. The original issues covered the range of differences in precision causing changes in output precision, differences in precision changing gross behaviour and an actual real bug in our code (~50 issues were rolled up in the original problem).

Because of the complexity of the issues we had (and the number of sites where fused operations can be constructed), it would have been very helpful to be able to tell the compiler to not emit fused operation instructions in order to exclude that as a cause. I'll note that discordance between precision at different sites has historically been a cause of catastrophic failures (I hope no-one is using Go for missile guidance).

matthinrichsen-wf commented 2 years ago

I found my way here via https://github.com/golang/go/issues/53297. In this case, FMA also provided us with incorrect values (with float64 instead of complex128).

I was able to rework the statement to not cause the compiler to optimize it into a FMA, but it would have been much better to have a compiler option to disable it entirely.