cmd/cgo: don't use syscall.Errno type as errno return on Windows

Code-exE-Software commented 6 years ago

Apologies in advance If I'm explaining anything poorly.

What did you do?

I made an Go binding library for a static C lib, then wrote a go application to use that. During the course of developing that application, I managed to trigger the following error:

The process cannot access the file because another process has locked a portion of the file.

I have a stripped down testable chunk of code (complete with a more in depth readme) here: https://github.com/technicalviking/cgotest

I'd like to stress: I'm not looking to debug the code itself. I know the outputs are not what I expected (casting the C **double variable back to [][]float64 in 'extractOutputs' shows a significant number of NaN entries, but meh), but rather why the functionality puts the code in a state where a windows error populates the error return value of the go reference to the C function in this case.

EDIT TO INVESTIGATION

On further digging I found that calling sqrt on a negative number, even in a function defined in the preamble, is sufficient to trigger this behavior. The test code referenced in this issue has been updated accordingly.

What did you expect to see?

No errors

What did you see instead?

calling the c function using the mechanism doResponse, doError = C.bridgeIndicatorFunction(...) to leverage a C function pointer results in doError containing the following value: "The process cannot access the file because another process has locked a portion of the file."

System details

go version go1.9.2 windows/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=".exe"
GOHOSTARCH="amd64"
GOHOSTOS="windows"
GOOS="windows"
GOPATH="C:\Users\dmurker\Documents\Dev\DM\Go Projects"
GORACE=""
GOROOT="C:\Program Files Dev\Go"
GOTOOLDIR="C:\Program Files Dev\Go\pkg\tool\windows_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\D-DANI~1\AppData\Local\Temp\1\go-build223821631=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOROOT/bin/go version: go version go1.9.2 windows/amd64
GOROOT/bin/go tool compile -V: compile version go1.9.2
gdb --version: GNU gdb (GDB) 7.10.1

ianlancetaylor commented 6 years ago

I don't know what that error means. It's not coming from Go.

Code-exE-Software commented 6 years ago

The error message is from windows itself, and happens as the c function in the cgo preamble is returning to go, assuming that cgo function, or any other c function called therein, attempted to call sqrt on a negative number. the actual return value from the cgo function is not adversely affected, but the presence of this error message in the error return value of the go call to the cgo method is perplexing, and i dont understand how its getting there. This also means that, for now, im forced to merely log and ignore the error value when its not nil.

On Jan 17, 2018 6:06 PM, "Ian Lance Taylor" notifications@github.com wrote:

I don't know what that error means. It's not coming from Go.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/23468#issuecomment-358513861, or mute the thread https://github.com/notifications/unsubscribe-auth/ABeZwhuRZg-ULhgLcuCDgTzehTDjE5Vfks5tLqcjgaJpZM4RiNcL .

Code-exE-Software commented 6 years ago

I've managed to get a MUCH simpler version of the code that demonstrates the problem.

https://github.com/technicalviking/cgotest2/blob/master/main.go

package main

/*
#include <math.h>
double getSqrt(const double test) {
    return sqrt(test);
}
*/
import (
    "C"
)
import "fmt"

func main() {

    for i := 10; i > -10; i-- {
        res, err := C.getSqrt(C.double(float64(i)))

        fmt.Printf("SQRT of %d is %f\n", i, res)

        if err != nil {
            //when i < 0, err contains the string:
            //"The process cannot access the file because another process has locked a portion of the file."
            fmt.Printf("Error for finding the sqrt of %d: %v\n", i, err)
        }
    }

}

slrz commented 6 years ago

Isn't this just the documented cgo behaviour of returning errno when calling a C function in multiple assignment context? If your C function does not set errno, don't expect to find anything meaningful there. It's likely just leftover garbage from whatever its last user happened to put there. Try setting errno to 0 before returning from your C function.

Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error (use _ to skip the result value if the function returns void). For example:
n, err = C.sqrt(-1)
_, err := C.voidFunc()
var n, err = C.sqrt(1)
https://golang.org/cmd/cgo/#hdr-Go_references_to_C

edit: yes it is. The Windows error ERROR_LOCK_VIOLATION is the integer 33, which I guess your libc is using for EDOM, which is what sqrt(3) returns on a domain error (i.e. calling it with a negative argument).

Code-exE-Software commented 6 years ago

That... tells me everything I need to know (as disappointing as it is...). The ansi c documentation I found does indeed indicate that the EDOM macro is also the integer 33.

However the error interface value being returned outputs as the string mentioned above (when using %v), not the actual integer errno itself. (Further digging reveals that using the %#v verb outputs 0x21, and using the %d verb outputs 33). Given the mismatch between the ansi c errno description, and the OS description for that error code, would it be reasonable to ask that using the %v verb to output the errno value give the ansi c macro name (or some other related value, such as the output of the %#v verb), rather than the host OS interpretation of that code?

slrz commented 6 years ago

The issue is that the Go side doesn't know the error code convention of the called C function. If you call a Windows API function through cgo, that same integer 33 actually means ERROR_LOCK_VIOLATION. You probably don't want the error returned from C.CreateFile to print as "numerical argument out of domain".

You can probably make a list of C standard library functions and special-case them (making the error message from sqrt look right), but there is lots of code outside of libc that uses C-style error conventions. It would just add even more confusion.

It's easier on Unix as those just incorporate the ANSI C errno definitions into the system interface, so this kind of ambiguity doesn't occur there.

Code-exE-Software commented 6 years ago

That seems like an even better reason to have the %v verb translate the errno to either the integer value (in this case 33) as output by the %d verb, or the hex value (in this case 0x21) as would be output by the %#v verb, instead of the host OS interpretation of that code. If the Go side doesn't know the error code convention of the called C function, then the 'default format' (as described in the documentation of the fmt package for the %v verb) of the error code should be unambiguous as well. I would submit that translating the errno to a useable string should be an exercise for the programmer with a priori knowledge of the C function being called, not an assumption made by Go.

Code-exE-Software commented 6 years ago

given the investigative work done here, would it be better to close this and open a new issue related to the default output (%v verb) of the cgo error, or should I edit the title and description of this issue? Or is this discussion chain sufficient?

zhangyoufu commented 6 years ago

Maybe we need a separate C.Errno that follows the meaning of errno in C language instead of using syscall.Errno for all platforms.

Code-exE-Software commented 6 years ago

@zhangyoufu IMHO the issue is more concerning the difference in functionality between

https://golang.org/src/syscall/syscall_windows.go?#L90 and https://golang.org/src/syscall/syscall_unix.go?#L103

Converting the errno to a syscall.Errno makes sense, as far as it goes, but the obfuscation of what happens between operating systems when trying to output the syscall.Errno using the %v verb (which I think calls the .Error() method to get the string value if it's available) feels like there's an assumption that C code used by a developer on windows will always be code accessing the Windows API. Silly me, I stumbled across a bug in the C code I was writing bindings for which had no such goal, and ran face first into a (potentially uncommon) case where this assumption wasn't valid. Part of this is admittedly my lack of experience with the language (I'd been using it for only a paltry few months at the time).

Potential fixes: 1) Bring the two syscall.Errno.Error() implementations more in line each other, deferring to the unix implementation on a best effort basis (33 EDOM in this case), and leave the translation to a WinAPI error as an exercise for the coder. 2) Modify the Windows version of syscall.Errno.Error() to output both the ANSI-C error code (33 EDOM in this case), as well as the Windows specific implementation if it should apply, and leave differentiating the actual meaning of the error as an exercise for the user under the assumption that they are cognizant of the C code the coder is writing CGO code for. 3) Make zero assumptions about the operating system, and simply output the strconv.ITOA value of the underlying uint, leaving ALL interpretation of the code as an exercise for the coder.

Any of these potential solutions (or anythinig like them) would preclude the need for special interpretation of the %v verb when outputting a syscall.Errno with the fmt package.

@ianlancetaylor thoughts?

ianlancetaylor commented 6 years ago

If I'm following this correctly, the problem is that C functions on Windows set errno to a value that is unrelated to WIndows error codes. The syscall.Errno type is used for Windows error codes, as printed by the Win32 FormatMessageW function. It follows that it's wrong for cgo to convert the C errno value to the type syscall.Errno. We should change cgo to do something better. Perhaps it would be possible to a define a new type that implements the Error method using strerror, but it would also be OK to use a type that implements Error via strconv.Itoa.

Once we have that type, @zhangyoufu already pointed out the line in cgo that needs to change on Windows.

Code-exE-Software commented 6 years ago

@ianlancetaylor That is a much better explanation, yes.

Code-exE-Software commented 6 years ago

@ianlancetaylor Kinda sad this got pulled from go 1.11. Is this issue going to be resolved before 1.12 or 2.0?

ianlancetaylor commented 6 years ago

Someone needs to actually work on it. Want to send a change?

golang / go