go-gl / gl

Go bindings for OpenGL (generated via glow)
MIT License
1.07k stars 74 forks source link

gl.FenceSync returns invalid unsafe.Pointer, causing crash #71

Closed dominikh closed 5 years ago

dominikh commented 7 years ago

gl.FenceSync returns the created fence as an unsafe.Pointer. However, values returned by glFenceSync aren't necessarily valid pointers to memory owned by the driver.

On my system (nvidia 378.13 on Linux), for example, glFenceSync returns increasing numbers, starting at 1. The issue with that is that the Go runtime expects pointers in unsafe.Pointer to be valid, either pointing to memory owned by the Go memory allocator, or valid, non-Go memory. Pointers such as "1" are neither, which causes runtime.writebarrierptr to abort the program with a "bad pointer in write barrier" error. Similarly, the driver might return a value that looks like a valid pointer to Go memory, preventing said Go memory from being garbage collected as long as the unsafe.Pointer is alive.

To simulate the issue, compile https://play.golang.org/p/614oIx_Y1D and run it with GOGC=0 (to force frequent GCs, triggering the problem faster.). You should see the following:

$ GOGC=0 ./foo
runtime: writebarrierptr *0x4fa230 = 0x1
fatal error: bad pointer in write barrier

runtime stack:
runtime.throw(0x4a81ca, 0x1c)
    /usr/lib/go/src/runtime/panic.go:596 +0x95
runtime.writebarrierptr.func1()
    /usr/lib/go/src/runtime/mbarrier.go:208 +0xbd
runtime.systemstack(0x4fab00)
    /usr/lib/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
    /usr/lib/go/src/runtime/proc.go:1132

goroutine 1 [running]:
runtime.systemstack_switch()
    /usr/lib/go/src/runtime/asm_amd64.s:281 fp=0xc420048f00 sp=0xc420048ef8
runtime.writebarrierptr(0x4fa230, 0x1)
    /usr/lib/go/src/runtime/mbarrier.go:209 +0x96 fp=0xc420048f38 sp=0xc420048f00
main.main()
    /tmp/foo.go:15 +0x9b fp=0xc420048f88 sp=0xc420048f38
runtime.main()
    /usr/lib/go/src/runtime/proc.go:185 +0x20a fp=0xc420048fe0 sp=0xc420048f88
runtime.goexit()
    /usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420048fe8 sp=0xc420048fe0

I would recommend returning a uintptr instead (and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync).

dmitshur commented 7 years ago

Thanks for reporting. This sounds somewhat similar to #31 that we've had to resolve for Go 1.6. Except I'm guessing that gl.FenceSync is much more rarely used, and it's quite possible you're the first person who's tried to use it and ran into this.

I would recommend returning a uintptr instead (and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync).

I need to refresh my memory on this (if you happen to have the relevant doc link handy, please share, otherwise I'll have to look for it), but that's likely the right fix. Thanks. Do you want to send a PR (to go-gl/glow, the generator that generates these bindings)?

dmitshur commented 7 years ago

and, similarly, accepting a uintptr in gl.ClientWaitSync and gl.WaitSync

gl.IsSync too. I think it's everything that has GLsync type in the signature.

dominikh commented 7 years ago

It's not quite the same as #31, but does fall into the category of not playing loose with pointer types.

Regarding documentation: https://www.khronos.org/opengl/wiki/Sync_Object and the Go spec would be relevant here. The former specifies that GLsync is an opaque type typedef struct __GLsync *GLsync, ergo a pointer. And uintptr is defined as an unsigned integer large enough to store the uninterpreted bits of a pointer value.

Good point re gl.IsSync – I haven't looked up the whole list of GLsync-related functions.

Do you want to send a PR (to go-gl/glow, the generator that generates these bindings)?

I can do that if my proposed solution is accepted.

errcw commented 7 years ago

Right, it should be straightforward to update type.go in glow to change the Go type of GLsync, and I agree that's the right path forward. It's unfortunate it's not a backwards-compatible change, but given that it's required, I don't think there is a better way.

dmitshur commented 7 years ago

Reopening because we still need to regenerate these bindings with the latest glow. (See https://github.com/go-gl/gl#generating and #54 for an example of a regenerate PR.)

errcw commented 7 years ago

It seems that the straightforward fix to change the generated type to unitptr failed (see PR #72). I'll copy my commentary there to here.

Go is unhappy converting between uintptr and a C pointer type. Which is a little strange because we're already doing the same thing for GLhandleARB which maps to void * on Mac OS. Maybe void * is special compared to an anonymous struct pointer?

It might be safe to do a just-in-time conversion from uintptr to unsafe.Pointer (but the GC may still catch this at exactly the wrong moment)? Alternatively we could change the C function signature and cast the value there, though that's not yet well supported by glow.

dominikh commented 7 years ago

I can't say for sure if that conversion would be safe or not. I'd like to err on the side of "not safe" if arguments are placed on the stack before the call.

I'm currently checking if there is another way around this that doesn't require extending glow.

dominikh commented 7 years ago

@errcw So… void* is special because cgo represents void* as unsafe.Pointer – and unsafe.Pointer can be converted to uintptr. All other pointer types are represented in a type-safe manner in Go, and Go will not let you convert a typed pointer to uintptr, unless you go through unsafe.Pointer first. Otherwise, Go's type system wouldn't be safe.

Doing x = uintptr(unsafe.Pointer(fence)) would be safe, however I don't believe the inverse conversion to be safe in the context of a function call.

I don't see any other solution than the one proposed by you, which adds C shims to do the conversion for us. One idea that I had was a type GLSync C.GLsync, but unfortunately Go does the "right" thing, treats it as a pointer, and crashes just like with an unsafe.Pointer.

errcw commented 7 years ago

So, in theory, we can mangle the C parameter types used for both the GL function typedefs and the corresponding cgo entry points, because we simply need the types to align throughout the generated code and be compatible with the function pointers, rather than being exactly faithful to the GL definitions.

In type.go we could hack CType to return a different value for GLsync? (I'd try it myself but I'm not on a computer where I can compile anything... Otherwise I can try later this week.)

dominikh commented 7 years ago

Which type would you use instead? We can't use void*, because if our fence ID by chance looked like a Go pointer to another Go pointer, cgo wouldn't allow us to make the call (see https://play.golang.org/p/oRap4q_Nw9 for an example). uintptr_t exists, but as far as I understand C, that type has to be at least large enough to hold any pointer. It may be larger.

errcw commented 7 years ago

In a cursory search I failed to find conclusive evidence whether uintptr_t could be wider than a pointer. I suspect it's true in theory but false in practice, which might be sufficient for us to choose it as a type.

Another alternative we could consider is to change the GLsync C typedef from an anonymous struct to void* which should allow the cast.

dominikh commented 7 years ago

Like I said in my previous message, using void* wouldn't be of any help. That would still require going through unsafe.Pointer and be treated like an actual pointer, with all the checks and requirements that come with that. It's not a suitable type for arbitrary numbers.

errcw commented 7 years ago

I'm not sure we'd have to route through unsafe.Pointer? Today GLhandleARB is typedef void* on Apple platforms, and we use unitptr as the corresponding Go type. See, e.g., CompileShaderARB, where we happily convert (C.GLhandleARB)(shaderObj). Unless I misunderstand and you're suggesting that the cgo compiler understands the conversion is actually for a pointer type, and will catch this transgression at runtime?

dmitshur commented 7 years ago

@errcw, was this issue closed intentionally?

errcw commented 7 years ago

Alas, no, I was just resyncing my fork to try to work on it.

dominikh commented 7 years ago

@errcw What I am referring to is that a C function void fn(void *arg) will have the signature C.fn(arg unsafe.Pointer) in Go, i.e. the following code doesn't compile:

package main

/*
#include <stdio.h>

void dhfoo(void* arg) {
  printf("%p", arg);
}
*/
import "C"

func main() {
    x := uintptr(1234)
    C.dhfoo(x)
}

because of

/tmp/foo.go:14: cannot use x (type uintptr) as type unsafe.Pointer in argument to func literal

Of course you can convert a uintptr to an unsafe.Pointer, and that's what (C.GLhandleARB)(shaderObj) is doing, but that brings you back to square one:

package main

/*
typedef void *Magic;
*/
import "C"
import "fmt"

var X C.Magic

func main() {
    X = C.Magic(uintptr(1))
    for {
        fmt.Println(X)
    }
}

results in

$ GOGC=0 ./foo
runtime: writebarrierptr *0x6fb380 = 0x1
fatal error: bad pointer in write barrier

and

_cgo_gotypes.go:type _Ctype_Magic unsafe.Pointer
errcw commented 7 years ago

All right, I have a proposed workaround in glow #85. It relies on uintptr_t having the same width as struct __GLsync * but based on my research (and StackOverflow) it appears this is a reasonable assumption to make in practice. It also relies on uintptr_t having a definition, but the OpenGL typedefs already rely on ptrdiff_t so it should be reasonable to also rely on uintptr_t.

I verified it works as intended on macOS but that's the only platform I currently have available to build and test on. If somebody else is interested in verifying on another platform that'd be great.

dominikh commented 7 years ago

I should be able to give the patch a test later this week.

dmitshur commented 6 years ago

Friendly ping @dominikh.

dominikh commented 6 years ago

I've finally gotten around to testing the change. It compiles, doesn't cause runtime panics, and I've successfully used a fence.

errcw commented 6 years ago

Thanks! I've merged in my fix. Ideally we now regenerate go-gl/gl with the updated code.

dominikh commented 6 years ago

@errcw it might be worth mentioning that when I regenerated the 4.5-core bindings with glow, some function signatures changed from accepting int32 to accepting int. I don't know if that's expected or not, but it'd break consumers of the API.

errcw commented 6 years ago

Alas. Well, because of https://github.com/go-gl/gl/issues/80, I'm expecting a more broadly breaking API change will be necessary, so hopefully we can break everyone only once.

dmitshur commented 6 years ago

We've regenerated with latest glow and merged #93. I believe this should be resolved. Can you please confirm?

dmitshur commented 5 years ago

Friendly ping @dominikh. You said on February 15:

I've finally gotten around to testing the change. It compiles, doesn't cause runtime panics, and I've successfully used a fence.

If that's still the case, I think we can close this issue as resolved (via PR https://github.com/go-gl/glow/pull/85 to glow, and PR https://github.com/go-gl/gl/pull/93 to regenerate gl).

dominikh commented 5 years ago

Like I said, it's working :-)