go-gl / gl

Go bindings for OpenGL (generated via glow)
MIT License
1.07k stars 74 forks source link

PtrOffset causes "cgo argument has Go pointer to Go pointer" runtime error #80

Open dmgard opened 7 years ago

dmgard commented 7 years ago

EDIT: using GODEBUG=cgocheck=0 can disable these checks but seems like kind of a hacky-workaround, especially if you need runtime pointer checks for other packages.

It seems that calling functions like gl.DrawElements using gl.PtrOffset to convert a gl buffer offset causes the Go runtime to intercept those offsets as if they were pointers into Go memory. This is just a toy example; I've run into this problem using large index/vertex buffers in other projects. Trying to index elements in the millions seems to semi-randomly throw errors.

Drilling down a bit:

// PtrOffset takes a pointer offset and returns a GL-compatible pointer.
// Useful for functions such as glVertexAttribPointer that take pointer
// parameters indicating an offset rather than an absolute memory address.
func PtrOffset(offset int) unsafe.Pointer {
    return unsafe.Pointer(uintptr(offset))
}

I'm already out of my depth here but I'm wondering why offsets have to be sent to OpenGL via CGO as unsafe.Pointer? Is there some obvious way around this error I'm missing?

Program to reproduce:

package main

import (
    "log"

    "github.com/go-gl/gl/v2.1/gl"
    "github.com/go-gl/glfw/v3.2/glfw"
)

// var pointers = make([]*int, 40*(1<<20))
var buffer = make([]byte, 48*(1<<20))

func main() {
    err := glfw.Init()

    if err != nil {
        log.Fatal(err)
    }

    window, err := glfw.CreateWindow(1024, 768, "Test", nil, nil)

    if err != nil {
        log.Fatal(err)
    }

    //for i := range pointers {
    //  pointers[i] = new(int)
    //}

    window.MakeContextCurrent()

    if err := gl.Init(); err != nil {
        log.Fatalln(err)
    }
    var vertBuf, indexBuf uint32

    gl.CreateBuffers(1, &vertBuf)
    gl.CreateBuffers(1, &indexBuf)

    gl.BindBuffer(gl.ARRAY_BUFFER, vertBuf)
    gl.BindBuffer(gl.ELEMENT_ARRAY_BUFFER, indexBuf)

    gl.BufferData(gl.ARRAY_BUFFER, len(buffer), gl.Ptr(buffer), gl.DYNAMIC_DRAW)
    gl.BufferData(gl.ELEMENT_ARRAY_BUFFER, len(buffer), gl.Ptr(buffer), gl.DYNAMIC_DRAW)

    // const NumDraws = 1000
    var NumDraws = len(buffer) / 4

    for i := 0; i < NumDraws; i++ {
        //offset := rand.Int() % 12000000 //+ 12000000
        offset := i

        //fmt.Printf("offset = %d", offset)

        gl.DrawElements(gl.TRIANGLES, 500000, gl.UNSIGNED_INT, gl.PtrOffset(offset))

        gl.DrawElementsInstancedARB(gl.TRIANGLES, 500000, gl.UNSIGNED_INT, gl.PtrOffset(offset), 1)
    }

    //for _, p := range pointers {
    //  *p++
    //}

    window.Destroy()
    glfw.Terminate()
}
errcw commented 7 years ago

As I understand it, the TL;DR is that Go does not allow non-pointer values in pointer types such as unsafe.Pointer. A small offset is very unlikely to be a valid pointer, hence the crash. The challenge is that the bindings are auto-generated based on the OpenGL XML specs, and these offsets are pointer parameters in the specs, hence are encoded as Go pointer types. Instead we should probably use uintptr as the Go type, which should in theory sidestep the issue, but that'd be a huge breaking change to the bindings. +@shurcooL to comment once he's back.

dmitshur commented 7 years ago

Breaking change is okay if it's neccessary/unavoidable, as the current code is unusable. But it's a matter of figuring out whether that's the best fix.

But this issue, isn't it the same/related to #71?

errcw commented 7 years ago

Yep, this issue is related to #71, but it's somewhat simpler in that we are not required to change the OpenGL typedefs to change the Go type bindings.

jeff-emanuel commented 6 years ago

I agree with shurcooL. The sooner a fix can be made, the better, even if it is breaking. I also haven't had any success with GODEBUG=cgocheck=0. Only by setting GOGC=off are crashes eliminated entirely.

errcw commented 6 years ago

Unfortunately our options here are limited by Go/cgo requirements. As I indicated before, Go does not allow non-pointer values in pointer types, so declaring a parameter that might take an offset as unsafe.Pointer is invalid. However we cannot simply change the parameter type to uintptr because that parameter might take a pointer, and it is unsafe to pass Go pointers to C using non-pointer types (the GC must be aware that the Go memory requires pinning). Indeed, per this thread on golang-nuts, there is no appropriate Go type.

So where does that leave us? To start, gl.PtrOffset must return uintptr; returning unsafe.Pointer is simply incorrect. As for the function parameter type, I see a couple options:

  1. Generate two functions, one that takes unsafe.Pointer for pointer values, and another that takes uintptr for offsets.
  2. Privilege the offset version over the pointer version and only generate the uintptr version.

Both options require identifying parameters that are declared as void* but take non-pointer offset values. I don't see any better way than a whitelist, unfortunately.

Right now I'm leaning towards option 2, but I'd like to hear additional input.

jeff-emanuel commented 6 years ago

Thanks for the update and for the work on this package.

I currently pass both pointers and offsets to DrawElements, but for the cases where I'm passing a pointer to indices, I could put the indices in a buffer and avoid passing pointers. A more complicated case is MultiDrawElements or MultiDrawElementsBaseVertex where the indices parameter type is "const GLvoid const indices", which can be a combination of true pointer and offsets. I'm passing the address of the first element of a list containing PtrOffsets. How would that work?

    n := len(startIndexOffsets)
    offsets := make([]unsafe.Pointer,n,n)
    for i:=0;i<n;i++ {
        offsets[i] = gl.PtrOffset(startIndexOffsets[i])
    }
    // Uses bound GL_ELEMENT_ARRAY_BUFFER buffer for indices.
    gl.MultiDrawElementsBaseVertex(
        primitive,
        &counts[0], gl.UNSIGNED_INT, &offsets[0], int32(len(counts)),
        &baseVertices[0])
errcw commented 6 years ago

Aha, MultiDrawElements is additionally problematic, at least in the case where the indices parameter might contain pointers to index data rather than offsets in a bound buffer. When passing Go memory to C, cgo requires that the Go memory contain no pointers to other Go memory (AIUI Go wants to avoid transitive pinning). The implication is that only C-allocated memory will work for this use case, which is a definite ease-of-use hiccup.

My takeaway is that simply assuming we can transform the API to use uintptr is infeasible, because folks are actively using these pointer-or-offset parameters to pass pointers, so we should instead generate additional WithOffset methods that swap unsafe.Pointer for uintptr. Additionally, we should clarify in the README how folks should handle pointer-to-pointer use cases.

jeff-emanuel commented 6 years ago

WithOffset methods would work for me. Thanks.

pwaller commented 5 years ago

Sounds like this thread has a potential resolution, adding WithOffset, it just needs someone willing to do the work to implement and test, correct?

dmgard commented 5 years ago

Works for me too.

pwaller commented 4 years ago

Another person hit this on https://github.com/go-gl/example/issues/72

Anyone able to cook up a fix? I'm a willing reviewer, just not implementer at this time.

dertseha commented 4 years ago

Hello there! I'm assuming that this is the central discussion thread for all the related issues popping up left and right.

The pull request go-gl/glow#107 is my proposal to add overload capabilities. This way glow could be taught to add overloads with different signatures.

To continue the discussion, the following is unclear to me:

I was able to do a minimal test with the three functions I used as a sample - though it was tedious;

MMulthaupt commented 4 years ago
  • Should there be a restrictive naming schema? (Everything with suffix WithOffset, or is a VertexAttribOffset also allowed?)

I reckon strictly using suffixes makes this change more accessible because there is a good chance users will see both functions in auto completion in their IDEs and thus be more likely to learn of the situation and how to deal with it without an extra round-trip through this project's issue list or Stackoverflow.

dertseha commented 4 years ago

Thank you for the response. Meanwhile I figured as well fixed suffixes are better for this, as perhaps a standard glVertexAttribOffset might surface in the future (which is more likely than glVertexAttribPointerWithOffset. Though I would still keep the free naming scheme in the overload XML files.

To be sure though, the overload of glGetVertexAttribPointerv should be called glGetVertexAttribPointerWithOffsetv - and not glGetVertexAttribPointervWithOffset ?

dertseha commented 4 years ago

The pull request was accepted for glow. Now to collect all the functions that need overloads. Does anyone have an idea which functions are affected? Would this be a group-effort and everyone contributes?

MMulthaupt commented 4 years ago

I just looked at package.go of v3.3-core which I currently use, and in there I find 823 occurrences of unsafe.Pointer. The vast majority of these is not going to be affected, since they do point to actual data in Go memory. In theory however, we do have to grind through all of these once, looking up whether the unsafe.Pointer parameter should really be a uintptr instead. Anyone who sees a fun afternoon in this? 💩

If we are going to miss a couple functions (as a compromise) we should definitely make a mention of the whole situation in the README.

neclepsio commented 4 years ago

I'm sure you could publish a new version with just the three overloads already defined by @dertseha. They are enough for me, and I think many other: all the checkptr bugs I saw in packages depending on go-gl/gl are for those three functions. You could also add glMultiDrawElements, that is identified above in this discussion. If other needs arise, you could manage them later.

neclepsio commented 4 years ago

In the meantime it gets fixed, you can use my fork.

dertseha commented 3 years ago

How can we get going in this matter? I would really like to use the new overrides I created a year ago, in order to close some old issues.

When I re-generate using glow, the package.go changes include some about the preprocessor directives, which also adds a #include <KHR/khrplatform.h> that wasn't present before. Trying to compile this fails as my system does not have this file.

Is there a canonical way to generate these files with glow?

errcw commented 3 years ago

Do you have an example PR/diff that demonstrates the issue (the include issue, that is)?

(Also: full disclosure, it's been a while since I've actively worked in this space, and as much as I've love to dive in and fix this ASAP my time is pretty constrained these days.)

dertseha commented 3 years ago

Here's the current section that is mainly affected, for v3.2-core/gl/package.go, starting at line 35:

...
// #ifndef GLAPI
// #define GLAPI extern
// #endif
// #include <stddef.h>
// #ifndef GLEXT_64_TYPES_DEFINED
// /* This code block is duplicated in glxext.h, so must be protected */
// #define GLEXT_64_TYPES_DEFINED
// /* Define int32_t, int64_t, and uint64_t types for UST/MSC */
// /* (as used in the GL_EXT_timer_query extension). */
// #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
// #include <inttypes.h>
// #elif defined(__sun__) || defined(__digital__)
// #include <inttypes.h>
// #if defined(__STDC__)
// #if defined(__arch64__) || defined(_LP64)
// typedef long int int64_t;
// typedef unsigned long int uint64_t;
// #else
// typedef long long int int64_t;
// typedef unsigned long long int uint64_t;
// #endif /* __arch64__ */
// #endif /* __STDC__ */
// #elif defined( __VMS ) || defined(__sgi)
// #include <inttypes.h>
// #elif defined(__SCO__) || defined(__USLC__)
// #include <stdint.h>
// #elif defined(__UNIXOS2__) || defined(__SOL64__)
// typedef long int int32_t;
// typedef long long int int64_t;
// typedef unsigned long long int uint64_t;
// #elif defined(_WIN32) && defined(__GNUC__)
// #include <stdint.h>
// #elif defined(_WIN32)
// typedef __int32 int32_t;
// typedef __int64 int64_t;
// typedef unsigned __int64 uint64_t;
// #else
// /* Fallback if nothing above works */
// #include <inttypes.h>
// #endif
// #endif
// typedef unsigned int GLenum;
// typedef unsigned char GLboolean;
// typedef unsigned int GLbitfield;
// typedef signed char GLbyte;
// typedef short GLshort;
// typedef int GLint;
// typedef unsigned char GLubyte;
// typedef unsigned short GLushort;
// typedef unsigned int GLuint;
// typedef int GLsizei;
// typedef float GLfloat;
// typedef double GLdouble;
// typedef void *GLeglImageOES;
// typedef char GLchar;
// typedef ptrdiff_t GLintptr;
// typedef ptrdiff_t GLsizeiptr;
// typedef int64_t GLint64;
// typedef uint64_t GLuint64;
// typedef int64_t GLint64EXT;
// typedef uint64_t GLuint64EXT;
// typedef uintptr_t GLsync;
// struct _cl_context;
// struct _cl_event;
...

When I run glow (both on MS Windows and Linux), this is the new result:

// #ifndef GLAPI
// #define GLAPI extern
// #endif
// #include <KHR/khrplatform.h>
// typedef unsigned int GLenum;
// typedef unsigned char GLboolean;
// typedef unsigned int GLbitfield;
// typedef khronos_int8_t GLbyte;
// typedef khronos_uint8_t GLubyte;
// typedef khronos_int16_t GLshort;
// typedef khronos_uint16_t GLushort;
// typedef int GLint;
// typedef unsigned int GLuint;
// typedef int GLsizei;
// typedef khronos_float_t GLfloat;
// typedef double GLdouble;
// typedef void *GLeglImageOES;
// typedef char GLchar;
// typedef khronos_intptr_t GLintptr;
// typedef khronos_ssize_t GLsizeiptr;
// typedef khronos_int64_t GLint64;
// typedef khronos_int64_t GLint64EXT;
// typedef khronos_uint64_t GLuint64;
// typedef khronos_uint64_t GLuint64EXT;
// typedef uintptr_t GLsync;
// struct _cl_context;
// struct _cl_event;

It seems all the core #ifdef logic has been moved to a KHR/khrplatform.h - though this file does not typically exist.

As per https://www.khronos.org/registry/implementers_guide.html#headers , this file centralizes such logic and is meant for "all" APIs, although "stored" in EGL repo. Main storage is here: https://github.com/KhronosGroup/EGL-Registry/blob/master/api/KHR/khrplatform.h

I understand under Linux it's possibly solved as in #106 , yet I can't find a way to solve this on MS Windows (google-fu fails me here). Is this file one that the loader-builder (glow) would have to add as well?

errcw commented 3 years ago

Thanks for showing that diff. It does indeed appear that historically the GL types were defined in a self-contained way (e.g., here GLintptr is ptrdiff_t, but in later revisions GLintptr became defined exclusively as khronos_intptr_t which requires khrplatform.h.

So, yes, it does seem like the fundamental problem to solve is how to make khrplatform.h available in the places it needs to be. Potentially on Windows we need to offer instructions to copy the file to the expected include path? Or to enable it to be inlined?

dertseha commented 3 years ago

Looking at my go-to loader-builder for C-based environments glad, there the khrplatform.h is also downloaded and added to the generated package. The discussion in this issue also notes that this header has become the de-facto standard for all APIs.

Seems like it's another problem to fix first. I'll have a look to extend glow to download and provide also this file when generating code.

dertseha commented 3 years ago

PRs for both glow and gl are up.

As noted in the PR and other comments, this is a start for the necessary "override" functions. My proposal is to still close this issue here when the PRs are merged and accept corresponding issues/PRs on demand for further functions. I doubt that this task will ever be complete, so there's no point in keeping this issue open forever.

Zyl9393 commented 1 year ago

I just learned this issue might be more severe than previously thought. To quote Ian Lance Taylor:

In Go an invalid pointer must never have a pointer type. Doing so will break the garbage collector.

This could explain some performance issues I have been having, where after running the application for some time, the garbage collector began taking longer and longer to execute even though no new objects were being created. (Upwards of 4 milliseconds) I'll follow-up here with some more occurrences of functions which need a WithOffset variant later.