ebitengine / purego

Apache License 2.0
2.16k stars 68 forks source link

replace cgo with purego leader to wired problem. #158

Closed calvin2021y closed 6 months ago

calvin2021y commented 1 year ago

In this project we use purego replace cgo last week, the build speed is improved.

From the product env, we get some wired issue look like some array value changed. (they are used as schema of data structure), look like race error.

revert purego to cgo seems fix the problem.

we use unsafe pointer to pass array pointer into c world and return from cgo. (no copy or keep the pointer), look like this:

func testDecode(id int64, seed uint64, device uint64, flags uint32, size int32, req []byte) (err int32, resp []byte) {
    resp = make([]byte, size+32)
    var p1 = C.int32_t(len(req))
    var p2 = C.int32_t(len(resp))
    err = int32(C.testDecode(C.int64_t(id), C.uint64_t(seed), C.uint64_t(device), C.uint32_t(flags), unsafe.Pointer(&req[0]), p1, unsafe.Pointer(&resp[0]), p2, C.int32_t(1)))
    if err > 0 {
        resp = resp[0:err]
        err = 0
    } else {
        resp = nil
    }
    return
}

the purego version:

var cDecode func(id int64, seed uint64, device uint64, flags uint32, ib unsafe.Pointer, il int32, ob unsafe.Pointer, ol int32, op int32) int32

func testDecode(id int64, seed uint64, device uint64, flags uint32, size int32, req []byte) (err int32, resp []byte) {
    resp = make([]byte, size+32)
    var p1 = int32(len(req))
    var p2 = int32(len(resp))
    err = cDecode(id, seed, device, flags, unsafe.Pointer(&req[0]), p1, unsafe.Pointer(&resp[0]), p2, 1)
    if err > 0 {
        resp = resp[0:err]
        err = 0
    } else {
        resp = nil
    }
    return
}

there is other code look like this, our binary is around 70 MB. go version is go1.20.7 linux/amd64 run at alpine.

calvin2021y commented 1 year ago

we use purego/v0.4.0

TotallyGamerJet commented 1 year ago

At first glance from what you've shown there doesn't appear anything wrong. However, I'd suggest using *byte instead of unsafe.Pointer for clarity although that shouldn't effect if it works or not.

I'm going to need some more information:

Ofc if you can provided a minimal reproducible case that would be the most advantageous

calvin2021y commented 1 year ago

Thanks for the quick reply.

On the purego version does the error occur even when CGO_ENABLED=1?

we are not test with this $CGO_ENABLED, the error is hard to catch. (every day we had 0.1 billion request, app need alive 2~3 days to start show this error)

How are you setting the value of testDecode? RegisterFunc?

purego.RegisterLibFunc

what output are you getting and what did you expect to see? You mention the array is changing but how? Like is it a different pointer or something?

the data corruption not direct related into purego/cgo code. I get error like go database driver scan type not match. (the scan type store in array inited at init, so I guess it corrupted generate wrong bind)

Does the C function your calling expect the memory to be allocated in C memory?

No, the c function only change the context of the memory block. not care where it come from, not do alloc from c function

Does the issue still appear when using v0.5.0-alpha.1?

we dont had a env to reproduce this error, I will try more on my local test suit.

TotallyGamerJet commented 1 year ago

If you can't find a reproducible case at least test if CGO_ENABLED=1 with purego still causes issues. If that fixes it then the problem is in fakecgo if it doesn't go away then the problem is in purego package.

eliottness commented 1 year ago

I happened to get somewhat of the same issue at one point: a corruption error appearing at a very very narrow rate. It turned out that some of the pointers and make calls I was doing were allocated on the stack. At some point the stack was moved elsewhere by the go runtime, causing the corruption. Did you check all your allocations using go build -gcflags="-m" ? Otherwise, good luck with that 😢. I managed to reproduce it at a fair rate locally using the -count=9999... option on go test on my side.

TotallyGamerJet commented 1 year ago

Hmm I'm surprised you were able to create stack allocations. That definitely could be the issue here. Do you have an example? I believe purego should be forcing allocations onto the heap if you're using RegisterFunc

TotallyGamerJet commented 9 months ago

@calvin2021y does this still happen with the latest commit?

calvin2021y commented 7 months ago

I am not able to test this, because I can not catch this error from test env(it take time and resource to duplicate). and I am not able to switch production code into this.

I will update if I get chance to try this in future.

TotallyGamerJet commented 6 months ago

Closing as this is currently not reproducible. If anything changes please feel free to update