cmd/cgo: specify rules for passing pointers between Go and C

ianlancetaylor commented 9 years ago

Issues #12116 and #8310 discuss the problem of when it is OK to pass pointers between Go and C code when using cgo. This proposal is a specific set of rules for when it is OK, plus some modifications to cgo to partially enforce the rules.

Definitions: a Go pointer is a pointer to memory allocated by the Go runtime. A C pointer is a pointer to memory not allocated by the Go runtime, such as by C.malloc.

The proposed rules are:

Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers. That rule must be preserved during C execution, in that the program must not store any Go pointers into that memory.
C code may otherwise use the Go pointer and the Go memory to which it points freely during the call. However, C code may not keep a copy of the Go pointer after the call returns.
If Go code passes a Go pointer to a C function, the C function must return. While there are no documented time limits, a C function that simply blocks holding a Go pointer while other goroutines are running may eventually cause the program to run out of memory and fail.
A Go function called by C code may not return a Go pointer. A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer into memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.

The proposal is that we declare that all Go programs that adhere to those rules are, and will remain, valid, under the Go 1 compatibility guarantee. Go programs that do not adhere to these rules may break today or in future releases.

We further propose some modifications to cgo that will help enforce these rules. The cgo changes are not the rules; the rules are as above. The cgo changes are intended to make it easier to detect, at build time, programs that violate the rules.

We modify cgo to prohibit any type conversion from a Go pointer type to a C pointer type, except as a direct argument to a C function call. Any such type conversion is an error in cgo. Even within a direct function call, we prohibit type conversions if the Go pointer type points to a type that itself contains pointers.
A function labelled //export, which may be called from C code, must not return any Go pointer type and must not have any parameters that are Go pointer types.

These cgo rules are more strict than the rules about passing pointers. They will prohibit cases that are permitted. It is also possible to work around them, even without using unsafe, and write code that is not blocked by cgo but that violates the above rules. I will provide examples in a separate design doc.

swetland commented 9 years ago

This addresses pretty much all of my concerns about being able to build plumbing from Go to native libraries or syscalls. Rule #3 is potentially a little problematic (imagine calling into a USB read syscall for a device that simply does not return data for a loooong time), but can be worked around (periodic timeout and retry, two part operation w/ select + read, etc), without too much suffering.

griesemer commented 9 years ago

Presumably this requires either a) a GC that doesn't move objects; or b) a form or (automatic, via CGO) pinning of Go pointers passed to C.

dr2chase commented 9 years ago

@swetland - I've harassed Ian enough on this issue already, but the workaround for Rule number 3 is to allocate memory with malloc, and let that pointer leak into the Go code instead of a Go pointer leaking into C. The GC will tolerate it, does not need to worry about collecting it, and you can use a finalizer on the Go object encapsulating the state of the USB device to ensure that if the object is not freed by close, then it is freed by the finalizer. This is the idiom I've used in the past, and it's the one I recommend until you have performance problems that would justify trying something else, and my first answer for the first round of performance problems is "we need to improve the inliner" so that encapsulation of the malloc'd buffer is a zero-cost abstraction.

mdempsky commented 9 years ago

@ianlancetaylor For the first rule, can you clarify what "Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers." means and perhaps some further rationale behind this restriction? I understand why C code mustn't mutate Go pointers, but there seems to be a significant gap between those two restrictions.

For example, in the code below, is this rule satisfied or violated by passing a pointer to a pointer-free subregion of a Go struct? If it's violated, do you propose the toolchain should reject this still? If so, how?

// #include <string.h>
import "C"

func fill(x []byte) {
    C.memset(&x[0], 'x', len(x))
}

type N struct {
    Next *N
    Buf [1024]byte
}

func fillN(n *N) {
    fill(n.Buf[:])
}

Also, I assume by "Go pointers" you actually mean to include all pointer-like types (i.e., strings, slices, functions, interfaces, maps, channels)?

minux commented 9 years ago

I think your fill example satisfies the rules.

The rule seems to forbid passing a *N to C function, but I agree the rule need more clarification as to what constitute a Go pointer. For example, what if the N struct is declared as:

type N struct { Next unsafe.Pointer Buf [1024]byte }

and the programmer guarantee that Next pointer will be pointing to C.malloc'ed memory?

akavel commented 9 years ago

Thanks for starting the work! That said, some aspects are still not very clear to me -- I'd be very grateful for some clarifications. I'm especially interested from the point of view of the "Windows world", i.e. interacting with WinAPI "syscalls" & some other custom DLLs:

Is this understood to cover the syscall package use cases too? I.e. when somebody does import "syscall", but does not import "C". Please state this explicitly, whether it is covered or not, and to what extent. I'm aware that the underlying mechanisms are the same to some extent (or at least were a few versions ago), but I'm especially not sure if that's assumed "official" or "accidental". I believe it isn't really stated anywhere in the official docs (or is it?). This proposal could be a very good place to clarify this, and I'd be really grateful for that.
(Related to 1.) Specifically, the WinAPI syscalls often take uintptr as an argument. In +/-Go 1.3 it was (AFAIU) semi-officially named as a "non-pointer" (with regards to GC). Am I right to undertand this then falls outside the current wording of the discussed proposal? And thus, back into uncharted territory...? :( Or is uintptr now blessed into pointers family?
(Related to 2.) If uintptr is still not-a-pointer, is there a chance that any rules for "pinning" a pointer could become part of this proposal? Such that they'd enable taking a uintptr of a pointer in some "officially safe" way (hopefully "in Go 1.x where x>=3") and passing it into syscall/C world for some (finite?) time?
- (I'd be specially grateful if the rules could be stated explicitly also "back in time", starting with Go 1.3, as I believe this knowledge is still not really codified anywhere oficially, and all that can be known to a "common man", not versed in all odds and ends of the compiler & GC, is currently spread between some more or less obscure threads on golang-nuts, golang-dev, and maybe some random entries on the issue tracker...))
(Related to 2., 3.) Also, how about structs containing uintptr?

I'd be really very grateful if answers to those questions could be incorporated into this proposal...

ianlancetaylor commented 9 years ago

@griesemer Yes. If the GC requires notification about pinned pointers, cgo will generate the calls to pin them.

RLH commented 9 years ago

Likewise if the GC needs to know when an object is no longer pinned, cgo will generate the calls to unpin the object.

On Mon, Aug 31, 2015 at 3:11 PM, Ian Lance Taylor notifications@github.com wrote:

@griesemer https://github.com/griesemer Yes. If the GC requires notification about pinned pointers, cgo will generate the calls to pin them.

— Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/12416#issuecomment-136471189.

ianlancetaylor commented 9 years ago

@mdempsky

For the first rule, can you clarify what "Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers." means and perhaps some further rationale behind this restriction? I understand why C code mustn't mutate Go pointers, but there seems to be a significant gap between those two restrictions.

If at some point we have a moving GC, then the GC will need to update all Go pointers to point to their new location. We can work around that for a Go pointer explicitly passed to C code, by temporarily pinning the pointer. However, if the memory to which that Go pointer points itself holds pointers, then we must either pin those pointers too or we must modify them while the C code is running. The former is difficult and causes pinning to become a transitive operation. The latter is racy.

The only way this restriction could be a problem would be if Go code wanted to pass a complex data structure into C code, where the data structure is allocated by Go code in Go memory. It would have to use C types, of course, or the C code wouldn't be able to use it. That is not a case that seems likely to arise often, and in particular it seems often best to allocate these complex data structures using C memory via C.malloc.

So I think it's preferable to use a relatively simple rule here, rather than try to develop a more complex set of restrictions. This proposal aims to find out whether people can live with the rule.

Your code example satisfies the rule. The Go memory to which the pointer points is the [1024]byte.

Also, I assume by "Go pointers" you actually mean to include all pointer-like types (i.e., strings, slices, functions, interfaces, maps, channels)?

Yes.

ianlancetaylor commented 9 years ago

@minux Your example is fine. I tried to clearly define, for purposes of this proposal, a Go pointer as a pointer into memory allocated by the Go allocator. It is a purely dynamic concept that has nothing to do with types.

The cgo part of the proposal has to do with types, but that is not the actual rules about what is permitted.

ianlancetaylor commented 9 years ago

@akavel

Is this understood to cover the syscall package use cases too? I.e. when somebody does import "syscall", but does not import "C". Please state this explicitly, whether it is covered or not, and to what extent. I'm aware that the underlying mechanisms are the same to some extent (or at least were a few versions ago), but I'm especially not sure if that's assumed "official" or "accidental". I believe it isn't really stated anywhere in the official docs (or is it?). This proposal could be a very good place to clarify this, and I'd be really grateful for that.

Clearly the syscall package has to consider the same set of issues, but these rules are intended to apply to cgo. We control all the entry points to the syscall package, so if appropriate we can do something different. I think that for Windows we essentially need to define for each entry point what is permitted.

For example, currently on Unix the function syscall.ForkExec takes a pointer to a syscall.ProcAttr and that structure is permitted to contain Go pointers. This is fine, and it demonstrates that the syscall functions are not required to precisely follow the rules for cgo.

(Related to 1.) Specifically, the WinAPI syscalls often take uintptr as an argument. In +/-Go 1.4 it was (AFAIU) semi-officially named as a "non-pointer". Am I right to undertand this then falls outside the current wording of the discussed proposal? And thus, back into uncharted territory...? :( Or is uintptr now blessed into pointers family?

What matters for this proposal is whether the value is a pointer into Go memory. The actual type of the value does not matter.

(Related to 2.) If uintptr is still not-a-pointer, is there a chance that any rules for "pinning" a pointer could become part of this proposal? Such that they'd enable taking a uintptr of a pointer in some "officially safe" way (hopefully "in Go 1.x where x>=4") and passing it into syscall/C world for some (finite?) time?

One goal of this proposal is to avoid any documented rules for pinning pointers. Sure, this may be implemented by pinning internally, but I don't think we want to make pinning pointers part of the Go runtime API.

(Related to 2., 3.) Also, how about structs containing uintptr?

What matters is the values stored in this uintptr fields.

Update: some of my above comments were incorrect. A Go value with type uintptr is not going to be treated as a Go pointer. While the type of the pointer doesn't matter, it does have to be a pointer.

minux commented 9 years ago

On Mon, Aug 31, 2015 at 3:22 PM, Ian Lance Taylor notifications@github.com wrote:

@minux https://github.com/minux Your example is fine. I tried to clearly define, for purposes of this proposal, a Go pointer as a pointer into memory allocated by the Go allocator. It is a purely dynamic concept that has nothing to do with types.

Could we add a dynamic checking to cgo to verify that no pointer points to Go heap? That check can be enabled with certain mechanism, for example, a special build tag, to ease diagnosis of cgo/GC interaction problems?

mdempsky commented 9 years ago

@ianlancetaylor So it sounds like the rationale then is you want to do direct pinning without transitive pinning, and transitive pinning is necessary if C code is going to access Go pointers in Go memory. Your proposed solution seems to try to use the C type system to prevent C code from accessing Go pointers in Go memory.

For completeness, I see another solution of simply ruling that C code may not access Go pointers in Go memory. I.e., if a *N pointer is passed to C code, the C code may freely read/write Buf, but not Next. Instead, the C code would need to call into a Go helper if it wants to access the pointers. (Analogously, Go code needs to call into C helpers to access unusual C struct members like bit fields.)

ianlancetaylor commented 9 years ago

@minux Yes, we could add that dynamic check. I can't decide how much it would help. To be safe, it would have to consider the type of the argument, so that we don't give an error for *float64 where the float value happens to look like a pointer. But then a Go pointer converted to uintptr will slip through. So it becomes another partial check. It might still be worth doing, of course.

ianlancetaylor commented 9 years ago

@mdempsky I think I would say that we want to put the smallest possible number of restrictions on future garbage collection work. Prohibiting passing any Go pointers into C is too severe and prevents too many reasonable coding patterns. This is the minimal next step.

To be very pedantic, the proposed solution is the set of rules listed in the issue. The cgo work is an attempt to detect violations of those rules at build time.

For completeness, I see another solution of simply ruling that C code may not access Go pointers in Go memory. I.e., if a *N pointer is passed to C code, the C code may freely read/write Buf, but not Next. Instead, the C code would need to call into a Go helper if it wants to access the pointers. (Analogously, Go code needs to call into C helpers to access unusual C struct members like bit fields.)

I don't see any technical problems with this. But it seems easier to get wrong, and doesn't seem any simpler.

sbinet commented 9 years ago

IIUC, the rule stating that one should not store pointers to go memory from C code (as the pointed at go value may be a struct holding fields pointing to go memory, would significantly complicate writing (in my case) extension modules in go, like I do in gopy and tracked there: go-python/gopy#58.

(I am not complaining, I am just trying to make sure I correctly understand the proposal)

does this mean that the "only" way to write extension modules (for CPython, C-ruby, C-shiny-interpreter) which usually involves wrapping a go struct inside a C struct for the target VM consumption, is to use a mechanism like golang.org/x/mobile/bind/seq? ie: in-process RPC, a kind of LPC (local procedure call)?

ianlancetaylor commented 9 years ago

@sbinet Yes, I think you are correct. Sorry.

Serialization is one approach, but it is not the only approach. Another approach is a map[int]interface{}. For each Go value you need to pass to Python, increment a counter to get a token value, and store the Go value in the map at that location. When calling back into Go, pass the token value. The Go side uses the token to look into the map to fetch the value. When you release the Python value, tell the Go side to delete the map entry. This mechanism is probably simplest if the Python code never needs to examine the Go value.

sbinet commented 9 years ago

couldn't this be implemented as some kind of C-api on top of reflect ? (or a cgo-exported set of functions gathered in a cgo package?)

dr2chase commented 9 years ago

For each Go value you need to pass to Python, increment a counter to get a token value, and store the Go value in the map at that location. When calling back into Go, pass the token value. The Go side uses the token to look into the map to fetch the value.

Is the map a global? That would imply contention and/or synchronization. We don't have a notion of thread-local/Goroutine-local storage in Go, do we? This would also need to survive a transition from Go to C to Go to make this work.

minux commented 9 years ago

On Mon, Aug 31, 2015 at 6:02 PM, dr2chase notifications@github.com wrote:

For each Go value you need to pass to Python, increment a counter to get a token value, and store the Go value in the map at that location. When calling back into Go, pass the token value. The Go side uses the token to look into the map to fetch the value.

Is the map a global? That would imply contention and/or synchronization. We don't have a notion of thread-local/Goroutine-local storage in Go, do we? This would also need to survive a transition from Go to C to Go to make this work.

The map will be global. And no, we don't have thread-local/goroutine-local storage, intentionally. I don't think the contention on the map will be relevant for this particular case as the Python interpreter is pretty much single threaded due to its GIL.

ianlancetaylor commented 9 years ago

@dr2chase @minux Right, the map would be a global variable. Besides Minux's point about Python, contention on the map is going to be negligible compared to the overhead of calling between C and Go. In any case, it's only a suggestion. Other approaches are possible.

ianlancetaylor commented 9 years ago

@sbinet

couldn't this be implemented as some kind of C-api on top of reflect ? (or a cgo-exported set of functions gathered in a cgo package?)

I'm sorry, I don't understand what you are asking.

For an example of the map approach, see https://github.com/swig/swig/blob/master/Lib/go/goruntime.swg#L408 .

ianlancetaylor commented 9 years ago

Design doc at https://go-review.googlesource.com/14112 . It incorporates some of the above discussion.

gopherbot commented 9 years ago

CL https://golang.org/cl/14112 mentions this issue.

akavel commented 9 years ago

Thanks for some clarifications, and mention of syscall in the doc.

Clearly the syscall package has to consider the same set of issues, but these rules are intended to apply to cgo. We control all the entry points to the syscall package, so if appropriate we can do something different. I think that for Windows we essentially need to define for each entry point what is permitted.

For example, currently on Unix the function syscall.ForkExec takes a pointer to a syscall.ProcAttr and that structure is permitted to contain Go pointers. This is fine, and it demonstrates that the syscall functions are not required to precisely follow the rules for cgo.

Would those per-entry-point "detailed" features be available to third-party library/package writers? I believe there's significant value in being able to call WinAPI functions from pure Go code, which then can be just "go get", without need for a whole MinGW toolchain installed and well-configured (and necessarily from some specific, this-month-fashionable fork). And I believe there are third-party libraries making use of that, and not only on Windows but I believe on Linux too. Not to mention golang.org/x/sys, I suppose?

And, right, WinAPI calls do love to receive pointers to structs with pointers (including ASCIIZ strings), that's quite true.

What matters for this proposal is whether the value is a pointer into Go memory. The actual type of the value does not matter. (...) What matters is the values stored in this uintptr fields.

I'm not sure I understand still. In Go ~1.4 the boundary between what to GC was a "pointer" vs. "not a pointer, just an int, even if same bit pattern" seemed to be placed quite explicitly between unsafe.Pointer and uintptr. From the above I seem to understand this may possibly change? That said, from other comments I'm starting to understand the specifics here are maybe expected to be fleshed out later (or do I misunderstand?)

ianlancetaylor commented 9 years ago

@akavel

Would those per-entry-point "detailed" features be available to third-party library/package writers? I believe there's significant value in being able to call WinAPI functions from pure Go code, which then can be just "go get", without need for a whole MinGW toolchain installed and well-configured (and necessarily from some specific, this-month-fashionable fork). And I believe there are third-party libraries making use of that, and not only on Windows but I believe on Linux too. Not to mention golang.org/x/sys, I suppose?

I don't know. The syscall and x/sys packages can provide functions to handle structs with pointers, doing things like copying the pointers into non-garbage-collected memory. I think it would be fine to make something like that generally available. I think we have to be careful about what we make generally available, to avoid overly constraining future GC work.

I'm not sure I understand still. In Go ~1.4 the boundary between what to GC was a "pointer" vs. "not a pointer, just an int, even if same bit pattern" seemed to be placed quite explicitly between unsafe.Pointer and uintptr. From the above I seem to understand this may possibly change? That said, from other comments I'm starting to understand the specifics here are maybe expected to be fleshed out later (or do I misunderstand?)

There is not any intent to change the garbage collector behaviour. This proposal is about the very specific problem of how to safely pass a Go pointer to C code. You're right: I was wrong about the types. A value of type uintptr is not a pointer. You are already in potential trouble if you convert a Go pointer to uintptr. This proposal won't make that worse or better. The type of a Go pointer doesn't matter, but it does have to be some sort of pointer type.

sbinet commented 9 years ago

@ianlancetaylor sorry, I wasn't indeed very clear. it's probably only tangential to this issue, but as the crux of the problem, at least for writing extension modules, is to wrap go values which may or may not contain go pointers, I figured it could be possible to address this issue once, and for all the VMs one wants to extend with go. wrapping a reflect.Value and exposing a C API that can be consumed by all interpreted languages' object model, with either the map-registry approach or the serialization approach for crossing the go/C boundary.

well, I am just thinking out loud, but it could be a nice "niche" to fill.

akavel commented 9 years ago

Would those per-entry-point "detailed" features be available to third-party library/package writers (...)

I don't know. The syscall and x/sys packages can provide functions to handle structs with pointers, doing things like copying the pointers into non-garbage-collected memory. I think it would be fine to make something like that generally available. I think we have to be careful about what we make generally available, to avoid overly constraining future GC work.

Thanks. I understand this is non-binding reply as of now, but it's somewhat reassuring to hear you want to consider this. I hope you'll manage to do that.

There is not any intent to change the garbage collector behaviour. This proposal is about the very specific problem of how to safely pass a Go pointer to C code. You're right: I was wrong about the types. A value of type uintptr is not a pointer. You are already in potential trouble if you convert a Go pointer to uintptr. This proposal won't make that worse or better. The type of a Go pointer doesn't matter, but it does have to be some sort of pointer type.

I'm aware I'm stepping somewhat risky border full of mines, but talking to WinAPI is already C-style dangerous out of itself, so that's not so much different. As long as it's possible at all. And given the benefits of Go, and its difference to C, I'm generally ok with the rules changing with every Go release version. As long as I can know the rough rules. For now, it was generally possible to kinda learn them... eventually... after asking on golang-nuts/-dev. What I'm afraid of, is that it may become "officially neglected" at some point, to the extent of it becoming impossible to use any WinAPI calls in third-party libs without cgo. That's a very useful feature, and losing it would become... unfortunate, and somewhat costly. Including the fact, that currently it's easy to cross-compile from Linux to Windows, and having to use cgo there would be even more cumbersome.

In some other aspects, over the night I remembered of one particular WinAPI function, RegisterClassEx(). This one is possibly especially troublesome, in that it takes a pointer to a struct which must contain a pointer to a C-function. And it then keeps hold of it (the func at least) permanently. On the other hand, it's the only way any GUI apps can be created in WinAPI. The first meaningful thing one must do after starting a GUI app is to register a callback via this function. For now, it seemed to be doable (with syscall.NewCallback, and keeping the pointer "heap-pinned" by some global var). Losing that would completely disable possibility of creating GUI apps on Windows without cgo. Again, that would be a serious pity.

alexbrainman commented 9 years ago

In some other aspects, over the night I remembered of one particular WinAPI function,RegisterClassEx(). This one is possibly especially troublesome, in that it takes a pointer to a struct which must contain a pointer to a C-function. And it then keeps hold of it (the func at least) permanently. On the other hand, it's the only way any GUI apps can be created in WinAPI. The first meaningful thing one must do after starting a GUI app is to register a callback via this function. For now, it seemed to be doable (with syscall.NewCallback, and keeping the pointer "heap-pinned" by some global var). Losing that would completely disable possibility of creating GUI apps on Windows without cgo. Again, that would be a serious pity.

syscall.NewCallback returns an uintptr - address to the start of small asm function that eventually end up in Go function. None of that dependant on GC and is generated at compile time. I don’t see any of that break in the near future. And we have a few tests to verify that. And there is some new windows GUI development work in golang.org/x/exp/shiny/driver/windriver happening, so there will be incentive to keep it all going.

Alex

ianlancetaylor commented 9 years ago

The design doc for this proposal can be seen at https://github.com/golang/proposal/blob/master/design/12416-cgo-pointers.md .

akavel commented 9 years ago

I find the proposal's wording confusing in "Background" section, in paragraph about Go 1.5:

"C code may not store a Go pointer into Go memory"

Is this intended to mean: "C code may not store in Go memory a Go pointer"? Or something else?

Similarly:

"C code can still store a Go pointer into C memory, [with restrictions]".

Is this meant as: "C code can store in C memory a Go pointer"?

Thanks.

@alexbrainman Thanks for the clarification on Windows callbacks!

ianlancetaylor commented 9 years ago

@akavel I'm sorry, I don't understand the question you are asking. To me, these two sentences mean the same thing: "C code may not store a Go pointer into Go memory" and "C code may not store, in Go memory, a Go pointer."

akavel commented 9 years ago

@ianlancetaylor (Please bear in mind I'm not a native English speaker.) For quite long, this looked to me like: "C code may not store a (Go pointer into Go memory)", and the only meaning I was able to interpret from this was, that it maybe accidentally lost the "Go pointer definition" to mean: "C code may not store [anywhere] a Go[-originating] pointer [to] Go memory"; in other words: "C code may not store anywhere a Go pointer as defined earlier". Only much later it occurred to me that the "into Go memory" part may link to "store", not to "Go pointer", but I still wasn't certain.

With this clarification, I now know how to read this, thanks. That said, it'd be nice if you could consider rephrasing as above, so that there's no place for confusion, possibly. Tx

gopherbot commented 9 years ago

CL https://golang.org/cl/15100 mentions this issue.

ianlancetaylor commented 9 years ago

An update: after doing some work on adding the cgo restrictions I described, I'm no longer happy with the idea. I feel like the basic rules about what a Go program is permitted to do are understandable. The cgo changes, however, were enforcing a different set of rules. If the cgo restrictions reliably forbade everything forbidden by the rules, then that might be OK, but they don't. So it would leave us with two different kinds of restrictions that programmers would have to understand, which does not seem good.

I am now leaning toward implementing a dynamic check in cgo that catches most invalid cases, and an optional dynamic check that catches all invalid cases. I am also thinking about a go vet checker to look for cases that look fishy.

gopherbot commented 9 years ago

CL https://golang.org/cl/16003 mentions this issue.

taruti commented 9 years ago

The proposal looks very nice. After reading it I ended up with few concerns:

A) scatter-gather IO syscalls like writev using go []byte data seem impossible to do? Would they be ok with some extra annotations specific to them?

B) Could we find out or annotate C functions that do not call back into Go and thus avoid having to mark data as escaping and allocating from heap?

C) what kind of errors will be returned when a C function takes too long? What is the story with regards to read(2) and blocking devices?

ianlancetaylor commented 9 years ago

@taruti

A) You're right that scatter/gather syscalls using [][]byte won't work. I'm not opposed to handling them somehow in the future, though I don't know what that would look like. In the meantime you'll have to allocate the buffers in C, or you'll have to call the C code with multiple arguments.

B) It's possible, but does it really come up that often in a performance sensitive context?

C) In the default mode, nothing will happen if a C function takes too long, except that your program might eventually run out of memory because the GC can't run (that won't happen with 1.6 but it might happen in future releases). In the checking mode, you will get a panic. If you want to call a C function that does a read on a device that blocks arbitrarily long, you should read the data into a buffer allocated by C. We will make the syscall.Read function do the right thing on a long-blocking read, whatever that turns out to be.

RLH commented 9 years ago

One issue that came up here is what does this mean with respect to race conditions. I propose that any field or element in the referent of a go pointer passed to C will be considered a read by the race detector. This will tighten up the semantics around what Go can write while a C routine is running.

On Mon, Oct 19, 2015 at 2:38 PM, Ian Lance Taylor notifications@github.com wrote:

@taruti https://github.com/taruti

A) You're right that scatter/gather syscalls using [][]byte won't work. I'm not opposed to handling them somehow in the future, though I don't know what that would look like. In the meantime you'll have to allocate the buffers in C, or you'll have to call the C code with multiple arguments.

B) It's possible, but does it really come up that often in a performance sensitive context?

C) In the default mode, nothing will happen if a C function takes too long, except that your program might eventually run out of memory because the GC can't run (that won't happen with 1.6 but it might happen in future releases). In the checking mode, you will get a panic. If you want to call a C function that does a read on a device that blocks arbitrarily long, you should read the data into a buffer allocated by C. We will make the syscall.Read function do the right thing on a long-blocking read, whatever that turns out to be.

— Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/12416#issuecomment-149307830.

ianlancetaylor commented 9 years ago

In discussion with the Go runtime team, we decided to drop the requirement that a C function that takes a Go pointer argument must return. While it's true that for some possible implementations of the Go garbage collector this will cause trouble, it's not a problem today, and the problem for other implementations, especially on 64-bit systems, is somewhat theoretical.

cznic commented 9 years ago

Evil remark: That perhaps reintroduces hope for the possibility of having explicit pinning in the future (?)

ianlancetaylor commented 9 years ago

@cznic No, there is no current intention for ever having explicit pinning. Any program that needs something like explicit pinning can allocate the pinned memory using C.malloc.

cznic commented 9 years ago

Wrt "any program": I guess it means "any program not running on GAE/GCE or similar restricted environment".

Edit: Scratch this comment please.

ianlancetaylor commented 9 years ago

This proposal is about calling between Go and C. There is no point to discussing its use in environments that don't permit C code.

(By the way, GCE does permit calling C code, though GAE does not.)

In any case, I can't see any reason for pinning in a pure Go environment.

RLH commented 9 years ago

We appear to have reached consensus here and need to move forward with the public proposal process. This proposal does no conflict with the GC roadmap and the benefits of agreeing on the rules clearly outweighs any potential future enhancements to the GC.

aclements commented 9 years ago

Looks good overall. I just have a few questions and clarifications.

The rules don't seem to prohibit Go code writing a Go pointer in to an object in the C heap. From my understanding, this is a violation of the invariant and will be detected by the "expensive" checking mode, but doesn't seem to be a violation of the rules. Is that okay because these rules are just about passing pointers? Or do we want to call this out explicitly?

The Go garbage collector must be aware of the location of all Go pointers; a known number of pointers may be temporarily visible to C code, meaning that they exist in an area that the garbage collector can not see, and may not be modified or released.

What's meant by "the location" of all Go pointers? One way to interpret that would be the location in memory of all Go pointers, but clearly the GC doesn't know the location of Go pointers passed in to C. Do you mean that it must be aware of the memory address of every Go pointer except a known number temporarily visible to C code?

Rather than "a known number" would a "a known set" be better? "A known number" could mean we know that there are 5 pointers visible to C but don't know what they are. The rules seem to support knowing exactly what the pointers are.

By definition, since that memory block may not contain any pointers, this will only pin a single block of memory.

May not contain any Go pointers.

RLH commented 9 years ago

The rules don't seem to prohibit Go code writing a Go pointer in to an object in the C heap. From my understanding, this is a violation of the invariant and will be detected by the "expensive" checking mode, but doesn't seem to be a violation of the rules. Is that okay because these rules are just about passing pointers? Or do we want to call this out explicitly?

I think you are right, now that we do not require that a C function return in a timely fashion we also need to prevent the go pointers from being shared on the C malloc heap.

"However, C code may not keep a copy of the Go pointer after the call returns." is no longer sufficient.

ianlancetaylor commented 9 years ago

@aclements Thanks for the comments. I sent you a CL (16661).

aclements commented 9 years ago

LGTM with the updates in CL 16661.

gopherbot commented 9 years ago

CL https://golang.org/cl/16899 mentions this issue.

golang / go

cmd/cgo: specify rules for passing pointers between Go and C #12416