golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: provide Pinner API for object pinning #46787

Closed ansiwen closed 1 year ago

ansiwen commented 3 years ago

Update, 2021-10-20: the latest proposal is the API in https://github.com/golang/go/issues/46787#issuecomment-942547949.


Problem Statement

The pointer passing rules state:

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers.

and

Go code may not store a Go pointer in C memory.

There are C APIs, most notably the iovec-based ones for vectored I/O, which expect an array of structs that describe the buffers to read into or write from. The naive approach would be to allocate both the array and the buffers with C.malloc() and then either work on the C buffers directly or copy the content to Go buffers. In the case of Go bindings for a C API, which is presumably the most common use case for Cgo, the users of the bindings shouldn't have to deal with C types, which means that all data has to be copied into Go-allocated buffers. This of course impairs performance, especially for larger buffers. Therefore it would be desirable to have a safe way to let the C API write directly into the Go buffers. This, however, is not possible because of the two rules above: if the array of structs lives in C memory, it may not hold pointers to the Go buffers, and if it lives in Go memory, it contains Go pointers and may not be passed to C.

Obviously, what is missing is a safe way to pin an arbitrary number of Go pointers in order to store them in C memory or in passed-to-C Go memory for the duration of a C call.

Workarounds

Break the rules and store the Go pointer in C memory

with something like

```go
IovecCPtr.iov_base = unsafe.Pointer(myGoPtr)
```

but `GODEBUG=cgocheck=2` would catch that. However, you can circumvent cgocheck=2 with this casting trick:

```go
*(*uintptr)(unsafe.Pointer(&IovecCPtr.iov_base)) = uintptr(unsafe.Pointer(myGoPtr))
```

This might work as long as the GC is not moving the pointers, which might be the case as of now, but is not guaranteed.

Break the rules and hide the Go pointer in Go memory

with something like

```go
type iovecT struct {
    iov_base uintptr
    iov_len  C.size_t
}

iovec := make([]iovecT, numberOfBuffers)
for i := range iovec {
    bufferPtr := unsafe.Pointer(&bufferArray[i][0])
    iovec[i].iov_base = uintptr(bufferPtr)
    iovec[i].iov_len = C.size_t(len(bufferArray[i]))
}
n := C.my_iovec_read((*C.struct_iovec)(unsafe.Pointer(&iovec[0])), C.int(numberOfBuffers))
```

Again: this might work as long as the GC is not moving the pointers. `GODEBUG=cgocheck=2` wouldn't complain about this.

Break the rules and temporarily disable cgocheck

If hiding the Go pointer as a uintptr like in the last workaround is not possible, passing Go memory that contains Go pointers usually bails out because of the default `cgocheck=1` setting. It is possible to temporarily disable `cgocheck` during a C call, which can be especially useful when the pointers have been "pinned" with one of the later workarounds. For example, the `_cgoCheckPointer()` function, which is used in the generated Cgo code, can be shadowed in the local scope, which disables the check for the following C calls in the scope:

```go
func ... {
    _cgoCheckPointer := func(interface{}, interface{}) {}
    C.my_c_function(x, y)
}
```

Maybe slightly more robust is to export the runtime.dbgvars list:

```go
type dbgVar struct {
    name  string
    value *int32
}

//go:linkname dbgvars runtime.dbgvars
var dbgvars []dbgVar

var cgocheck = func() *int32 {
    for i := range dbgvars {
        if dbgvars[i].name == "cgocheck" {
            return dbgvars[i].value
        }
    }
    panic("Couldn't find cgocheck debug variable")
}()

func ... {
    before := *cgocheck
    *cgocheck = 0
    C.my_c_function(x, y)
    *cgocheck = before
}
```

Use a C function to store the Go pointer in C memory

The rules allow a C function to store a Go pointer in C memory for the duration of the call. So, for each Go pointer, a C function can be called in a goroutine; it stores the Go pointer in C memory and then calls a Go callback that waits for a release signal. After the release signal is received, the Go callback returns to the C function, the C function clears the Go pointer from the C memory and returns as well, finishing the goroutine. This approach fully complies with the rules, but is quite expensive, because each goroutine blocked in a C function occupies an OS thread, which means one thread per stored Go pointer.

Use the //go:uintptrescapes compiler directive

`//go:uintptrescapes` is a compiler directive that

> specifies that the function's uintptr arguments may be pointer values that have been converted to uintptr and must be treated as such by the garbage collector.

So, similar to the previous workaround, a Go function with this directive can be called in a goroutine, where it simply waits for a release signal. When the signal is received, the function returns and releases the pointer. This already seems almost like a proper solution, so I implemented a package with this approach that allows you to `Pin()` a Go pointer and `Poke()` it into C memory: [PtrGuard](https://github.com/ansiwen/ptrguard)

But there are still caveats. The compiler and the runtime (cgocheck=2) don't seem to know which pointers are protected by the directive, because they still don't allow passing Go memory containing these Go pointers to a C function, or storing the pointers in C memory. Therefore the first two workarounds are additionally necessary. There is also the small overhead of the goroutine and the release signalling.

Proposal

It would make Cgo a lot more usable for C APIs with more complex pointer handling like iovec if there were a programmatic way to provide what //go:uintptrescapes already provides through the backdoor. There should be a way to pin an arbitrary number of Go pointers in the current scope, so that they are allowed to be stored in C memory or be contained in Go memory that is passed to a C function within this scope, for example with a runtime.PtrEscapes() function. It's cumbersome that it's required to abuse goroutines, channels and casting tricks in order to provide bindings to such C APIs. As long as the Go GC is not moving pointers, it could be a trivial implementation, but it would encapsulate this knowledge and would give users a guarantee.

I know from the other issues and discussions around this topic that it's seen as dangerous if it is possible to pin an arbitrary number of pointers. But

  1. it is possible to call an arbitrary number of C or //go:uintptrescapes functions, therefore it is already possible to pin an arbitrary number of Go pointers.
  2. it is necessary for some C APIs

Related issues: #32115, #40431

/cc @ianlancetaylor @rsc @seebs

edit: the first workaround had an incorrect statement. edit 2: add workarounds for disabling cgocheck

DeedleFake commented 3 years ago

From what I can tell from the documentation for the new cgo.Handle, it's intended only for a situation where a pointer needs to be passed from Go to C and then back to Go without the C code doing anything with what it points to. As it passes a handle ID, not a real pointer, the C code can't actually get access to the actual data. Maybe a function could be provided on the C side that takes a handle ID and returns the original pointer, thus allowing the C code to access the data? Would that solve this issue?

Edit: Wait, that doesn't make sense. Could you just use Handle to make sure that it's held onto? Could the definition of Handle be extended to mean that the pointer itself is valid for the duration of the Handle's existence? In other words, this would be defined to be valid:

// void doSomethingWithAPointer(int *a);
import "C"

func main() {
  v := C.int(3)
  h := cgo.NewHandle(&v)
  C.doSomethingWithAPointer(&v) // Safe because the handle exists for that pointer.
  h.Delete()
}

Alternatively, if that's not feasible, what about a method on Handle that returns a valid pointer for the given value?

// Pointer returns a C pointer that points to the underlying value of the handle
// and is valid for the life of the handle.
func (h Handle) Pointer() C.uintptr_t

Disclaimer: I'm not familiar enough with the internals of either the Go garbage collector or Cgo to know if either of these even make sense.

ansiwen commented 3 years ago

@DeedleFake As you pointed out yourself, the cgo.Handle has a very different purpose. It's just a registry for a map from a C-compatible arbitrary ID (uintptr) to an arbitrary Go value. Its purpose is to refer to a Go value in the C world, not to access it from there. It doesn't affect the behavior of the garbage collector, which could still freely move around the values in the Handle map, and would never delete them, since they are referenced by the map.

ianlancetaylor commented 3 years ago

A big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns. As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

ansiwen commented 3 years ago

@ianlancetaylor thanks for taking the time to answer.

> A big advantage of the current cgo mechanisms, including go:uintptrescapes, is that the pointers are automatically unpinned when the cgo function returns.

I agree, that is an advantage. However, with goroutines it's trivial to fire-and-forget thousands of such function calls that never return.

> As far as I can see you didn't propose any particular mechanism for pinning pointers, but it would be very desirable to somehow ensure that the pointers are unpinned. Otherwise code could easily get into scenarios in which pointers remain pinned forever, which if Go ever implements a full moving garbage collector will cause the garbage collector to silently behave quite poorly. In other words, some APIs that could solve this problem will be footguns: code that can easily cause a program to silently behave badly in ways that will be very hard to detect.

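I didn't describe a specific API, that's true. I hoped that this could be developed here together once we agreed on the requirements. One of the requirements that I mentioned was that the pinning happens only for the current scope. That implies automatic unpinning when the scope is left. Sorry that I didn't make that clear enough. So, to rephrase more compactly, the requirements would be:

  1. an arbitrary number of Go pointers can be pinned in the current scope
  2. pinned pointers may be stored in C memory, or be contained in Go memory that is passed to a C function, within that scope
  3. automatic unpinning when the current scope is left (the current function returns)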

> It's hard to say more without a specific API to discuss. If you suggested one, my apologies for missing it.

As stated above, I didn't want to suggest a specific API, but characteristics of it. In the end it could be a function like runtime.PtrEscapes(unsafe.Pointer). The usage could look like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
  numberOfBuffers := len(bufferArray)

  iovec := make([]C.struct_iovec, numberOfBuffers)

  for i := range iovec {
    bufferPtr := unsafe.Pointer(&bufferArray[i][0])
    runtime.PtrEscapes(bufferPtr) // <- pins the pointer and makes it known to escape to C
    iovec[i].iov_base = bufferPtr
    iovec[i].iov_len = C.size_t(len(bufferArray[i]))
  }

  n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
  // ^^^ cgocheck doesn't complain, because Go pointers in iovec are pinned
  return int(n) // <- all pinned pointers in iovec are unpinned
}

As long as the GC is not moving, runtime.PtrEscapes() is almost a no-op, it would basically only tell cgocheck not to bail out for these pointers. But users would have a guarantee, that if the GC becomes moving later, this function will take care of it.

Regarding footguns I'm pretty sure, that the workarounds, that have to be used at the moment to solve these problems, will cause more "programs to silently behave badly" than the potential abuse of a proper pinning method.

bcmills commented 3 years ago

> it would be very desirable to somehow ensure that the pointers are unpinned

Drawing from runtime.KeepAlive, one possibility might be something like:

package runtime

// Pin prevents the object to which p points from being relocated until
// the returned PointerPin either is unpinned or becomes unreachable.
func Pin[T any](p *T) PointerPin

type PointerPin struct {…}
func (p PointerPin) Unpin() {}

Then the example might look like:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
    numberOfBuffers := len(bufferArray)

    iovec := make([]C.struct_iovec, numberOfBuffers)

    for i := range iovec {
        bufferPtr := &bufferArray[i][0] // typed pointer, as Pin[T any](p *T) requires
        defer runtime.Pin(bufferPtr).Unpin()
        iovec[i].iov_base = unsafe.Pointer(bufferPtr)
        iovec[i].iov_len = C.size_t(len(bufferArray[i]))
    }

    n := C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
    return int(n)
}

A vet warning could verify that the result of runtime.Pin is used, to ensure that it is not accidentally released too early (see also #20803).
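One detail of the `defer runtime.Pin(...).Unpin()` idiom is worth spelling out: the `Pin` call is evaluated when the `defer` statement executes, and only `Unpin` is postponed until the function returns. The mock below only counts pins to make that visible. `Pin`, `PointerPin`, `pinCount` and `pinAll` are stand-ins written for this sketch and are not part of the runtime package.

```go
package main

import "fmt"

// pinCount stands in for the runtime's bookkeeping in this mock.
var pinCount int

// PointerPin mimics the shape of the proposed type; it carries no state here.
type PointerPin struct{}

// Pin mimics the proposed generic function; it only increments the counter.
func Pin[T any](p *T) PointerPin {
    pinCount++
    return PointerPin{}
}

// Unpin decrements the counter.
func (PointerPin) Unpin() { pinCount-- }

// pinAll pins the first byte of every buffer and reports how many pins are
// active at the point where the C call would happen.
func pinAll(buffers [][]byte) (duringCall int) {
    for i := range buffers {
        // Pin runs now; only Unpin is deferred until pinAll returns.
        defer Pin(&buffers[i][0]).Unpin()
    }
    return pinCount // the result is fixed before the deferred Unpins run
}

func main() {
    buffers := [][]byte{make([]byte, 8), make([]byte, 8), make([]byte, 8)}
    fmt.Println(pinAll(buffers)) // 3
    fmt.Println(pinCount)        // 0
}
```

Writing `defer` inside the loop is intentional in this pattern: every buffer must stay pinned until the surrounding function has finished its C call.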

phlogistonjohn commented 3 years ago

@ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)", the current scope you refer to is the scope of the Go function, correct? In your example that would be ReadFileIntoBufferArray. I'm trying to double-check what the behavior would be if we needed to make multiple calls into C using the same pointer.

@bcmills's version also looks very natural to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

ansiwen commented 3 years ago

> @ansiwen when you write "automatic unpinning when the current scope is left (the current function returns)", the current scope you refer to is the scope of the Go function, correct? In your example that would be ReadFileIntoBufferArray.

@phlogistonjohn Yes, exactly.

> @bcmills's version also looks very natural to me, and in that version it's clear that the pointer would be pinned until the defer at the end of ReadFileIntoBufferArray.

Yes, I also would prefer @bcmills version from a user's perspective, because it's more explicit and it's basically the same API that we use with PtrGuard.

I just don't know enough about the implications on the implementation side and effects on the Go internals, so I don't know what API would be more feasible. My proposal is about providing an official way to solve the described problem. I really don't care so much about the "form", that is how exactly the API looks like. Whatever works best with the current Go and Cgo implementation. 😊

ansiwen commented 3 years ago

@bcmills I guess an argument @ianlancetaylor might bring up against your API proposal is that it would allow storing the PointerPin value in a variable and keeping the pointer pinned for an unlimited time, so it would not "ensure that the pointers are unpinned". If the unpinning is implicit, it is more comparable to //go:uintptrescapes.

ansiwen commented 3 years ago

@ianlancetaylor

> it would be very desirable to somehow ensure that the pointers are unpinned.

So, if you want to enforce the unpinning, the only strict RAII pattern in Go that I could come up with is using a scoped constructor like this API:

package runtime

// Pinner is the context for pinning pointers with Pin()
// can't be copied or constructed outside a Pinner scope
type Pinner struct {…}

// Pin prevents the object to which p points from being relocated until
// Pinner becomes invalid.
func (Pinner) Pin(p unsafe.Pointer) {...}

func WithPinner(func(Pinner)) {...}

which would be used like this:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
    numberOfBuffers := len(bufferArray)

    iovec := make([]C.struct_iovec, numberOfBuffers)

    var n C.ssize_t
    runtime.WithPinner(func (pinner runtime.Pinner) {
        for i := range iovec {
            bufferPtr := unsafe.Pointer(&bufferArray[i][0])
            pinner.Pin(bufferPtr)
            iovec[i].iov_base = bufferPtr
            iovec[i].iov_len = C.size_t(len(bufferArray[i]))
        }

        n = C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
    }) // <- All pinned pointers are released here and pinner is invalidated (in case it's copied out of scope).
    return int(n)
}

I personally would prefer a thinner API, where either it must be explicitly unpinned, like in the proposal of @bcmills, or - even better - the pinning implicitly creates a defer for the scope from which the pinning function has been called. Given that this will be implemented in the runtime package, I guess there are tricks and magic that can be used there.

Merovius commented 3 years ago

@ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety benefits compared to @bcmills's, while being more unwieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

ansiwen commented 3 years ago

@Merovius

> @ansiwen Even with the func API you suggest, a user might store the argument in a closed-over variable, to have it survive the function. In general, as long as the pin is represented by some value, we can't prevent that value from being kept around. So I don't think your version has significant safety benefits compared to @bcmills's, while being more unwieldy and also potentially heavier in runtime cost (the closure might make it easier for things to escape).

The "keeping around" can easily be prevented by one pointer indirection that gets invalidated when the scope is left. You can have a look at my implementation of PtrGuard, which even has a test case for exactly this case of a variable escaping the scope.

> Personally, as long as the PointerPin has to be intentionally kept around, I think that's fine. I think the suggestion to unpin when the PointerPin becomes unreachable already makes it sufficiently hard to shoot yourself in the foot to tolerate the risk. And we might be able to use go vet for additional safety (like warning if the result of Pin is assigned to a global var or something).

Yeah, I agree, as I wrote before, I'm totally fine with both. It's just something I came up with to address @ianlancetaylor's concerns. I also think that the risks are "manageable", there are all kinds of other risks when dealing with runtime and/or unsafe packages after all.

beoran commented 3 years ago

I think that the API proposed by @bcmills is the most useful one. Although there is a risk of forgetting to unpin a pointer, once Go gets a moving garbage collector, certain blocks of memory will have to stay pinned for the duration of the program for certain low-level uses. This is certainly the case for system calls in Linux, such as those for frame buffers. In other words, Pin and Unpin are also useful without cgo.

hnes commented 3 years ago

Hi @rsc, any updates on this issue recently? I noticed it has been several days after the 2021-08-04's review meeting minutes.

rsc commented 3 years ago

The compiler/runtime team has been talking a bit about this but doesn't have any clear suggestions yet.

The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

/cc @aclements

ansiwen commented 3 years ago

> The big problem with pinning is that if we ever want a moving garbage collector in the future, pins will make it much more complex. That's why we've avoided it so far.

@rsc But my point in the description was, that we have pinning already when C functions are called with Go pointers or when the //go:uintptrescapes directive is used. So the situation is complex already, isn't it?

beoran commented 3 years ago

@rsc I would say the converse is also true. If you are going to implement a moving garbage collector without support for pinning, that will make it much more complex to use Go for certain direct operating system calls without cgo, e.g. on Linux. In other words, as @ansiwen says, there's really no way to avoid that complexity. And therefore I think it would be better if Go supported it explicitly than through workarounds.

ianlancetaylor commented 3 years ago

Unbounded pinning has the potential to be significantly worse than bounded pinning. If people accidentally or intentionally leave many pointers pinned, that can fragment the spaces that the GC uses, and make it very hard for a moving GC to make any progress at all. This can in principle happen with cgo today, but it is unlikely that many programs pass a bunch of pointers to a cgo function that never returns. When programmers control the pinning themselves, bugs are more likely. If the bug is in some imported third party library, the effect will be strange garbage collection behavior for the overall program. This will be hard to understand and hard to diagnose, and it will be hard to find the root cause. (One likely effect will be a set of tools similar to the memory profiler that track pinned pointers.)

It's also worth noting that we don't have a moving garbage collector today, so any problems that pinned pointers may introduce for a moving garbage collector will not be seen today. So if we ever do introduce a moving garbage collector, we will have a flag day of hard-to-diagnose garbage collection problems. This will make it that much harder to ever change the garbage collector in practice.

So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."
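The fragmentation concern can be made concrete with a toy model. The sketch below is not how Go's collector works (Go's GC does not move objects today); it is a deliberately simplified sliding compactor over a fixed array of slots, with all names (`slot`, `compact`, `largestFreeRun`, `newHeap`) invented for the illustration.

```go
package main

import "fmt"

// slot is one unit of a toy heap.
type slot struct{ live, pinned bool }

// newHeap builds a heap of the given size with live objects at the given
// indices; indices listed in pinned are additionally marked as pinned.
func newHeap(size int, live, pinned []int) []slot {
    heap := make([]slot, size)
    for _, i := range live {
        heap[i].live = true
    }
    for _, i := range pinned {
        heap[i].pinned = true
    }
    return heap
}

// compact slides every unpinned live object down to the lowest free slot
// below it. Pinned objects must stay where they are.
func compact(heap []slot) {
    free := 0
    for i := range heap {
        if !heap[i].live || heap[i].pinned {
            continue
        }
        for free < i && heap[free].live {
            free++
        }
        if free < i {
            heap[free] = heap[i]
            heap[i] = slot{}
        }
    }
}

// largestFreeRun returns the length of the longest contiguous free region.
func largestFreeRun(heap []slot) int {
    best, run := 0, 0
    for _, s := range heap {
        if s.live {
            run = 0
            continue
        }
        run++
        if run > best {
            best = run
        }
    }
    return best
}

func main() {
    live := []int{1, 3, 4}

    a := newHeap(8, live, nil)
    compact(a)
    fmt.Println(largestFreeRun(a)) // 5: all free space is contiguous

    b := newHeap(8, live, []int{4})
    compact(b)
    fmt.Println(largestFreeRun(b)) // 3: the pinned object splits the free space
}
```

With one pinned object out of three, the largest allocatable region shrinks from five slots to three even though the same amount of memory is free; many long-lived pins scale that effect up.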

beoran commented 3 years ago

@ianlancetaylor That is fair enough. But then it seems to me the best way ahead is to put this issue on hold until we can implement a prototype moving garbage collector.

There is always a workaround if there is no pinning available and that is to manually allocate memory directly from the OS so the GC doesn't know about it. It is not ideal but it can work.

egonelbre commented 3 years ago

Yeah, one workaround that is missing from the discussion is hiding the C API allocation concerns, e.g. iovec could be implemented like:

package iovec

type Buffers struct {
    Data [][]byte

    data *C.uint8_t
    list *C.iovecT
}

func NewBuffers(sizes []int) *Buffers {
    ...
    // C.malloc everything
    // cast from *C.uint8_t to []byte
}

func (buffers *Buffers) ReadFrom(f *os.File) error { ...

Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

ansiwen commented 3 years ago

@ianlancetaylor

> So I do not think the current situation is nearly as complex as the situation would be if we add unbounded pinning. This doesn't mean that we shouldn't add unbounded pinning. But I think that it does mean that the argument for it has to be something other than "we can already pin pointers today."

Let's separate the two questions "pinning yes/no" and "pinning bounded/unbounded".

pinning yes/no

I also proposed

  1. an API that allows bounded pinning (runtime.WithPinner()).
  2. the potential possibility of a runtime.Pin() with no return value and an implicit defer that automatically gets unpinned when the current function returns.

Both provide behaviour similar to that of the //go:uintptrescapes directive, if that is what you mean by "bounded". What do you think of these options?

pinning bounded/unbounded

  1. when we have a moving GC, there will always also be a possibility to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?
  2. when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?
  3. would the risk of many pointers that are never unpinned not be similar to that of memory leaks, like with global dynamic data structures, which are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

For me personally the first question is more important. Bounded or unbounded, I think the existing and required ways of pinning should be made less hacky in their usage.

@egonelbre

> Or in other words, from the problem statement, it's unclear why it's required to use bufferArray [][]byte as the argument.

The bufferArray [][]byte is just a placeholder for an arbitrary "native Go data structure". As the problem statement mentions, the goal is to avoid copying of the data. Especially vectored I/O is used for big amounts of data, so depending on the use case, you can't choose the target data structure by yourself, but it is provided by another library that you intend to use (let's say video processing for example). That would mean, that in all these cases you have to copy the data from your own C allocated data structure to the Go-allocated target data structure of your library, for no good reason.

ianlancetaylor commented 3 years ago

> when we have a moving GC, there will always also be a possibility to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?

In some manner, yes.

> when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?

A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

> would the risk of many pointers that are never unpinned not be similar to that of memory leaks, like with global dynamic data structures, which are possible now? I know, memory fragmentation is potentially worse than just allocating memory, but the effect would be similar: OOM errors.

Same answer.

Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

hnes commented 3 years ago

How would we deal with the iovec struct during vectored I/O syscall if we have a GC that is based on moving pointers? Maybe the same solution could also be applied to the pointer pinning we are discussing?

> A GC based on moving pointers may be completely blocked by a pinned pointer.

I'm afraid that would badly impact GC latency or something else if it is true. Please consider disk I/O syscalls, which may block for a very long time.

ansiwen commented 3 years ago

@ianlancetaylor

> > when we have a moving GC, there will always also be a possibility to pin pointers or pause the moving, so this needs to be implemented in any case. Is this correct?

> In some manner, yes.

> > when people leave pointers pinned, the GC will behave like a non-moving GC, so there is no regression beyond our current status quo, right? So, what exactly do you mean by "hard-to-diagnose garbage collection problems"?

> A GC that is based on moving pointers is not the same as a GC that does not move pointers. A GC based on moving pointers may be completely blocked by a pinned pointer, whereas for a non-moving GC a pinned pointer is just the same as a live pointer.

Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

> Again, all I am saying is that arguments based on "we already support pinned pointers, so it's OK to add more" are not good arguments. We need different arguments.

I don't think "add more" is the right wording. It's more about exposing the pinning in a better way. And these are not arguments for doing it, but arguments against the supposed risks of doing it.

The argument for doing it should be clear by now: give people a zero-copy way to use APIs like iovec with Go data structures in a future proof way. At the moment, that's not possible.

In your answers you skipped the first part about the bounded pinning. If you have the time to comment on these too, I would be very interested. 😊

ianlancetaylor commented 3 years ago

> Since you agreed that the pinning is required in the answer before, I don't understand how such an implementation could be used in Go.

The current system for pinning pointers doesn't permit pointers to be pinned indefinitely, if we discount the unusual case of a C function that does not return.

I agree that other systems that somehow ensure that pointers can't be pinned indefinitely are better. (I don't think that an implicit defer is a good approach for Go, though.)

ansiwen commented 3 years ago

Here's another minimalistic API proposal for bounded pinning (basically a programmatic version of the uintptrescapes directive):

package runtime

// PtrEscapes prevents the allocated objects referenced by ptrs from being relocated until
// function f returns.
func PtrEscapes(ptrs []unsafe.Pointer, f func())

Example:

func ReadFileIntoBufferArray(f *os.File, bufferArray [][]byte) int {
    var buffers []unsafe.Pointer
    numberOfBuffers := len(bufferArray)

    iovec := make([]C.struct_iovec, numberOfBuffers)

    for i := range iovec {
        bufferPtr := unsafe.Pointer(&bufferArray[i][0])
        buffers = append(buffers, bufferPtr)
        iovec[i].iov_base = bufferPtr
        iovec[i].iov_len = C.size_t(len(bufferArray[i]))
    }

    var n C.ssize_t // readv returns ssize_t, not size_t
    runtime.PtrEscapes(buffers, func() {
        n = C.readv(C.int(f.Fd()), &iovec[0], C.int(numberOfBuffers))
    })
    return int(n)
}

rsc commented 3 years ago

I think the main question we need to answer is whether the runtime/GC team wants to commit to any pinning API at all. /cc @aclements

bcmills commented 3 years ago

@ianlancetaylor

> The current system for pinning pointers doesn't permit pointers to be pinned indefinitely, if we discount the unusual case of a C function that does not return.

Is there a reason to believe that we can discount that case today?

I would not be at all surprised to see Go programs that, say, transfer control to a C main-like function that then makes callbacks back into Go for certain parts of the program. If I recall correctly, some C GUI toolkits actually require the program to be structured in a very similar way.

prattmic commented 3 years ago

@hnes raised a concrete example in https://github.com/golang/go/issues/46787#issuecomment-903016806: We are primarily discussing iovecs, and a readv/writev/etc could very likely be on a blocking FD and block indefinitely.

randall77 commented 3 years ago

Here's a mini-proposal for a pinning mechanism that avoids some of the downsides mentioned above, particularly the problem of forgetting to unpin any pinned things.

Arguments of cgo calls are pinned for the duration of the call as they are now. In addition, this proposal lets you mark an object as "pinning flow-through" (terrible name, please suggest better ones). If a "pinning flow-through" object is pinned, then any objects it references are also pinned (for the duration of the outer pin). This way, you can't introduce any pinning roots, you can only make the "scope" of pinning during a cgo call larger. All pins are effectively dropped when the cgo call returns.

You would use this, for example, to mark the [][]byte object that you pass to writev as pinning flow-through. When that [][]byte is passed to writev, every []byte referenced is also pinned, for the duration of writev.

The runtime would keep track of this mark for the lifetime of the object, similar to how we keep track of finalizers currently. You would only need to mark an object once - you could use it many times for many cgo calls.

There would be no way to unmark a pinning flow-through object. (Although we could add such a thing, if people thought it was needed.) The critical feature that makes this proposal better than a raw pinning API is that it doesn't matter if we have lots of objects scattered around the heap marked as pinning flow-through. Only if they are referenced by a root pinning operation (aka used as an argument to a cgo call) do those marks mean anything.

Pinning flow-through objects don't pin recursively. Only objects directly referenced from the pinning flow-through object are pinned. If you want deeper pinning, you'd have to mark everything but the leaves of the tree you want pinned.

Possibly having an actual runtime.SetPinningFlowThrough(object interface{}) API would be overkill, and it could be enough to have a special //go: annotation on system calls that would mark arguments as pinning flow-through for the duration of the call. Not sure if that would be enough, or if it would be easier than having an explicit runtime call.
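
The rules above can be sketched as a toy model in plain Go. This is only an illustration of the proposed semantics, not runtime code: the object type, the flowThrough field, and pinnedDuringCall are hypothetical names, and nothing is actually pinned.

```go
package main

import "fmt"

// object is a toy heap object: a name plus the objects it references.
// flowThrough models the hypothetical "pinning flow-through" mark.
type object struct {
	name        string
	refs        []*object
	flowThrough bool
}

// pin records o as pinned. It follows references only out of objects that
// carry the flow-through mark, so unmarked objects stop the propagation.
func pin(o *object, pinned map[string]bool) {
	if pinned[o.name] {
		return
	}
	pinned[o.name] = true
	if o.flowThrough {
		for _, r := range o.refs {
			pin(r, pinned)
		}
	}
}

// pinnedDuringCall returns the set of objects that would be pinned while a
// cgo call with the given arguments is in progress. The set is rebuilt per
// call, which models "all pins are dropped when the call returns".
func pinnedDuringCall(args ...*object) map[string]bool {
	pinned := make(map[string]bool)
	for _, a := range args {
		pin(a, pinned)
	}
	return pinned
}

func main() {
	buf0 := &object{name: "buf0"}
	buf1 := &object{name: "buf1"}
	bufs := &object{name: "bufs", refs: []*object{buf0, buf1}}

	// Unmarked: only the argument itself is pinned.
	fmt.Println(len(pinnedDuringCall(bufs)))

	// Marked: the argument and the buffers it directly references are pinned.
	bufs.flowThrough = true
	fmt.Println(len(pinnedDuringCall(bufs)))
}
```

In this model, a mark on an object that is never passed to a call has no effect, which is the property the proposal relies on.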

balasanjay commented 3 years ago

Does that work in the io_uring model?

In that model, you write the [][]byte to a ring buffer shared with the kernel (effectively). And then you "submit" one or more entries in the ring buffer via a syscall; that syscall will immediately return (or more precisely, when that syscall returns, it doesn't necessarily mean the IOs that were just submitted have completed).

A later submission syscall could indicate to the caller that previously submitted IOs have completed. (It does this not via the mere fact that the syscall returned, but via a separate completion ring buffer, shared between the application and the kernel, that the application has to process.)

It feels like the pin might expire too early in your model (i.e. with the next call to io_uring_submit, rather than when the application pulls a matching entry from the completion queue ring buffer). And for that matter, the pin might start too late, because it needs to start before we start writing into the shared ring buffer, not when we invoke the io_uring_submit syscall.

(I suppose one option would be for the standard library to offer a higher-level io_uring API, e.g. a blocking SubmitBatch([]Submissions) that queues the submissions on an io_uring, waits until those specific completions have been received and then returns from SubmitBatch. The compiler would then use SubmitBatch as the scope of the pin, rather than a cgo call)
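
The lifetime described here (pin before the pointer is published, unpin only once the matching completion is reaped) can be sketched with an explicit pin handle. The sketch below assumes runtime.Pinner, which is available from Go 1.21; the ring, request, submit, and complete names are hypothetical, and no real io_uring calls are made.

```go
package main

import (
	"fmt"
	"runtime"
)

// request is a hypothetical in-flight I/O: the buffer handed to the kernel
// and the pinner that keeps it in place until the completion is seen.
type request struct {
	buf    []byte
	pinner runtime.Pinner
}

// ring is a toy stand-in for a submission/completion queue pair.
type ring struct {
	inflight map[uint64]*request
}

// submit pins buf before it is published to the (simulated) ring, so the
// pin starts before the kernel could observe the pointer.
// buf must be non-empty.
func (r *ring) submit(id uint64, buf []byte) {
	req := &request{buf: buf}
	req.pinner.Pin(&buf[0])
	r.inflight[id] = req
}

// complete is called when the completion entry for id has been reaped.
// Only then is the buffer unpinned. It reports whether id was in flight.
func (r *ring) complete(id uint64) bool {
	req, ok := r.inflight[id]
	if !ok {
		return false
	}
	req.pinner.Unpin()
	delete(r.inflight, id)
	return true
}

func main() {
	r := &ring{inflight: make(map[uint64]*request)}
	r.submit(1, make([]byte, 64))
	fmt.Println("in flight after submit:", len(r.inflight))
	r.complete(1)
	fmt.Println("in flight after completion:", len(r.inflight))
}
```

The pin here is tied to the request rather than to any single syscall, which is the part a call-scoped mechanism cannot express.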

randall77 commented 3 years ago

Does that work in the io_uring model?

No, I don't think it does. Pointers that must be pinned while no cgo call is currently active would not be supported. (You'd have to allocate such things with C.malloc.)

aclements commented 3 years ago

I think that io_uring is actually a really interesting example of a bigger problem. For example, we already have a very similar problem in internal/poll on Windows: it uses I/O completion ports, which do pass Go pointers into the kernel across asynchronous system calls (much like io_uring). Granted, that's "internal" so technically we could do whatever we needed to make that work, but it shows that this is not just a theoretical problem with an API Go doesn't support yet. Another example is that it's common in graphics code to share long-lived graphics buffers with the kernel and the hardware, which would also require long-lived pinned memory. I'm not sure whether this is a problem in Go's current OpenGL packages, but it wouldn't surprise me.

eliasnaur commented 3 years ago

Another example is that it's common in graphics code to share long-lived graphics buffers with the kernel and the hardware, which would also require long-lived pinned memory. I'm not sure whether this is a problem in Go's current OpenGL packages, but it wouldn't surprise me.

In my experience, pinned Go memory wouldn't help for GPU APIs. It's true that legacy OpenGL has APIs (glVertexAttribPointer, perhaps others) that retain user-provided pointers, but modern OpenGL and every other API (Direct3D, Metal, Vulkan) all operate on API-allocated buffer objects that you either map into your address space or copy into synchronously. All because GPUs can't in general access system memory as efficiently (or at all).

aclements commented 3 years ago

Here's another thought, inspired by something @cherrymui said: what if we provided explicit pinning/unpinning operations, but pinned memory also stayed live? In a lot of cases you want that anyway, and it would create an incentive for users to unpin memory even with a non-moving collector.

Perhaps the hazard here is that if users pinned memory at a relatively slow rate and didn't unpin it, this would simply create a memory leak. But at some point these are all power tools we have to trust users to use correctly anyway.

cherrymui commented 3 years ago

We could probably limit the number of pinned pointers, maybe something like N+M*(number of ongoing cgo calls), and crash the program if it exceeds the limit (maybe allow user to bump up the limit).
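
As a rough illustration of that bound, the arithmetic could look like the following. The function names and the values chosen for N and M are made up for the example; the comment does not specify them.

```go
package main

import "fmt"

// pinLimit computes the cap from the comment: a base allowance n plus
// m additional pins per ongoing cgo call.
func pinLimit(n, m, ongoingCgoCalls int) int {
	return n + m*ongoingCgoCalls
}

// withinPinLimit reports whether the current number of pinned pointers
// stays inside the cap. A runtime following the suggestion would crash
// the program instead of returning false.
func withinPinLimit(pinned, n, m, ongoingCgoCalls int) bool {
	return pinned <= pinLimit(n, m, ongoingCgoCalls)
}

func main() {
	const n, m = 64, 16 // hypothetical values for N and M
	fmt.Println(pinLimit(n, m, 3))            // 64 + 16*3
	fmt.Println(withinPinLimit(200, n, m, 3)) // 200 exceeds the cap
}
```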

ansiwen commented 3 years ago

what if we provided explicit pinning/unpinning operations, but pinned memory also stayed live?

@aclements I always presumed, pinning implies keeping alive. That's also how uintptrescapes works.

aclements commented 3 years ago

I always presumed, pinning implies keeping alive.

I think it's important to consider that aspect separately. For uintptrescapes, the pointer is kept live by virtue of being in a live argument on the stack. Something is actively using that pointer (to the best of our knowledge), so it really only makes sense to keep it live.

For @randall77's "pin-through" proposal, I think the same argument would apply to keeping it live while it's in use by a cgo call, but I don't think it would make sense for the act of marking something pin-through to keep it live. (Maybe I'm wrong, though; I haven't thought very hard about that interaction.)

For explicit pin/unpin operations, it's much less clear to me. Certainly it would be kept live during the cgo call. But I bet it often makes sense to allocate something, pin it, pass it to cgo, and then just drop it on the floor and let the GC take care of it without worrying about unpinning it. There are other mechanisms to extend its lifetime if that's necessary (e.g., runtime.KeepAlive).

bcmills commented 3 years ago

I think pinning should imply keeping alive.

In some very deep sense, GC simulates having infinite memory. From that perspective, collecting an unreachable object is the same as relocating it to an unnameable location.

Pinning an object prevents it from being relocated at all, which should also prevent it from being relocated to the bit-bucket.

bcmills commented 3 years ago

FWIW, that's why I think the pin itself makes sense as a PointerPin object with its own lifetime, kept alive by a pending call to Unpin. An object can be relocated to the bit-bucket only when the pins attached to it can also be relocated there.

If we wanted to make it even more obvious when users have forgotten to unpin their pinned memory, we could throw an error if a PointerPin object becomes unreachable without first being unpinned.
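
One way to picture that check is a handle with a finalizer that fires if the handle is collected before Unpin was called. The sketch below only models the bookkeeping and does not pin anything; PointerPin, Pin, Unpin, and onLeak are hypothetical names for illustration, not a proposed implementation.

```go
package main

import (
	"fmt"
	"runtime"
)

// PointerPin is a hypothetical pin handle. Holding obj keeps the pinned
// object reachable for as long as the handle itself is reachable.
type PointerPin struct {
	obj      interface{}
	unpinned bool
}

// onLeak is invoked when a PointerPin is collected without Unpin having
// been called. A runtime could throw here; the sketch only reports.
var onLeak = func() { fmt.Println("PointerPin collected without Unpin") }

// Pin creates a handle for obj and arms the leak check.
func Pin(obj interface{}) *PointerPin {
	p := &PointerPin{obj: obj}
	runtime.SetFinalizer(p, func(p *PointerPin) {
		if !p.unpinned {
			onLeak()
		}
	})
	return p
}

// Unpin releases the object and disarms the leak check.
func (p *PointerPin) Unpin() {
	p.unpinned = true
	p.obj = nil
	runtime.SetFinalizer(p, nil)
}

func main() {
	buf := make([]byte, 64)
	p := Pin(&buf[0])
	// ... a call that needs buf to stay in place would go here ...
	p.Unpin()
	fmt.Println("unpinned:", p.unpinned)
}
```

Finalizers run at the garbage collector's discretion, so a check like this would surface a forgotten Unpin eventually rather than immediately.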

ansiwen commented 3 years ago

I always presumed, pinning implies keeping alive.

I think it's important to consider that aspect separately. For uintptrescapes, the pointer is kept live by virtue of being in a live argument on the stack. Something is actively using that pointer (to the best of our knowledge), so it really only makes sense to keep it live.

In the case of uintptrescapes there is only a uintptr on the stack, but the GC doesn't collect the object until the function returns, although it's unreachable. No KeepAlive() necessary. So I guess there is more involved, but I haven't checked the code.

For explicit pin/unpin operations, it's much less clear to me. Certainly it would be kept live during the cgo call. But I bet it often makes sense to allocate something, pin it, pass it to cgo, and then just drop it on the floor and let the GC take care of it without worrying about unpinning it. There are other mechanisms to extend its lifetime if that's necessary (e.g., runtime.KeepAlive).

Ok, now I get it. So you mean that pinning wouldn't prevent the GC from collecting the object in case it becomes unreachable. Yeah, that can make sense. But at least having something that I can use to Unpin() would imply that it is kept alive, wouldn't it? I think this question would only play a role if we have something like Unpin(unsafe.Pointer) that can be used on pointers to objects that were unreachable for some time.

prattmic commented 3 years ago

FWIW, that's why I think the pin itself makes sense as a PointerPin object with its own lifetime, kept alive by a pending call to Unpin. An object can be relocated to the bit-bucket only when the pins attached to it can also be relocated there.

If we wanted to make it even more obvious when users have forgotten to unpin their pinned memory, we could throw an error if a PointerPin object becomes unreachable without first being unpinned.

I like this idea of a PointerPin object to pass to Unpin quite a bit. If nothing else, I think having this new object makes it a bit easier to remember you need to call Unpin.

That said, though I suppose it is a matter of perspective, I view such an API as not keeping pinned memory alive. The pinned object is only kept alive if the PointerPin is kept alive, which IMO is the same thing as requiring users to keep the pinned object alive, just with one extra level of indirection.

rsc commented 3 years ago

OK, so it sounds like maybe people are happy with something like

package runtime

type Pinned struct { ... }
func Pin(object interface{}) *Pinned
func (p *Pinned) Unpin()

and either Pin causes an object to stay live, or it is a crash if the garbage collector collects a pinned object (meaning an Unpin was forgotten). It seems like the former is much more helpful since you can debug it with heap profiles, etc.

Do I have that right?

ansiwen commented 3 years ago

I'm fine with both options.

For completeness I want to mention another use case that I just encountered and that is not covered by the problem statement above: there are also asynchronous read and write APIs that by definition access the provided buffer after the C function returns. This is a good argument for having explicit unpin functionality, although you could work around implicit scope-based unpinning with a goroutine that keeps a pinning scope alive as long as required.

dot-asm commented 3 years ago

I apologize if this is off-topic. Given bufferArray [][]byte it's not a problem to call C.func(&bufferArray[i][0], &bufferArray[j][0], &bufferArray[k][0]). Which effectively means that Go would have to commit to not moving the corresponding buffers for the duration of the C.func call. In other words there is an implied pinning mechanism at work here. And I fail to imagine why storing these pointers in a C.struct would void it. What would be problematic is if C.func modified the pointer[s]. But then no amount of pinning would be meaningful. Indeed, if you don't make the assumption that pointers are not modified, how would pinning by itself qualify the call as safe? To summarize, it's not self-evident that "Go memory to which it points does not contain any Go pointers" is actually about pinning. It's rather about mutability. This is not to say that an explicit pinning mechanism would not come in handy, but it would probably be in demand in asynchronous scenarios, as already suggested above.

As for mutability. C provides a way to formulate an immutability contract with the const qualifier. And Go could use it to allow calls with a Go pointer to a Go pointer. (I for one would even argue that it should:-) Note that the referred C.readv does declare iov as a pointer to a constant struct, which means that the implementation is obliged to commit to not changing any pointers in the corresponding C.struct. And with this in mind, how would the suggested C.readv call be fundamentally different from C.func(&bufferArray[i][0],...)?

aclements commented 3 years ago

@rsc, I'd been imagining for ergonomic reasons that Pinned could pin multiple objects and (*Pinned).Unpin would unpin all of them. Also, people are likely to defer p.Unpin() and it would be much more efficient to enable a single such defer than to encourage multiple defers to unpin multiple objects, since the latter will often disable most defer optimizations.

aclements commented 3 years ago

@dot-asm, you're right that cgo already has pinning behavior. There's been some discussion of this above. It's spread across various comments, but this one is probably the most relevant.

"Go memory to which it points does not contain any Go pointers" is about both pinning and mutability. By surfacing the Go pointers clearly as cgo call arguments, the runtime has a clear place to hook automatic pinning (and unpinning). If we allow passing pointers to pointers, then the runtime may have to recursively traverse these data structures to pin all of the pointers they contain.

ianlancetaylor commented 3 years ago

@dot-asm There is a lot of background at https://go.googlesource.com/proposal/+/refs/heads/master/design/12416-cgo-pointers.md .

dot-asm commented 3 years ago

If we allow passing pointers to pointers, then the runtime may have to recursively traverse these data structures to pin all of the pointers they contain.

Here is the concern. There is the unsafe interface, and people will use it each time they have a problem to solve. This is obviously suboptimal and arguably outright unsustainable. And what you say above is that the recursion is unsustainable too. But this kind of asks for a compromise, i.e. can we discuss and agree on which is less unsustainable? ;-) Or maybe you can compromise and support just one level of indirection? And specifically in a slice (as opposed to lists or something)?

Here I want to apologize again for a possible side track. Feel free to ignore, since it might be just my struggle:-) Anyway, I'd like to suggest considering the following a.go snippet:

package foo

type iovec struct {
   base *byte
   len int
}

func bar(iov []iovec) {
   for i := range iov {
       *iov[i].base += 1
   }
}

and examine the output from go tool compile -S a.go. We'll see that the inner loop looks as follows:

        0x0009 00009 (a.go:10)  MOVQ    CX, DX
        0x000c 00012 (a.go:10)  SHLQ    $4, CX
        0x0010 00016 (a.go:10)  MOVQ    (AX)(CX*1), SI
        0x0014 00020 (a.go:10)  MOVBLZX (SI), DI
        0x0017 00023 (a.go:10)  INCL    DI
        0x0019 00025 (a.go:10)  MOVB    DIB, (SI)
        0x001c 00028 (a.go:9)   LEAQ    1(DX), CX
        0x0020 00032 (a.go:9)   CMPQ    BX, CX
        0x0023 00035 (a.go:9)   JGT     9

It is essential to note that this is pretty much what the corresponding C subroutine would look like (when given &iov[0] as an argument). More specifically, it looks as if no buffers are moved during its execution. But this is Go binary code, not C. In other words, there are times when buffers appear pinned even to Go code(*). So if a C call were made instead of the loop, things would just work out naturally (provided that the immutability contract is honoured, of course). Or is it that C calls are not as straightforward as one would naively imagine, and leave the Go caller in a state that allows the garbage collector to intervene? If so, then yes, explicit pinning would be in demand. Though at the same time one can probably argue that there is sufficient metadata available to arrange an implicit one, at least in some specific cases... Or maybe one can arrange an option for the application to tell the runtime "treat this C call as if it's a tight loop in Go [similar to the above]" so that the garbage collector is held back? At least I for one would argue that it would be a better option than having to resort to the unsafe interface...

(*) My understanding is that this is the time before the write barrier is checked. But even after the barrier is passed and the garbage collector is running in parallel, it won't be free to move buffers as long as such loops are executing elsewhere, right? Is it safe to assume that movements would have to be performed during another stop-the-world?

rsc commented 3 years ago

@aclements it sounds like you are advocating for:

package runtime

type Pinner struct { ... }
func (p *Pinner) Pin(object interface{})
func (p *Pinner) Unpin()

which would get used as

var p runtime.Pinner
defer p.Unpin()
for lots of things {
    p.Pin(thing)
}

Is that right?

[Updated 10/20 - changed Pinned to Pinner.] [Updated 10/25 - changed one last Pinned to Pinner.]

aclements commented 3 years ago

@rsc exactly (maybe it should be Pinner? but whatever)

@dot-asm, I think, at a high level, it's important to recognize that the Go runtime and the compiler are in cahoots here. The generated code can look like that because the compiler knows heap objects won't move and because it generates metadata telling the runtime how to find the pointers being manipulated by that code. The GC could in fact intervene during that code snippet, but the runtime and compiler have a contract that makes that safe (for example, the GC promises not to move the stack in the middle of that snippet, though at other times it can). If the GC moved heap objects, or were generational, etc, the compiler would have to produce different code. (Regarding your footnote, it is possible to have a moving GC that does not stop the world. For example, some of the early work on this was done by the very Rick Hudson who built Go's concurrent garbage collector.)