bytecodealliance / wit-bindgen

A language binding generator for WebAssembly interface types
Apache License 2.0
1.03k stars 194 forks source link

Go: ergonomics improvement for `Option` and `Result` type #625

Open Mossaka opened 1 year ago

Mossaka commented 1 year ago

This issue is created for a discussion on improving the Results and Options type in the Go bindgen.

To see how these types are generated, please read the head of this https://github.com/bytecodealliance/wit-bindgen/issues/612#issue-1801773199.

A summary of what's being proposed: https://github.com/bytecodealliance/wit-bindgen/issues/612#issuecomment-1644691976

evanphx commented 1 year ago

Hi @Mossaka,

Thanks for opening this.

My personal preference is the sealed interface. I see in the original issue some concern about allocations related to passing back ints, but unless I'm mistaken, that's not an issue anymore.

Given the following:

type IsRet interface {
   isRet()
}

type Int int64

func (_ Int) isRet() {}

type String string

func (_ String) isRet() {}

Creating Int's for a slice will not cause allocations for a slice like var returns []IsRet. The interface value itself, being the 2 64bit quantities, will store the int itself and the type information. There will be allocation within the slice itself, but no per-value allocation outside of that.

johanbrandhorst commented 1 year ago

I've tried reading through all the material and my preference, if it is at all possible, is in line with https://github.com/bytecodealliance/wit-bindgen/issues/612#issuecomment-1634843709 from @jordan-rash. If at all possible, we should avoid interfaces and generics and just flatten to multiple returns.

If there are cases where this is not possible, I think the sealed interface is a better solution than the generic result/option type proposed.

evanphx commented 1 year ago

Looking at the proposals again, I'd agree with @johanbrandhorst that if it's possible to (mostly) neatly map them to multiple returns, that's the most idiomatic Go.

lann commented 1 year ago

@evanphx

The interface value itself, being the 2 64bit quantities, will store the int itself and the type information.

I previously (briefly) looked for documentation on gc's optimizations trying to figure out if it does this. Any suggestions on where to look to figure out what it will do?

lann commented 1 year ago

We face a practical problem with result, option, and tuple types: they are parameterized anonymous types with no obvious ergonomic mapping to Go's type system in all cases. For example, given the WIT interface:

interface result-cache {
  // Outer result reflects whether the cache key is set
  // Inner result is the actual cached result
  get-result: func(key: string) -> result<result<string>>
}

We might generate some Go:

type WhatDoWeCallThis {
    Ok string
    Err error
}

interface ResultCache {
    // We can flatten the outer `result` into an idiomatic (value, error) return
    GetResult(key string) (WhatDoWeCallThis, error)
}

As described in https://github.com/bytecodealliance/wit-bindgen/issues/612#issue-1801773199, the current solutions to this problem are generics (Result[T, E] and Option[T]) and unpleasant generated names for tuples (e.g. Tuple4StringU32U64MynamspaceMypackageAV1T).

lann commented 1 year ago

If we want result error types to implement error we'll need a wrapper type in some cases, which would also be anonymous:

// Errors a la `errno`
write: func(buf: list<u8>) -> result<u64, u16>
type WhatDoWeCallThis u16

func (err WhatDoWeCallThis) Error() string { ... }

func Write(buf []byte) (uint64, error) {
    ...
    if err != 0 {
        return 0, WhatDoWeCallThis(err)
    }
    ...
}
lann commented 1 year ago

Given the above problems, I think generics are possibly the least-bad solution for results and options (and tuples, though that's a separate issue) [edit: in scenarios where we can't generate nice idiomatic Go interfaces], but I'd be happy to see better alternatives.

evanphx commented 1 year ago

@evanphx

The interface value itself, being the 2 64bit quantities, will store the int itself and the type information.

I previously (briefly) looked for documentation on gc's optimizations trying to figure out if it does this. Any suggestions on where to look to figure out what it will do?

It's not an optimization, which is probably why you didn't see anything about it. It's a fundamental aspect of Go's memory model. This talks a bit about the the memory model of interface values: https://www.timr.co/go-interfaces-the-tricky-parts/

evanphx commented 1 year ago

@lann Just not wanting to pick names for things seems like a thin reason to abandon the concept all together. The names, it turns out, rarely matter here because the users will largely be interacting with the values via local type inference: num, res := Write(buf).

I'm not opposed to using generics, just wanted to comment on what is today the least likely to surprise users.

Rather than talk in abstracts, I'd love to get a look at some real wit types that need generating. It seems like mapping write(buf) -> result<u64, error> to func Write(buf []byte) (uint64, error) would be a likely situation that wit would see, and that type of function would be one that Go users fully expect.

There will be the exceptions, no doubt, and perhaps that's where generics could be used. But I think we should have an idea of how much code we're talking about.

lann commented 1 year ago

There will be the exceptions, no doubt, and perhaps that's where generics could be used.

Ah, apologies, this is exactly what I was suggesting; using generic forms of e.g. Result[T,E] in edge cases like my (contrived) example -> result<result<...>> which don't map nicely to idiomatic Go. I am very much in favor of generating idiomatic interfaces where they make sense.

lann commented 1 year ago

Here is the current wasi-preview2 definition of (stream) read, which conveniently includes a problematic return type with an anonymous tuple nested in a result: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/wit/deps/io/streams.wit#L74-L78

-> result<tuple<list<u8>, stream-status>, stream-error>

In practice I would hope we can have an implementation of io.Reader for wasi streams, but the WIT makes for a good general example.

johanbrandhorst commented 1 year ago

I briefly discussed this exact issue with @Mossaka and agreed that for any wrapping of a result, we'll need an intermediate type (e.g.option<result<...>> which seems to be not too uncommon). My suggestion was to special case the non-wrapped result to a (value, error) tuple, which seems to be the consensus here too. In the wrapped case though, there still seems to be a question of whether to use a generic type or generate a named type based on context. I'm personally in favor of generating a named type and avoiding generics in the initial release. My view is that we can iterate on that by adding generics later if desired, but we can't remove the generics if we start from there. IMO generated type names are (especially for return values) not that big of a deal, as @evanphx suggests.

evanphx commented 1 year ago

That is a doozy. Well, look at that file it's clear that tuple<> is a bit part of this.

For a generic tuple, what we'd probably really like is a variadic generic, but those don't exist in Go. Which means for there should be an attempt to special case 'result<tuple<' as much as possible to create idiomatic Go, otherwise the only other option is generic struct types.

For the above read signature, I'd probably expect:

func Read() ([]u8, StreamStatus, error)
lann commented 1 year ago

Another relevant example: https://github.com/bytecodealliance/wasmtime/blob/71d0418e4fbf203090e11f9928e36bc12a11437a/crates/wasi/wit/deps/http/types.wit#L47

new-fields: func(entries: list<tuple<string,string>>) -> fields

(note that this doesn't represent a map; http headers can have repeat keys)

lann commented 1 year ago

I'd be interested to hear more about the desire to avoid generics. The bulk of my Go experience was pre-1.18 and I haven't really used them enough to form my own opinions...

johanbrandhorst commented 1 year ago

It's a minimalism thing for me, with generics we need type Tuple1[T, ...] etc in our runtime package, but without it we just generate the necessary types in each generated file. Generics don't provide better ergonomics than a generated type, IMO.

evanphx commented 1 year ago

@johanbrandhorst type Tuple[T, ...] isn't valid either, there are no varadic templates. So we'll end up with something like:

type Tuple1[T] struct { A T }
type Tuple2[T1, T2] struct { A T1, B T2 }
type Tuple3[T1, T2, T3] struct { A T1, B T2, C T3 }

etc. That's not awful but it's starting to border on if normal struct types should be used instead (the lack of usable field names for both types is awkward regardless).

johanbrandhorst commented 1 year ago

My example was poor, I didn't mean variadic, I meant what you posted šŸ˜

Mossaka commented 1 year ago

The discussion appears to have tapered off. Here's a summary of the points that have been addressed so far:

  1. Special handling for the non-wrapped Result to a (value, error) tuple
  2. For wrapped Result type, we use named types and avoiding generics in the initial release
  3. We want to think about special hanlding for more common wrapped types like result<tuple<
jordan-rash commented 1 year ago

Did we land on anything WRT non-wrapped Option?? I would like to see it return a *value if possible.

Other then that comment, I think your summary looks great to me!

Mossaka commented 1 year ago

Did we land on anything WRT non-wrapped Option?? I would like to see it return a *value if possible.

Ah I missed it. We didn't land it but plan to :)

ydnar commented 1 year ago

For a result type, what about using a naked any, with an un-exported errorResult to hold the error value (to address the case where the stored value implements the error interface)?

type Result any

type errorResult error

func ErrorResult(err error) Result {
    return errorResult(err)
}

func Error(r Result) error {
    switch t := r.(type) {
    case errorResult:
        return error(t)
    default:
        return nil
    }
}

func Unwrap(r Result) (any, error) {
    switch t := r.(type) {
    case errorResult:
        return nil, error(t)
    default:
        return t, nil
    }
}
johanbrandhorst commented 1 year ago

Interesting, it prevents users from creating their own error results, I guess? I don't love that it doesn't say anything about the use of the type itself though. Maybe Unwrap could be a method on the result type? In any case, it isn't clear to me that we need a "general" result type with the latest proposal - we either flatten it to (resultType, error) or wrap it in a generated type

type SomeOperationResult struct {
    Result resultType
    Error error
} 

func FunctionReturningWrappedResult() (*SomeOperationResult, error)
ydnar commented 1 year ago

Unfortunately you can't put methods on an interface receiver.

As @Mossaka pointed out, a WIT may define a function with a result as an argument. This implies the need for a Result type in places where returns couldn't be flattened.

johanbrandhorst commented 1 year ago

Couldn't that equally be a generated wrapper type?

type SomeOperationParam1 struct {
    Value resultType
    Error error
} 

func FunctionReturningWrappedResult(param SomeOperationParam1) (error)
lann commented 1 year ago

If we're going to generate wrappers for results I'd like to understand the naming scheme for those wrappers. It seems quite easy to end up with conflicts, especially with combining two internal types.

johanbrandhorst commented 1 year ago

Yep, care would have to be taken to use a deterministic and unique naming scheme. I think it's easy to come up with a trivial algorithm, could you share an example of the sort of conflict you're thinking of? My trivial algorithm would be PackageName+FunctionName+ParameterIndex.

lann commented 1 year ago

My trivial algorithm would be PackageName+FunctionName+ParameterIndex.

WIT result types can appear anywhere any other type can, so we would need at least a fallback algorithm for wrappers that aren't immediately associated with a single function parameter.

The more I think about this the more appealing a single generic Result[T,E] seems to me...

johanbrandhorst commented 1 year ago

I confess my ignorance here with the complexities of WIT syntax, perhaps a generic Result is the right way to go, but it'd be easier for me and I'm sure others if we had practical examples of the sort of definition that would be impractical using generated wrapper types. In any case, if we do need a type, a generic Result[T, E] would be preferable to me to the suggestion made by Randy which is hard for a user to use without examples.

I guess my view is still "lets implement something and iterate on it".

lann commented 1 year ago

I don't think anyone has enough experience writing WIT to gauge how practical particular examples might be. :slightly_smiling_face:

Here's one that feels to me like it has a plausible shape:

variant run-result {
  completed(result<...>),
  timed-out,
}
run: func() -> run-result
ydnar commented 1 year ago

Agree with @johanbrandhorst:

I guess my view is still "lets implement something and iterate on it".

One challenge I see is that wit-bindgen is entirely written in Rust, which presents a hurdle for Go (and not necessarily Rust) language experts from contributing. The go/types and go/ast packages, in particular, make generating Go with Go much easier.

Thereā€™s two halves to this: generating glue code to the WASI ABI (//go:wasmimport), and secondā€”and probably more subjectiveā€”generating ~idiomatic Go interfaces exported to other packages. My guess is makes sense to lean on standardized, performant primitives for the glue code, and experiment with different approaches for exported APIs.

Mossaka commented 1 year ago

For some context, @ydnar and I have created a post on BCA's zulipchat discussing the possibility of emitting JSON from the description of the WIT that could be consumed by Golang. There are some interesting discussions over there. Do you think it's worth to open another issue on this repo for introducing either a component interface or JSON that consumed by language generators written in their native languages (like wit-bindgen-go)?

@alexcrichton

alexcrichton commented 1 year ago

Personally I don't think there'd be an issue with something like that, and I definitely agree that writing bindings generators in the language they're generating bindings for is almost always the best way to go.

The main hurdle currently for writing a bindings generator in Go is parsing a WIT document and dependencies and such as that would require duplicating quite a lot of the Rust library wit-parser which is not trivial. There's nothing stopping anyone doing that of course, but I'm assuming from your JSON suggestion @Mossaka you're thinking that the output of wit-parser would be dumped into JSON so that a non-Rust bindings generator wouldn't have to reimplement all the wit-parser bits.

If that's the case then the main thing to focus on would be a wit_parser::Resolve. If that structure were dumped to JSON, or something else that's machine readable, then that's the "simplified input" version of a WIT package and its dependencies which a language bindings generator would use.

The easiest way to do that would be to slap #[derive(Serialize)] on just about everything wit-parser exports. This does mean that the output won't have documentation beyond the Rust type structure, nor would it be stable beyond the typical guarantees of Rust API stability, but it should be workable for anyone interested in bootstrapping bindings generating in a non-Rust language.

jordan-rash commented 1 year ago

The main hurdle currently for writing a bindings generator in Go is parsing a WIT document and dependencies and such as that would require duplicating quite a lot of the Rust library wit-parser which is not trivial

If the group is interested, I actually started this effort 2 weeks ago. I was frustrated I needed to learn rust to do a simple edit to the parser. Read this as I dont have the cycles to learn rust right now, not that I think doing it in rust is the wrong answer. None the less, I haven't written a parser since university, so it was a good exercise that I intend on seeing though.

https://github.com/jordan-rash/go-wit

Mossaka commented 1 year ago

I want to keep this issue's scope to Go's ergo improvement for Result and Option types, so I decided to create a new issue #640 for discussing the bindgen in rust challange raised by @ydnar .

ydnar commented 9 months ago

option<T> implemented with Go generics: https://github.com/ydnar/wasm-tools-go/blob/main/cm/option.go

Example usage here: https://github.com/ydnar/wasm-tools-go/blob/94aed292cc8edc3f8e2efd189c9bbc84f6a02b86/wasi/cli/environment/environment.wit.go#L38

Considerations that drove this design:

  1. Binary compatibility with the Component Model Canonical ABI
  2. Generated Go types and function signatures match the source WIT definitions wherever possible.
  3. The generated code will likely be consumed by other library or package authors, where they can implement more idiomatic Go interfaces (e.g. multiple return values).
ydnar commented 9 months ago

Similarly, list<T> here: https://github.com/ydnar/wasm-tools-go/blob/main/cm/list.go

lann commented 9 months ago

I think it would be worthwhile to special-case certain function return type patterns, e.g.:

// opt: func() -> option<t>
func Opt() (T, bool) { ... }

// res: func() -> result<t, e>
func Res() (T, E) { ... }

...with a general-purpose representation in any other position, e.g.:

// opt-res: func() -> option<result<t, e>>
func OptRes() (Result[T, E], bool) { ... }

// res-opt: func() -> result<option<t>, e>
func ResOpt() (Option[T], E) { ... }
lann commented 9 months ago
3. The generated code will likely be consumed by other library or package authors, where they can implement more idiomatic Go interfaces (e.g. multiple return values).

I think we should be careful about leaning too heavily on this. While it is certainly true that some interfaces will end up being wrapped by libraries, it would be unfortunate if this kind of wrapping was (essentially) mandatory, especially for "business logic" interfaces that are internal to some particular project.

In that context, I am skeptical of the linked List[T] generic; the benefit of being able to omit the cap int seems pretty marginal, especially as you would expect the vast majority of calling code to immediately convert to/from slices anyway.

ydnar commented 9 months ago

I agree somewhat that having more idiomatic Go interfaces on the caller side of WASI imports could be beneficial to someone who chiefly writes Go code and expects interfaces to follow Go conventions, versus a Go programmer explicitly targeting WASI APIs, who might have different expectations.

Iā€™m thinking of a couple different costs: the cost of a programmer reading WIT documentation and mentally translating to how this works in Go, and second, the runtime cost of doing the conversion. Should the caller have control, or should it be mandatory in the generated code?

One benefit of using predicable, zero or low-cost abstractions like List[T] or Option[T] (where possible) means that the caller can choose if and when to perform conversion.

Ultimately, I think real-world usage will help inform this either way. I figure weā€™ll see what works and what doesnā€™t, and adjust accordingly.

ydnar commented 8 months ago

Cross-posted from #626: we now have working Go bindings generated from WIT:

https://pkg.go.dev/github.com/ydnar/wasm-tools-go/wasi

wit-bindgen-go can generate Go bindings from WIT:

wit-bindgen-go generate ./path/to/wasi-cli/wit # if wasm-tools is in path

Or:

wasm-tools component wit -j ./path/to/wasi-cli/wit | wit-bindgen-go generate

These are currently being used in a wasip2 port of TinyGo.

ydnar commented 8 months ago

Now that we have something working, I think itā€™s probably worth exploring whether to special-case certain patterns:

  1. A WIT function that returns result<_, err-type> could return a Go error by wrapping err-type in a generic type that implements the Go error interface.
  2. Similarly for result<ok, err> could return (*OK, error) in Go.
  3. WIT returning option<T> could return *T in Go.
  4. WIT returning list<T> could return []T.
  5. Similarly for option<list<T>, since []T is nillable in Go.
  6. If a generated function accepts a list<T> argument, allow the exported Go function to accept []T instead.

In practice, the use of result in record fields or function arguments is rare, so I think it might be OK to not special-case those.

Noting some tradeoffs:

Some of these patterns will move allocations to the heap (namely returning *T values).

Certain variant and result layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr, and needing to keep track of cabi_realloc pointers separately. Special-casing certain function signatures may provide a way to resolve this.

lann commented 8 months ago

WIT returning option could return *T in Go.

I would still argue for returning (T, bool) here. It's true that *T is a common design pattern, but it results in a lot of annoyances and/or inconsistencies when applied generically:

Similarly [[]T] for option<list, since []T is nillable in Go.

This is easy to misinterpret if someone is looking only at the Go signature.

lann commented 8 months ago

Certain variant and result layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr, and needing to keep track of cabi_realloc pointers separately. Special-casing certain function signatures may provide a way to resolve this.

Could you expand a bit on this?

ydnar commented 8 months ago

WIT returning option could return *T in Go.

I would still argue for returning (T, bool) here. It's true that *T is a common design pattern, but it results in a lot of annoyances and/or inconsistencies when applied generically:

Appreciate that feedback. However, when we tried the comma-ok form, it wasnā€™t great in practice. The Some method to unwrap the some value of option<T> used to look like this:

func (o *cm.Option[T]) Some() (T, bool)

We changed it to:

func (o *cm.Option[T]) Some() *T

Which let the caller unwrap the option when passing the some value to another function or returning it. For example:

https://github.com/dgryski/tinygo/commit/778eebb35ae600e66870a14d04e0e67f6e0380b0#diff-519198709ce5f198021f73cc0136481092b1104ed96e2f21c3bbbd0ffe819d9dL509-R515

This has the benefit of not escaping to the heap if the caller holds the Option[T] in a local variable.

ydnar commented 8 months ago

Certain variant and result layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr, and needing to keep track of cabi_realloc pointers separately. Special-casing certain function signatures may provide a way to resolve this.

Could you expand a bit on this?

Letā€™s say you have a result<string, error-code> where error-code is an enum of type uint8.

The layout for that result in Go is:

struct {
    tag uint8
    data string
}

Problem 1: invalid pointers

Since the layout of string is struct { data *byte; len uintptr }, and the Canonical ABI specifies that variant data storage uses the highest alignment of its associated types, then when that result contains an error, then the error-code value occupies the same space as the string pointer.

If the GC needs to grow the stack, and this result is on the stack, it could throw an error if it detects an invalid value in a place it expects a pointer. Similarly if the result is on the heap during a GC scan.

Problem 2: untracked pointers

Problem 1 can be avoided if the representation of the result uses a different shape, e.g.:

struct {
    tag uint8
    data [2]uint32 // could be uintptr on wasm32
}

However, this creates a new problem: if the result represents the OK case, then thereā€™s a valid pointer to *byte for the string. If there are no other references to it on the stack or heap, then the GC could collect that pointer, and the program will panic if we try to use it later.

This is fine if the caller holds onto the string (say with runtime.Keepalive), or our cabi_realloc function performs bookkeeping for allocations, keeping track of allocated pointers.

However, either case requires someone (either the caller, or the CM machinery) to explicitly drop a reference to any pointer used in a variant or result in order for the GC to collect it.

Solutions

  1. In theory, Goā€™s garbage collector could be changed to ignore certain invalid pointers on the stack or heap, which would enable tagged unions in Go which share storage for types with different GC shapes.
  2. Another possible solution is to have the generated bindings return "fat" types that contain sidecar values holding all pointer(s) that need to be kept alive.
  3. Force every generated CM function to convert Go values to their Canonical ABI equivalent, and hide pointer bookkeeping inside of generated code. This would introduce copying overhead for any CM function call.
  4. Introduce bookkeeping for every pointer generated by cabi_realloc and force the caller of any Component Model function to explicitly release any pointers. This is probably the least great option.
lann commented 8 months ago

Thanks for the explanation!

Is this all scoped to just tinygo? I know tinygo's conservative GC would give a lot more flexibility in recursive stack allocations (like list<list<T>>). It isn't clear to me that any of the optimization you are implying would be applicable to mainline Go's precise GC.

ydnar commented 8 months ago

Thanks for the explanation!

Is this all scoped to just tinygo? I know tinygo's conservative GC would give a lot more flexibility in recursive stack allocations (like list<list<T>>). It isn't clear to me that any of the optimization you are implying would be applicable to mainline Go's precise GC.

Everything above refers to the mainline Go GC, (along with the wasm32 and wasmexport proposals). I havenā€™t yet dug into how TinyGoā€™s GC will treat variant types.

One TODO I have is to add facilities to the wit package that can surface potential GC conflicts, e.g. by comparing the GC shapes of two WIT types. If no conflict, then it can generate a fast path.

Re: nested list<list<T>>. Are you referring to imports or exports? I suppose if the caller has a slice backing on the stack, and escape analysis permits it, the entire nested structure could be on the stack for calling imported functions.

lann commented 8 months ago

Tinygo's GC should be fine with your current approach, I think. Its conservative GC treats anything that looks like it could be a pointer as a potential pointer, so the worst case is that a non-pointer accidentally keeps some memory alive that should have been collected (admittedly a lot more likely in wasm...).