Open Mossaka opened 1 year ago
Hi @Mossaka,
Thanks for opening this.
My personal preference is the sealed interface. I see in the original issue some concern about allocations related to passing back ints, but unless I'm mistaken, that's not an issue anymore.
Given the following:
type IsRet interface {
isRet()
}
type Int int64
func (_ Int) isRet() {}
type String string
func (_ String) isRet() {}
Creating Int's for a slice will not cause allocations for a slice like var returns []IsRet
. The interface value itself, being the 2 64bit quantities, will store the int itself and the type information. There will be allocation within the slice itself, but no per-value allocation outside of that.
I've tried reading through all the material and my preference, if it is at all possible, is in line with https://github.com/bytecodealliance/wit-bindgen/issues/612#issuecomment-1634843709 from @jordan-rash. If at all possible, we should avoid interfaces and generics and just flatten to multiple returns.
If there are cases where this is not possible, I think the sealed interface is a better solution than the generic result/option type proposed.
Looking at the proposals again, I'd agree with @johanbrandhorst that if it's possible to (mostly) neatly map them to multiple returns, that's the most idiomatic Go.
@evanphx
The interface value itself, being the 2 64bit quantities, will store the int itself and the type information.
I previously (briefly) looked for documentation on gc
's optimizations trying to figure out if it does this. Any suggestions on where to look to figure out what it will do?
We face a practical problem with result
, option
, and tuple
types: they are parameterized anonymous types with no obvious ergonomic mapping to Go's type system in all cases. For example, given the WIT interface:
interface result-cache {
// Outer result reflects whether the cache key is set
// Inner result is the actual cached result
get-result: func(key: string) -> result<result<string>>
}
We might generate some Go:
type WhatDoWeCallThis {
Ok string
Err error
}
interface ResultCache {
// We can flatten the outer `result` into an idiomatic (value, error) return
GetResult(key string) (WhatDoWeCallThis, error)
}
As described in https://github.com/bytecodealliance/wit-bindgen/issues/612#issue-1801773199, the current solutions to this problem are generics (Result[T, E]
and Option[T]
) and unpleasant generated names for tuple
s (e.g. Tuple4StringU32U64MynamspaceMypackageAV1T
).
If we want result
error types to implement error
we'll need a wrapper type in some cases, which would also be anonymous:
// Errors a la `errno`
write: func(buf: list<u8>) -> result<u64, u16>
type WhatDoWeCallThis u16
func (err WhatDoWeCallThis) Error() string { ... }
func Write(buf []byte) (uint64, error) {
...
if err != 0 {
return 0, WhatDoWeCallThis(err)
}
...
}
Given the above problems, I think generics are possibly the least-bad solution for result
s and option
s (and tuple
s, though that's a separate issue) [edit: in scenarios where we can't generate nice idiomatic Go interfaces], but I'd be happy to see better alternatives.
@evanphx
The interface value itself, being the 2 64bit quantities, will store the int itself and the type information.
I previously (briefly) looked for documentation on
gc
's optimizations trying to figure out if it does this. Any suggestions on where to look to figure out what it will do?
It's not an optimization, which is probably why you didn't see anything about it. It's a fundamental aspect of Go's memory model. This talks a bit about the the memory model of interface values: https://www.timr.co/go-interfaces-the-tricky-parts/
@lann Just not wanting to pick names for things seems like a thin reason to abandon the concept all together. The names, it turns out, rarely matter here because the users will largely be interacting with the values via local type inference: num, res := Write(buf)
.
I'm not opposed to using generics, just wanted to comment on what is today the least likely to surprise users.
Rather than talk in abstracts, I'd love to get a look at some real wit types that need generating. It seems like mapping write(buf) -> result<u64, error>
to func Write(buf []byte) (uint64, error)
would be a likely situation that wit would see, and that type of function would be one that Go users fully expect.
There will be the exceptions, no doubt, and perhaps that's where generics could be used. But I think we should have an idea of how much code we're talking about.
There will be the exceptions, no doubt, and perhaps that's where generics could be used.
Ah, apologies, this is exactly what I was suggesting; using generic forms of e.g. Result[T,E]
in edge cases like my (contrived) example -> result<result<...>>
which don't map nicely to idiomatic Go. I am very much in favor of generating idiomatic interfaces where they make sense.
Here is the current wasi-preview2 definition of (stream) read
, which conveniently includes a problematic return type with an anonymous tuple
nested in a result
: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/wit/deps/io/streams.wit#L74-L78
-> result<tuple<list<u8>, stream-status>, stream-error>
In practice I would hope we can have an implementation of io.Reader
for wasi streams, but the WIT makes for a good general example.
I briefly discussed this exact issue with @Mossaka and agreed that for any wrapping of a result, we'll need an intermediate type (e.g.option<result<...>>
which seems to be not too uncommon). My suggestion was to special case the non-wrapped result
to a (value, error)
tuple, which seems to be the consensus here too. In the wrapped case though, there still seems to be a question of whether to use a generic type or generate a named type based on context. I'm personally in favor of generating a named type and avoiding generics in the initial release. My view is that we can iterate on that by adding generics later if desired, but we can't remove the generics if we start from there. IMO generated type names are (especially for return values) not that big of a deal, as @evanphx suggests.
That is a doozy. Well, look at that file it's clear that tuple<>
is a bit part of this.
For a generic tuple, what we'd probably really like is a variadic generic, but those don't exist in Go. Which means for there should be an attempt to special case 'result<tuple<' as much as possible to create idiomatic Go, otherwise the only other option is generic struct types.
For the above read signature, I'd probably expect:
func Read() ([]u8, StreamStatus, error)
Another relevant example: https://github.com/bytecodealliance/wasmtime/blob/71d0418e4fbf203090e11f9928e36bc12a11437a/crates/wasi/wit/deps/http/types.wit#L47
new-fields: func(entries: list<tuple<string,string>>) -> fields
(note that this doesn't represent a map; http headers can have repeat keys)
I'd be interested to hear more about the desire to avoid generics. The bulk of my Go experience was pre-1.18 and I haven't really used them enough to form my own opinions...
It's a minimalism thing for me, with generics we need type Tuple1[T, ...]
etc in our runtime package, but without it we just generate the necessary types in each generated file. Generics don't provide better ergonomics than a generated type, IMO.
@johanbrandhorst type Tuple[T, ...]
isn't valid either, there are no varadic templates. So we'll end up with something like:
type Tuple1[T] struct { A T }
type Tuple2[T1, T2] struct { A T1, B T2 }
type Tuple3[T1, T2, T3] struct { A T1, B T2, C T3 }
etc. That's not awful but it's starting to border on if normal struct types should be used instead (the lack of usable field names for both types is awkward regardless).
My example was poor, I didn't mean variadic, I meant what you posted š
The discussion appears to have tapered off. Here's a summary of the points that have been addressed so far:
Result
to a (value, error)
tupleResult
type, we use named types and avoiding generics in the initial releaseresult<tuple<
Did we land on anything WRT non-wrapped Option
?? I would like to see it return a *value
if possible.
Other then that comment, I think your summary looks great to me!
Did we land on anything WRT non-wrapped Option?? I would like to see it return a *value if possible.
Ah I missed it. We didn't land it but plan to :)
For a result
type, what about using a naked any
, with an un-exported errorResult
to hold the error
value (to address the case where the stored value implements the error
interface)?
type Result any
type errorResult error
func ErrorResult(err error) Result {
return errorResult(err)
}
func Error(r Result) error {
switch t := r.(type) {
case errorResult:
return error(t)
default:
return nil
}
}
func Unwrap(r Result) (any, error) {
switch t := r.(type) {
case errorResult:
return nil, error(t)
default:
return t, nil
}
}
Interesting, it prevents users from creating their own error results, I guess? I don't love that it doesn't say anything about the use of the type itself though. Maybe Unwrap
could be a method on the result type? In any case, it isn't clear to me that we need a "general" result type with the latest proposal - we either flatten it to (resultType, error)
or wrap it in a generated type
type SomeOperationResult struct {
Result resultType
Error error
}
func FunctionReturningWrappedResult() (*SomeOperationResult, error)
Unfortunately you can't put methods on an interface receiver.
As @Mossaka pointed out, a WIT may define a function with a result as an argument. This implies the need for a Result type in places where returns couldn't be flattened.
Couldn't that equally be a generated wrapper type?
type SomeOperationParam1 struct {
Value resultType
Error error
}
func FunctionReturningWrappedResult(param SomeOperationParam1) (error)
If we're going to generate wrappers for results I'd like to understand the naming scheme for those wrappers. It seems quite easy to end up with conflicts, especially with combining two internal types.
Yep, care would have to be taken to use a deterministic and unique naming scheme. I think it's easy to come up with a trivial algorithm, could you share an example of the sort of conflict you're thinking of? My trivial algorithm would be PackageName+FunctionName+ParameterIndex
.
My trivial algorithm would be PackageName+FunctionName+ParameterIndex.
WIT result
types can appear anywhere any other type can, so we would need at least a fallback algorithm for wrappers that aren't immediately associated with a single function parameter.
The more I think about this the more appealing a single generic Result[T,E]
seems to me...
I confess my ignorance here with the complexities of WIT syntax, perhaps a generic Result
is the right way to go, but it'd be easier for me and I'm sure others if we had practical examples of the sort of definition that would be impractical using generated wrapper types. In any case, if we do need a type, a generic Result[T, E]
would be preferable to me to the suggestion made by Randy which is hard for a user to use without examples.
I guess my view is still "lets implement something and iterate on it".
I don't think anyone has enough experience writing WIT to gauge how practical particular examples might be. :slightly_smiling_face:
Here's one that feels to me like it has a plausible shape:
variant run-result {
completed(result<...>),
timed-out,
}
run: func() -> run-result
Agree with @johanbrandhorst:
I guess my view is still "lets implement something and iterate on it".
One challenge I see is that wit-bindgen
is entirely written in Rust, which presents a hurdle for Go (and not necessarily Rust) language experts from contributing. The go/types
and go/ast
packages, in particular, make generating Go with Go much easier.
Thereās two halves to this: generating glue code to the WASI ABI (//go:wasmimport
), and secondāand probably more subjectiveāgenerating ~idiomatic Go interfaces exported to other packages. My guess is makes sense to lean on standardized, performant primitives for the glue code, and experiment with different approaches for exported APIs.
For some context, @ydnar and I have created a post on BCA's zulipchat discussing the possibility of emitting JSON from the description of the WIT that could be consumed by Golang. There are some interesting discussions over there. Do you think it's worth to open another issue on this repo for introducing either a component interface or JSON that consumed by language generators written in their native languages (like wit-bindgen-go
)?
@alexcrichton
Personally I don't think there'd be an issue with something like that, and I definitely agree that writing bindings generators in the language they're generating bindings for is almost always the best way to go.
The main hurdle currently for writing a bindings generator in Go is parsing a WIT document and dependencies and such as that would require duplicating quite a lot of the Rust library wit-parser
which is not trivial. There's nothing stopping anyone doing that of course, but I'm assuming from your JSON suggestion @Mossaka you're thinking that the output of wit-parser
would be dumped into JSON so that a non-Rust bindings generator wouldn't have to reimplement all the wit-parser
bits.
If that's the case then the main thing to focus on would be a wit_parser::Resolve
. If that structure were dumped to JSON, or something else that's machine readable, then that's the "simplified input" version of a WIT package and its dependencies which a language bindings generator would use.
The easiest way to do that would be to slap #[derive(Serialize)]
on just about everything wit-parser
exports. This does mean that the output won't have documentation beyond the Rust type structure, nor would it be stable beyond the typical guarantees of Rust API stability, but it should be workable for anyone interested in bootstrapping bindings generating in a non-Rust language.
The main hurdle currently for writing a bindings generator in Go is parsing a WIT document and dependencies and such as that would require duplicating quite a lot of the Rust library wit-parser which is not trivial
If the group is interested, I actually started this effort 2 weeks ago. I was frustrated I needed to learn rust to do a simple edit to the parser. Read this as I dont have the cycles to learn rust right now, not that I think doing it in rust is the wrong answer. None the less, I haven't written a parser since university, so it was a good exercise that I intend on seeing though.
I want to keep this issue's scope to Go's ergo improvement for Result
and Option
types, so I decided to create a new issue #640 for discussing the bindgen in rust challange raised by @ydnar .
option<T>
implemented with Go generics: https://github.com/ydnar/wasm-tools-go/blob/main/cm/option.go
Example usage here: https://github.com/ydnar/wasm-tools-go/blob/94aed292cc8edc3f8e2efd189c9bbc84f6a02b86/wasi/cli/environment/environment.wit.go#L38
Considerations that drove this design:
Similarly, list<T>
here: https://github.com/ydnar/wasm-tools-go/blob/main/cm/list.go
I think it would be worthwhile to special-case certain function return type patterns, e.g.:
// opt: func() -> option<t>
func Opt() (T, bool) { ... }
// res: func() -> result<t, e>
func Res() (T, E) { ... }
...with a general-purpose representation in any other position, e.g.:
// opt-res: func() -> option<result<t, e>>
func OptRes() (Result[T, E], bool) { ... }
// res-opt: func() -> result<option<t>, e>
func ResOpt() (Option[T], E) { ... }
3. The generated code will likely be consumed by other library or package authors, where they can implement more idiomatic Go interfaces (e.g. multiple return values).
I think we should be careful about leaning too heavily on this. While it is certainly true that some interfaces will end up being wrapped by libraries, it would be unfortunate if this kind of wrapping was (essentially) mandatory, especially for "business logic" interfaces that are internal to some particular project.
In that context, I am skeptical of the linked List[T]
generic; the benefit of being able to omit the cap int
seems pretty marginal, especially as you would expect the vast majority of calling code to immediately convert to/from slices anyway.
I agree somewhat that having more idiomatic Go interfaces on the caller side of WASI imports could be beneficial to someone who chiefly writes Go code and expects interfaces to follow Go conventions, versus a Go programmer explicitly targeting WASI APIs, who might have different expectations.
Iām thinking of a couple different costs: the cost of a programmer reading WIT documentation and mentally translating to how this works in Go, and second, the runtime cost of doing the conversion. Should the caller have control, or should it be mandatory in the generated code?
One benefit of using predicable, zero or low-cost abstractions like List[T]
or Option[T]
(where possible) means that the caller can choose if and when to perform conversion.
Ultimately, I think real-world usage will help inform this either way. I figure weāll see what works and what doesnāt, and adjust accordingly.
Cross-posted from #626: we now have working Go bindings generated from WIT:
https://pkg.go.dev/github.com/ydnar/wasm-tools-go/wasi
wit-bindgen-go
can generate Go bindings from WIT:
wit-bindgen-go generate ./path/to/wasi-cli/wit # if wasm-tools is in path
Or:
wasm-tools component wit -j ./path/to/wasi-cli/wit | wit-bindgen-go generate
These are currently being used in a wasip2
port of TinyGo.
Now that we have something working, I think itās probably worth exploring whether to special-case certain patterns:
result<_, err-type>
could return a Go error
by wrapping err-type
in a generic type that implements the Go error
interface.result<ok, err>
could return (*OK, error)
in Go.option<T>
could return *T
in Go.list<T>
could return []T
.option<list<T>
, since []T
is nillable in Go.list<T>
argument, allow the exported Go function to accept []T
instead.In practice, the use of result
in record
fields or function arguments is rare, so I think it might be OK to not special-case those.
Noting some tradeoffs:
Some of these patterns will move allocations to the heap (namely returning *T
values).
Certain variant
and result
layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr
, and needing to keep track of cabi_realloc
pointers separately. Special-casing certain function signatures may provide a way to resolve this.
WIT returning option
could return *T in Go.
I would still argue for returning (T, bool)
here. It's true that *T
is a common design pattern, but it results in a lot of annoyances and/or inconsistencies when applied generically:
*int
, *string
, etc) are always annoying to work withoption<option<T>>
-> **T
Similarly [
[]T
] for option<list, since []T is nillable in Go.
This is easy to misinterpret if someone is looking only at the Go signature.
Certain variant and result layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr, and needing to keep track of cabi_realloc pointers separately. Special-casing certain function signatures may provide a way to resolve this.
Could you expand a bit on this?
WIT returning option could return *T in Go.
I would still argue for returning
(T, bool)
here. It's true that*T
is a common design pattern, but it results in a lot of annoyances and/or inconsistencies when applied generically:
Appreciate that feedback. However, when we tried the comma-ok form, it wasnāt great in practice. The Some
method to unwrap the some
value of option<T>
used to look like this:
func (o *cm.Option[T]) Some() (T, bool)
We changed it to:
func (o *cm.Option[T]) Some() *T
Which let the caller unwrap the option when passing the some
value to another function or returning it. For example:
This has the benefit of not escaping to the heap if the caller holds the Option[T]
in a local variable.
Certain variant and result layouts may not interact correctly with the Go garbage collector. The tradeoff being either (invalid) non-pointer values in pointer shapes, or pointers in uintptr, and needing to keep track of cabi_realloc pointers separately. Special-casing certain function signatures may provide a way to resolve this.
Could you expand a bit on this?
Letās say you have a result<string, error-code>
where error-code
is an enum of type uint8
.
The layout for that result
in Go is:
struct {
tag uint8
data string
}
Since the layout of string
is struct { data *byte; len uintptr }
, and the Canonical ABI specifies that variant
data storage uses the highest alignment of its associated types, then when that result
contains an error, then the error-code
value occupies the same space as the string
pointer.
If the GC needs to grow the stack, and this result
is on the stack, it could throw an error if it detects an invalid value in a place it expects a pointer. Similarly if the result
is on the heap during a GC scan.
Problem 1 can be avoided if the representation of the result
uses a different shape, e.g.:
struct {
tag uint8
data [2]uint32 // could be uintptr on wasm32
}
However, this creates a new problem: if the result
represents the OK case, then thereās a valid pointer to *byte
for the string. If there are no other references to it on the stack or heap, then the GC could collect that pointer, and the program will panic if we try to use it later.
This is fine if the caller holds onto the string (say with runtime.Keepalive
), or our cabi_realloc
function performs bookkeeping for allocations, keeping track of allocated pointers.
However, either case requires someone (either the caller, or the CM machinery) to explicitly drop a reference to any pointer used in a variant
or result
in order for the GC to collect it.
cabi_realloc
and force the caller of any Component Model function to explicitly release any pointers. This is probably the least great option.Thanks for the explanation!
Is this all scoped to just tinygo? I know tinygo's conservative GC would give a lot more flexibility in recursive stack allocations (like list<list<T>>
). It isn't clear to me that any of the optimization you are implying would be applicable to mainline Go's precise GC.
Thanks for the explanation!
Is this all scoped to just tinygo? I know tinygo's conservative GC would give a lot more flexibility in recursive stack allocations (like
list<list<T>>
). It isn't clear to me that any of the optimization you are implying would be applicable to mainline Go's precise GC.
Everything above refers to the mainline Go GC, (along with the wasm32
and wasmexport
proposals). I havenāt yet dug into how TinyGoās GC will treat variant types.
One TODO I have is to add facilities to the wit
package that can surface potential GC conflicts, e.g. by comparing the GC shapes of two WIT types. If no conflict, then it can generate a fast path.
Re: nested list<list<T>>
. Are you referring to imports or exports? I suppose if the caller has a slice backing on the stack, and escape analysis permits it, the entire nested structure could be on the stack for calling imported functions.
Tinygo's GC should be fine with your current approach, I think. Its conservative GC treats anything that looks like it could be a pointer as a potential pointer, so the worst case is that a non-pointer accidentally keeps some memory alive that should have been collected (admittedly a lot more likely in wasm...).
This issue is created for a discussion on improving the
Results
andOptions
type in the Go bindgen.To see how these types are generated, please read the head of this https://github.com/bytecodealliance/wit-bindgen/issues/612#issue-1801773199.
A summary of what's being proposed: https://github.com/bytecodealliance/wit-bindgen/issues/612#issuecomment-1644691976