WebAssembly / component-model

Repository for design and specification of the Component Model

Implement resource converters #331

Open lpereira opened 7 months ago

lpereira commented 7 months ago

Hi --

I'm trying to create a resource in a component that can be constructed from values of different types. I could use a variant as an argument, but it would be nice if I could overload the constructor of a resource somehow so that idiomatic ways to construct a resource from various different types could be provided.

For instance, if we could do something like this:

resource blob {
    from(string);
    from(list<u8> as byte-array);
    from(types.descriptor);
    from(types.input-stream);
    from(types.output-stream);
    ...
}

This could generate something like this in the guest trait (or whatever the equivalent, of course):

pub trait GuestBlob: 'static {
    fn from_string(String) -> Blob;
    fn from_byte_array(Vec<u8>) -> Blob;
    fn from_descriptor(filesystem::types::Descriptor) -> Blob;
    ...
}

Then these convenience trait implementations (or whatever equivalent in another language, even if that means nothing as one could use the static methods as they are) would be present:

impl From<String> for Blob {
    fn from(other: String) -> Self {
        Self::from_string(other)
    }
}

impl From<Vec<u8>> for Blob {
    ...
}

impl From<filesystem::types::Descriptor> for Blob {
    ...
}

...

Thoughts on this?

lukewagner commented 7 months ago

Thanks for filing, this is an interesting question. So contrasting this with the existing constructor and static features, is the key additional capability you're suggesting here the ability to do type-based overloading (focused just on the use case of overloading construction)?

Thus far we've disallowed overloading because it ends up having rather varied support in different languages and it raises a bunch of thorny questions. But, just as we're considering relaxing the rules for constructor in #285 (so that constructors can return a result type), it'd be worth considering whether we should relax the rules for overloading constructors (and maybe all functions) rather than adding a separate feature to do so.

One idea is to apply the same technique as we've used for constructors, methods and static functions and make "overloading" a form of syntactic-sugar that's mangled in a spec-defined way into the real import/export name. E.g., a raw component could have:

(component
  (import "r" (type $r (sub resource)))
  (import "[constructor, overload string]r" (func (param "s" string) (result (own $r))))
  (import "[constructor, overload bytes]r" (func (param "s" (list u8)) (result (own $r))))
  ...
)

and thus semantically we have 2 different functions with 2 different names (which means a bindings generator can always fall back to mangling these 2 names to produce 2 different explicitly-callable source function declarations), but bindings could also do overloading if it made sense.
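To make the fallback path concrete, here is a minimal sketch of how a bindings generator might turn such annotated names into distinct Rust identifiers. The `[constructor, overload <label>]<resource>` shape is only the one suggested in the example above, not specified syntax, and `fallback_fn_name` and the `<resource>_new_<label>` naming scheme are hypothetical.

```rust
/// Parse a name of the hypothetical form `[constructor, overload <label>]<resource>`
/// and return a distinct snake_case function name such as `r_new_string`.
/// Returns `None` if the name does not have that shape.
fn fallback_fn_name(import_name: &str) -> Option<String> {
    // Split "[annotations]resource" into its two parts.
    let rest = import_name.strip_prefix('[')?;
    let (annotations, resource) = rest.split_once(']')?;

    // Annotations are comma-separated; the first must be `constructor`.
    let mut parts = annotations.split(',').map(str::trim);
    if parts.next()? != "constructor" {
        return None;
    }

    // The second annotation carries the overload label.
    let label = parts.next()?.strip_prefix("overload ")?.trim();
    if label.is_empty() || resource.is_empty() {
        return None;
    }

    // Kebab-case WIT identifiers become snake_case Rust identifiers.
    let to_snake = |s: &str| s.replace('-', "_");
    Some(format!("{}_new_{}", to_snake(resource), to_snake(label)))
}
```

Because each overload keeps its own real name, a generator that cannot express overloading only needs this kind of mapping; one that can express it may additionally group the functions by resource.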

There are still a lot of tricky questions though when one considers how to make this work in a dynamic language context, where you need to derive a runtime algorithm for dynamically picking an overload. (See, e.g., what Web IDL has to do.) So maybe we'd want to further limit what overloads are allowed to rule out ambiguous cases (like Web IDL does). But that also makes this a harder design problem, which is why I've shied away from it thus far.

lpereira commented 7 months ago

Another option: for every static method in a resource whose name begins with from-, that takes a single parameter of a type other than the resource being defined, and that returns the resource being defined, bindings for languages with idiomatic type conversions (e.g. the From<T> trait in Rust) would generate that conversion in the stubs. For other languages nothing extra would be generated, and people would call the specific static methods.

For example:

from-string: static func(str: string) -> body;

This would be such a method, but these...

from-two-ints: static func(param1: s32, param2: s32) -> body;
from-one-int-but-returning-something-else: static func(param: s32) -> something-else;

...wouldn't.
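As a rough sketch of that rule, a bindings generator could run a check like the one below over each static function. `StaticFunc` and `conversion_source` are hypothetical names for illustration, and types are compared as plain WIT type-name strings, which a real generator would replace with resolved types.

```rust
/// Simplified, hypothetical view of a static function on a resource.
struct StaticFunc<'a> {
    name: &'a str,
    /// Parameter types, written as WIT type names.
    params: &'a [&'a str],
    /// Result type, written as a WIT type name.
    result: &'a str,
}

/// Returns the source type `T` if `func` qualifies for a generated
/// `From<T>`-style conversion on `resource`, following the rule above.
fn conversion_source<'a>(resource: &str, func: &StaticFunc<'a>) -> Option<&'a str> {
    // Rule 1: the name starts with `from-`.
    if !func.name.starts_with("from-") {
        return None;
    }
    // Rule 2: exactly one parameter, whose type is not the resource itself.
    let [param] = func.params else { return None };
    if *param == resource {
        return None;
    }
    // Rule 3: the function returns the resource being defined.
    (func.result == resource).then_some(*param)
}
```

Functions that fail the check are left as ordinary static functions, so nothing changes for languages without a conversion idiom.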

To put into perspective what I'm trying to do (XY problem!): one of the APIs I'm trying to implement has the following signature, in its original Rust:

fn insert(&self, key: String, body: impl Into<Body>) -> Result<(), ErrorCode>;

I've defined this in wit as:

insert: func(key: string, value: body) -> result<_, error-code>;

Which is almost the same thing -- however, if a resource body could be converted more idiomatically with something like:

store.insert("some key", "some value".into())?;

This would use the From<String> trait that would then call Body::from_string() to construct the body from a string, better matching the original API I'm trying to port over to wit. I can of course call it manually (Body::from_string("some value") instead) and it'll work, but it's not idiomatic.
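A self-contained sketch of what this could look like on the Rust side, under some assumptions: `Body` and `Store` are plain in-memory stand-ins rather than generated bindings, `from_bytes` and the `From<&str>` impl are hypothetical additions, `ErrorCode` is a placeholder, and `insert` takes `&mut self` only because the stand-in store is a local map.

```rust
use std::collections::HashMap;

/// Stand-in for the generated `body` resource (a real binding would wrap a handle).
#[derive(Debug, Clone, PartialEq)]
struct Body(Vec<u8>);

impl Body {
    /// Stand-in for a `from-string` static function.
    fn from_string(s: String) -> Body {
        Body(s.into_bytes())
    }

    /// Stand-in for a `from-bytes` static function (hypothetical name).
    fn from_bytes(bytes: Vec<u8>) -> Body {
        Body(bytes)
    }
}

// Conversions a generator could emit on top of the static functions.
impl From<String> for Body {
    fn from(s: String) -> Self {
        Body::from_string(s)
    }
}

impl From<&str> for Body {
    fn from(s: &str) -> Self {
        Body::from_string(s.to_owned())
    }
}

impl From<Vec<u8>> for Body {
    fn from(bytes: Vec<u8>) -> Self {
        Body::from_bytes(bytes)
    }
}

/// Placeholder error type; this sketch never returns it.
#[derive(Debug, PartialEq)]
struct ErrorCode;

#[derive(Default)]
struct Store {
    entries: HashMap<String, Body>,
}

impl Store {
    /// Accepts anything convertible into a `Body`, like the original API.
    fn insert(&mut self, key: String, body: impl Into<Body>) -> Result<(), ErrorCode> {
        self.entries.insert(key, body.into());
        Ok(())
    }
}
```

With `impl Into<Body>` in the signature, callers can write `store.insert(key, "some value")` directly. When the parameter is a plain `Body`, as in the WIT-derived signature, the same `From` impls are what make `"some value".into()` work.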

I haven't given a whole lot of thought into this, however, so I'm sure there are drawbacks to this that we can figure out.

oovm commented 7 months ago

As far as I know, subtype relationships can be declared between resources, so a subtype constructor that never reports an error should be automatically generated.

In Rust, From<Sub> should be automatically generated. However, according to the definition of subtypes, it may be necessary to generate From<Sub> functions for all descendant subtypes, and then generate impl Into<Super> where parameters are passed.

tschneidereit commented 7 months ago

Building on @oovm's proposal, one option could be to do this on the bindings layer for languages where it makes sense—and in cases where it works—based on variants: if a variant's arms are unambiguous in terms of the types they're holding, then for some languages the generated bindings could introduce overloads or other means for passing in those types instead of having to construct an instance of the variant.

Taking the example from the initial issue description, here's what that could look like:

variant blob-initializer {
  string(string),
  byte-array(list<u8>),
  input-stream(types.input-stream),
  ...
}
resource blob {
    from: static func(initializer: blob-initializer) -> blob;
}

In Rust, this could produce code along these lines:

#[allow(dead_code)]
#[derive(Debug)]
enum BlobInitializer<'a> {
  String(&'a str),
  ByteArray(&'a Vec<u8>),
  // input_stream(types.input-stream),
  // ...
}

trait IntoBlobInitializer<'a> {
    fn into(&'a self) -> BlobInitializer;
}

impl<'a> IntoBlobInitializer<'a> for &'a str {
    fn into(&'a self) -> BlobInitializer {
        BlobInitializer::String(self)
    }
}

struct Blob {}
impl<'a> Blob {
  fn from(initializer: &'a impl IntoBlobInitializer<'a>) -> Blob {
      let initializer = initializer.into();
      // Invoke `[static]blob.from(initializer)`
      Blob {}
  }
}

fn main() {
    let val = &"hello world";
    let _blob = Blob::from(val);
}

I think in most languages something sensible could be done here, but at the very least it'd always be possible to fall back to requiring explicit construction of the variant itself.

To make all of this useful, it'd be required to introduce these kinds of variants into API definitions as a pattern. That seems much nicer and lower-effort to me than introducing an actual notion of overloading. And it wouldn't depend on subtyping or any other parts of WIT that aren't yet exercised.

lpereira commented 7 months ago

I like @tschneidereit's suggestion! It better builds on top of concepts we have and feels less magical.

tschneidereit commented 6 months ago

Thinking about this some more, I'm no longer sure (ab)using variant for this makes sense. Doing so has two pretty big problems:

  1. Bindings "fall off a cliff" the moment the variant's arms are no longer unambiguous.
  2. The bindings completely ignore the arm name.

Combined, these problems mean that this special-casing would only really work if the WIT author intentionally designs the API with it in mind. But it would also "accidentally" work if they don't, potentially leading to undesirable outcomes.
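Both problems show up in the check a generator would need before deriving overloads from a variant. The sketch below is only an illustration: `ambiguous_payloads` is a hypothetical helper, and it compares payload types as WIT type-name strings, whereas a real generator would compare resolved types and also account for distinct WIT types that map to the same type in the target language.

```rust
use std::collections::BTreeMap;

/// Given `(arm name, payload type)` pairs, return every payload type that is
/// used by more than one arm, together with the names of those arms.
/// An empty result means overloads could be picked from the types alone.
fn ambiguous_payloads<'a>(arms: &[(&'a str, &'a str)]) -> Vec<(&'a str, Vec<&'a str>)> {
    // Group arm names by payload type; BTreeMap keeps the output order stable.
    let mut arms_by_type: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(arm, payload) in arms {
        arms_by_type.entry(payload).or_default().push(arm);
    }

    // Keep only the payload types that more than one arm shares.
    arms_by_type
        .into_iter()
        .filter(|(_, names)| names.len() > 1)
        .collect()
}
```

Adding a second `string` arm (say `path(string)` next to `contents(string)`) makes the result non-empty, at which point the arm name is the only thing distinguishing the two cases and type-based overloads can no longer be generated.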

Given all this, I now think that it'd be better to introduce explicit syntax for this. The two options I can think of are a union type or type enumerations, the latter of which seems preferable to me.

As a sketch, that could look like this:

resource blob {
    from: static func(initializer: string | list<u8> | input-stream) -> blob;
}

And perhaps we could support these in type aliases as well:

type blob-initializer = string | list<u8> | input-stream;

resource blob {
    from: static func(initializer: blob-initializer) -> blob;
}

With this, bindings generators should be able to produce the same kind of code I sketched out in my previous comment. And for languages where no fitting idiom exists, code generators could instead go the other way around: generate the equivalent of what they generate for variant right now, and in the above example bind blob.from such that it takes such a variant which has to be constructed explicitly.

lukewagner commented 6 months ago

We did have union (as a specialization of variants) in WIT for a while and it ran into various challenges, leading us to remove it in #236. That's not to say that we couldn't add it back after more careful consideration, but just that the whole topic of overloading is pretty tricky in our cross-language setting. I continue to think that we need the "base" semantics to be that there are multiple functions and then we give hints that allow language bindings, when possible, to merge functions together into overload sets, but there's always the graceful fallback to "just treat them like separate functions with different names". Also, I expect we'll get a somewhat more efficient ABI out of the multi-function approach.

Jamesernator commented 3 weeks ago

An alternative to unions/variants would just be to specify that overload resolution happens at instantiation time. This would also avoid any call time cost.

e.g. Using the blob example we could have a definition like:

(component
    (export "Blob" (type $Blob ...))
    (export "[static]Blob.from" (func.overload
        (func $Blob.fromBytes (param $bytes (array u8)) (result $Blob) ...)
        (func $Blob.fromString (param $string string) (result $Blob) ...)
        ...
    ))
)

An importer would be expected to define which overloads it wants to import by specifying their types:

(component
    (import "Blob" "Blob" (type $Blob))
    (import "Blob" "[static]Blob.from"
        (func.choose_overload $Blob.fromBytes (param (array u8)) (result $Blob))
        (func.choose_overload $Blob.fromString (param string) (result $Blob))
    )
)

Ideally it'd also allow upgrading to an overload so that importing a function directly selects the correct overload automatically:

(component
    (import "Blob" "Blob" (type $Blob))
    (import "Blob" "[static]Blob.from"
        ;; Equivalent to func.choose_overload
        (func $Blob.from (param (array u8)) (result $Blob))
    )
)