extism / dotnet-pdk

Extism Plug-in Development Kit (PDK) for C# and F#
https://extism.org
BSD 3-Clause "New" or "Revised" License
36 stars 6 forks source link

Use Source Generation #33

Open RichiCoder1 opened 9 months ago

RichiCoder1 commented 9 months ago

Ideally it'd be excellent to see this PDK support writing more idiomatic .NET Code via source generation.

For example, turning the Getting Started example from:

using System;
using System.Runtime.InteropServices;
using System.Text.Json;
using Extism;

namespace MyPlugin;
public class Functions
{
    public static void Main()
    {
        // Note: a `Main` method is required for the app to compile
    }

    [UnmanagedCallersOnly(EntryPoint = "greet")]
    public static int Greet()
    {
        var name = Pdk.GetInputString();
        var greeting = $"Hello, {name}!";
        Pdk.SetOutput(greeting);

        return 0;
    }
}

to

using Extism;

namespace MyPlugin;

public partial class Functions
{
    [WasmExport(Name = "greet")]
    public static string Greet(string name)
    {
        return $"Hello, {name}";
    }
}

with something like this being generated under the covers:

namespace MyPlugin;
public partial class Functions
{
    public static void Main() { }

    [UnmanagedCallersOnly(EntryPoint = "greet")]
    public static int <>__Greet0()
    {
        var name = global::Extism::Pdk.GetInputString();
        var output = Functions::Greet(name);
        global::Extism::Pdk.SetOutput(output);

        return 0;
    }
}
NickDarvey commented 9 months ago

I think this could be added with new package (like a Extism.Pdk.CSharp or Extism.Pdk.SourceGenerators) so people who want the control or to use it from F# can still use the core package.

mhmd-azeez commented 9 months ago

I agree with @NickDarvey that anything we do should take F# into account too. @RichiCoder1 do you only have strings in mind or do you also mean complex objects (structs, classes, and records)?

Also, would you like to contribute to the source generator? I'd be happy to help in designing the API and implementing

RichiCoder1 commented 9 months ago

@RichiCoder1 do you only have strings in mind or do you also mean complex objects (structs, classes, and records)?

An MVP would probably be just primitives (which I guess is just int/float/string atm), but mostly because I'm not sure what/how involved more sophisticated data would be. Reading over the Extism docs, would probably want to orient the default marshalling experience so it's consistent on both ends?

Also, would you like to contribute to the source generator? I'd be happy to help in designing the API and implementing

Sadly I don't have a lot of cycles I could commit, but happy to help evaluate and possibly assist with!

mhmd-azeez commented 8 months ago

I did some experimentation to see what's possible. Using Source Generation, a pattern like this is possible:

Use Case 1: String in, String out

public class Functions
{
    [WasmExport("greet")]
    public static string Greet(string input)
    {
        // TODO: implement greet
    }
}

Use Case 2: Complex objects

public class Functions
{
    [JsonSerialization(typeof(JsonContext))]
    [WasmExport("greet")]
    public static GreetResponse Greet(GreetRequest request)
    {
        // TODO: implement greet
    }
}

[JsonSerializable(typeof(GreetRequest))]
[JsonSerializable(typeof(GreetResponse))]
partial class JsonContext : JsonSerializerContext
{

}

public class GreetRequest
{

}

public class GreetResponse
{

}

The source generator will generate a method like this:

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static void Greet()
{
    var typeInfo = global::SampleLib.JsonContext.Default.GreetRequest;
    var json = global::Extism.Pdk.GetInput();
    var serializer = new global::Extism.JsonExtismSerializer();
    var input = serializer.Deserialize<SampleLib.GreetRequest>(json, typeInfo);

    var result = SampleLib.Functions.Greet(input);

    var typeInfo2 = global::SampleLib.JsonContext.Default.GreetResponse;
    var json2 = serializer.Serialize(result, typeInfo2);
    global::Extism.Pdk.SetOutput(json2);
}

Notes:

  1. Specifying a JsonSerializerContext is required, otherwise trimming would break the app. Without trimming the wasm file is too big (21mb)
  2. I was hoping to make the pattern extensible (so that users can use their own serialization formats: message pack, protobuf, etc). But, the fact that reflection is not supported when trimming makes this impossible. So the source generator has to be aware of all supported formats, although we can start with json.

Overall, I feel like it's too much complexity for not much value. I think having a couple of convenience methods would give us most of the value, while also being less brittle:

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static int Greet()
{
    var request = Pdk.GetInputJson<GreetRequest>(JsonContext.Default.GreetRequest);

    // TODO: implement greet

    Pdk.SetOutputJson(response, JsonContext.Default.GreetRequest);
}

Note: Extism functions can't take any explicit parameters, instead they just get a byte array as an input.

The new convenience methods are Pdk.GetInputJson<> and Pdk.SetOutputJson in this case.

Let me know what you think, I am also open for suggestions for solving the issues with the Source Generator approach.

If anyone is interested, the experiment is here: https://github.com/extism/dotnet-pdk/tree/spike/source-generation

RichiCoder1 commented 7 months ago

Howdy! Apologies for the late replay, just circling back on this.

I honestly think the Source Generate'd version looks fantastic! I do understand what you mean about it being more complicated to maintain though.

Also love the addition of the Pdk Input/Output Json helpers in the meantime.

I was hoping to make the pattern extensible (so that users can use their own serialization formats: message pack, protobuf, etc). But, the fact that reflection is not supported when trimming makes this impossible. So the source generator has to be aware of all supported formats, although we can start with json.

A solution for this would likely be a combination of ducktyping/interface and then a local or assembly attribute. So something like:

[assembly: WasmSerializer<ExtismMessagePackSerializer>]
public class Functions
{
    [WasmExport("greet")]
    public static GreetResponse Greet(GreetRequest request)
    {
        // TODO: implement greet
    }
}

would end up being resolved into

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static void Greet()
{
    var input = global::Extism.Pdk.GetInput();
    var input = global::ThirdParty.ExtismMessagePackSerializer.Deserialize<SampleLib.GreetRequest>(input);

    var result = SampleLib.Functions.Greet(input);

    var output = global::ThirdParty.ExtismMessagePackSerializer.Serialize<SampleLib.GreetRequest>(result);
    global::Extism.Pdk.SetOutput(output);
}

The third party (or first party) serializer in this case would just use duck typing, but you could also provide an interface like so if you want people to have something to implement and check against:

public interface IExtismSerializer
{
    static abstract T Deserialize<T>(byte[] input);
    static abstract byte[] Serialize<T>(T input);
}

An example for message pack would be:

// SampleProj/dto.cs

[MessagePackObject]
public class GreetRequest
{

}
[MessagePackObject]
public class GreetResponse
{

}
// ThirdParty

namespace ThirdParty;

public class ExtismMessagePackSerializer : IExtismSerializer
{
    public static byte[] Serialize<T>(T input)
    {
        return MessagePackSerializer.Serialize(input);
    }

    public static T Deserialize<T>(bytes[] input)
    {
        return MessagePackSerializer.Deserialize<T>(input);
    }
}

(in this case, you could just MessagePackSerializer directly too, but that would not be that case for all serializers I reviewed)

An if you need to specify options or settings, you'd end up just dropping back to bytes[].

In that case, it would even be neat to have a "default" signature like so:

public class Functions
{
    [WasmExport("raw")]
    public static byte[] Raw(bytes[] data)
    {
        // TODO: implement own seralization and de-serialization logic
    }
}

Which basically does the Pdk'ing for you. It's a small thing, but a little bit of magic can go a long way IMHO.

mhmd-azeez commented 7 months ago

@RichiCoder1 thanks for the detailed write up, the only issue here is that you still need Reflection for this method to work (and reflection is not supported when using AOT compilation with wasm). The official MessagePack serializer uses Reflection, unless you use their Source Generator. Which causes the same problem as the System.Text.Json Serializer.

I am not sure if having to write this:

[JsonSerializable(typeof(GreetInput))]
public partial class SourceGenerationContext : JsonSerializerContext {}
public record GreetInput(int a, int b);

[UnmanagedCallersOnly]
public static int greet()
{
     var input = Pdk.GetInputJson(SourceGenerationContext.Default.GreetInput);
     // TODO: implement greet
}

Is much different from having to write this:

[JsonSerializable(typeof(GreetInput))] // We need this to get the JsonTypeInfo for the input/output type
public partial class SourceGenerationContext : JsonSerializerContext {}
public record GreetInput(int a, int b);

[JsonSerialization(typeof(SourceGenerationContext))]
[WasmExport("greet")]
public static GreetResponse Greet(GreetRequest request)
{
    // TODO: implement greet
}

My issue with the source generator approach is:

  1. It's too brittle. If you forget one of the attributes, suddenly your wasm is not exporting the function, or there is weird compilation errors, I am worried that instead of simplifying things, it just makes things more complicated and harder to understand for newcomers.
  2. In the end, it's not that much shorter. We can write convenience methods for other serialization protocols (perhaps in their own nuget packages) similar to Json.
RichiCoder1 commented 7 months ago

the only issue here is that you still need Reflection for this method to work

That's fair, and that's going to be true of any serialization method just due to WASM Source gen, no? Ultimately that comes down to documentation and also is one of the reasons I'd say you shouldn't necessarily stick multiple formats in core.

In the end, it's not that much shorter.

The difference imho is that it's trivial extra lines for one or two functions, but that starts to add for every function after that much faster for the "add another method" scheme. As soon as you're implementing a lot of members or a complicated interfaces, it starts looking a lot more intensive.

It's too brittle. If you forget one of the attributes, suddenly your wasm is not exporting the function, or there is weird compilation errors, I am worried that instead of simplifying things, it just makes things more complicated and harder to understand for newcomers.

Part of this can be mitigated via Analyzers, but also part of this is you'll get the exact same errors if you do it manually as if you do it via source gen. And a benefit of source gen is what's getting gen'd is pretty transparent in the end.

I'm also thinking of cases too where a vendor might want to provide, say, a PDK on top of Extism's PDK to C# developers to encourage them to write plugins. The default Extism experience is familiar for doing cross-lang stuff, but very alien for C# developers. Exposing source gen primitives allows downstream devs to expose a more "native" looking experience with much fewer caveats. Though I suppose a good argument against that also is the vendor could simply cover these warts themselves (hide all the interop in an abstraction layer).

That being said, source gens aren't simple to maintain so I can see that argument that it's too much for a small theoretical win.

mhmd-azeez commented 7 months ago

That's fair, and that's going to be true of any serialization method just due to WASM Source gen, no

Exactly, that's why you can't automate the serialization part. The way you have shown will break when you run dotnet publish if you enable PublishTrimmed. And PublishTrimmed is crucial because it decreases the size of the binary from 20mb down to about 7mb (which still pretty big, not much better).

The default Extism experience is familiar for doing cross-lang stuff, but very alien for C# developers

I agree, I'd love to be able to make it nicer

Overall, I am inclined towards waiting for more user feedback, hopefully we can get to a good solution down the line

furesoft commented 6 months ago

I have some ideas how the reflection problem could be solved when using other serializers. I will try it this weekend and make a PR if it works as intended.

Simonl9l commented 5 months ago

I'd suggest that per "brittle-ness", the generation only covers half the possible solution space, if there are expectation that certain things exist, this is covered more in the custom code analyzer

I'd also suggest that the PDK on a PDK situation is also very likely. We're directly looking at that scenario, as expecting a plugin developer for a specific solution to have to add a bunch of boiler plate to define say the host functions etc. in support of such a solution would be developer friction. Having the .net PDK leverage generators and analyzers is then reflective of that same situation., with us being exitsm developers, but not having to expose extism to our plugin developers explciity - more "powered by".

nilslice commented 5 months ago

Hi @Simonl9l - it sounds like perhaps you could benefit from something new we're working on. Especially given that PDK code generation and reduction of boilerplate is a first-class part of its DX.

Would you be open to setting up a call, where we could chat about this as well as more on your use case to see how we can best assist?

cc/ @bhelx

bhelx commented 3 months ago

Seems like it's been a while since we had activity on this thread but there are still some things to be solved so I'm going to keep this open. Please chime in if anything has changed!

Simonl9l commented 3 weeks ago

@bhelx & @nilslice back from being deep in our own project, and looping back to Extism. If you are still interested connecting please advise.

nilslice commented 3 weeks ago

Would be happy to! Feel free to send me an email, steve@dylibso.com, or reach out via discord: https://extism.org/discord