fermyon / spin

Spin is the open source developer tool for building and running serverless applications powered by WebAssembly.
https://developer.fermyon.com/spin
Apache License 2.0
5.06k stars 242 forks source link

Feature: CloudEvents component #394

Open Mossaka opened 2 years ago

Mossaka commented 2 years ago

Enabling CloudEvents component could make spin a great solution to serverless and event-driven applications. I am thinking about an event trigger that subscribes to a addressible broker, and whenever the broker emits an event, the event trigger will instantiate a wasm module to hanlde. Or the trigger itself can emits an event.

For example the trigger can subscribe to a knative event source.

I am proposing the following component sdk. Let me know your thoughts. @radu-matei

use anyhow::Result;
use spin_sdk::{
    event::{Event},
    event_component,
};

/// A simple Spin event component.
#[event_component]
fn trigger(event: Event) -> Result<()> {
    println!("{}", event);
    Ok(())
}
radu-matei commented 2 years ago

@Mossaka this looks like a great idea!

For larger work items, we try to write a Spin Improvement Proposal (SIP) - see a few examples.

radu-matei commented 2 years ago

Thanks a lot for yesterday's presentation, @Mossaka!

ref #398

After going through the CloudEvents core spec, the HTTP webhook spec, and the protocol bindings, here is roughly how I am thinking about implementing support for CloudEvents in Spin.

Responding to HTTP webhooks

The CloudEvents HTTP webhook specification describes the pattern to deliver notifications through HTTP endpoints.

In this scenario, the producer sends a valid HTTP request with a CloudEvents payload (potentially with support for the validation and abuse protection protocol), and expects a valid HTTP response, with an optional CloudEvents payload.

In handling the incoming request, a handler component can use external services (and potentially send new CloudEvents payloads), but the interaction is limited to the request-response model of HTTP/1.1. Because of that, it seems like this can be implemented with the current HTTP trigger — the incoming webhook is an HTTP request, with a CloudEvent payload, and the response is an HTTP response. If we add support in the Spin SDKs for easily deserializing Event objects from HTTP requests and serializing Event objects as HTTP responses, we can write something similar to:

#[http_component]
pub fn handle_ce(req: Request) -> Result<Response> {

    // perform the spec endpoint validation for abuse protection
    // https://github.com/cloudevents/spec/blob/main/cloudevents/http-webhook.md#4-abuse-protection
    match req.method {
        http::Method::OPTIONS => return spin_sdk::validate_webhook(req),
        _ => {}
    };

    // read the CloudEvents request event from the HTTP request
    let request_event = Event::try_from(req)?;

    // create a CloudEvents response event
    let response_event = Event::new(...);
    ...
    // potentially use other external services here
    ...

    // return the HTTP response
    Ok(http::Response::builder()
        .status(200)
        // optionally add other headers
        .body(Some(response_event.into()))?)
}

How would we feel about something like the component above?

This way, we would follow the HTTP webhook specification (returning an HTTP response with an optional Event response), and reuse the current Spin HTTP infrastructure for everything (executor, component definition, templates, SDK)

A CloudEvents subscription manager

The above implementation for HTTP webhooks is great, and it requires minimal changes to Spin. However, it is only a small subset of the potential CloudEvents integration into Spin.

Of particular importance could be the Subscriptions API.

Specifically, Spin could act as a "subscription manager" and manage event subscriptions on behalf of "event consumers" (Spin components defined in the current application).

In this scenario, we would build a specific CloudEvents executor:

// The entry point for a CloudEvents handler.
handle-cloudevent: function(event: event) -> expected<event, error>

This executor would be entirely independent of a specific Spin trigger. Rather, for Spin triggers that are compatible with the CloudEvents transport protocols (i.e. HTTP, NATS, or Kafka), we would be able to adapt the incoming request and use the CloudEvents executor as a new executor, besides the existing executors (for the HTTP trigger, there would be a Spin executor, a Wagi executor, and a CloudEvents executor — details on how this differs from the CloudEvents webhook endpoint in the next section).

This is a CloudEvents Subscription:

{
  "id": "[a subscription manager scoped unique string]",
  "source": "[...]", ?
  "types": "[ "[ce-type values]" + ]", ?
  "config": { ?
    "[key]": [subscription manager specific value], *
  },

  "filters": [ ?
    { "[dialect name]": [dialect specific object] } +
  ],

  "sink": "[URI to where events are delivered]",
  "protocol": "[delivery protocol]",
  "protocolsettings": { ?
    "[key]": "[type]", *
  }
}

The sink field is how the Subscription (and this proposed Spin executor) would be different compared to the simple webhook scenario — sink defines the address to which the incoming events MUST be sent. This can be any protocol that the implementation supports, and the event must be sent without client code manually performing the request.

In the course of implementing such a component, developers could manually send requests or events to other locations, but to properly implement the Subscription API, Spin must implement automatically sending the resulting event to the sink.

For example, this event subscription defines that after execution, the result CloudEvents event MUST be sent to http://example.com/event-processor.

{
  "id": "sub-193-18365",

  "config": {
    "data": "hello",
    "interval": 5
  },

  "filters": [
    { "prefix": { "type": "com.example." } }
  ],

  "protocol": "HTTP",
  "protocolsettings": {
    "method": "POST"
  },
  "sink": "http://example.com/event-processor"
}

So the proposed executor would invoke the handle-cloudevent function implemented by the Wasm component, then send the result to the sink address. The Subscription API supports HTTP, MQTT, AMQP, Kafka, and NATS as sink protocols.

Initially, we would implement a subset of the potential protocols.

The concepts of the Subscription API, the sink address, and the underlying runtime sending events are very closely related to a few ideas we have been discussing in the past around "output bindings" — the ability to simply return values from executing Wasm components and have Spin automatically send those results to useful places (i.e. publish on queues, store as blobs).

These are some initial, unstructured thoughts about CloudEvents support. None of the ideas in this comment are strong opinions, and I'm happy to discuss them to find the best solution for Spin.

itowlson commented 2 years ago

A custom executor is one way to approach this, and provides a very immediate solution that we already know well how to do. However, at the moment executors are very intimately linked to triggers. You don't want the HTTP, MQTT, Kafka and NATS triggers to all have to carry around cloud events code. And it would be nice if protocol/product owners could add executors for their own application protocols without needing to touch trigger repos.

Obviously we could reorganise things so that executors rather than triggers are the 'units' but we'd need to figure out how one HTTP base could serve multiple executors.

But I wonder if it's worth exploring an alternative approach that allows protocol bindings to be built and injected separately from transports/triggers. In a world like .NET or Java we could do this by dynamically loading code, but that's hard to do in Rust. One possibility is to do the binding guest side, via a linked component. This would have an additional benefit that transport-protocol bindings could be written in any language that compiled to Wasm.

I'm thinking the idea would be something like:

I haven't thought this through to be clear! Just sharing it for discussion.

(Already, thinking about it, we could use the executor setting as the hypothetical magic setting to use a binder, and then the user wouldn't have to care if the binder was internal-Rust or external-Wasm.)

Mossaka commented 2 years ago

@radu-matei Thanks for your thoguhts and this is an amazing writing!

This way, we would follow the HTTP webhook specification (returning an HTTP response with an optional Event response), and reuse the current Spin HTTP infrastructure for everything (executor, component definition, templates, SDK)

For the Webhook use case, I can definitely see the value of reusing the Spin HTTP component to handle the events. I had a working example in spin-kitchensink pr that implements the webhook abuse protection mechanism. I agree that if we can offload the logic of parsing HTTP headers, match header attributes and handle callbacks in the Spin SDK, it will be a much better user experience.

I would argue that if eventually a CloudEvents trigger like handle-cloudevent: function(event: event) -> expected<event, error> is what we want in Spin, then we should make a step further - offload the entire webhook abuse protection logic to Spin internal, such that component writer does not even need to worry about checking HTTP::Options header anymore.

#[cloudevents_component]
pub fn handle_ce(in: Event) -> Result<Event> {
    // before the function is called, the validation HTTP request is handled 
    // and Spin serializes HTTP request to CloudEvents

    // potentially use other external services here
    ...

    // return the event
    Ok(in)
}

The CloudEvents component will automatically deserialize CloudEvents return value back to HTTP response and return back. The benefits of doing this is that

  1. It is trigger agnostic. The component itself does not care about what trigger it is and user is free to change HTTP to other transport protocols bindings that CloudEvents support.
  2. It hinds the if branch to check HTTP OPTIONS method
  3. It is chainable to other CloudEvents component.

A CloudEvents subscription manager

I really like the subscription spec. In fact, there are three next-gen specficiations for CloudEvents, and I think they work together in an amazing way.

  1. Discovery API - Spin could make a request to external service that supports Discovery API to understand

    • what events the service produces and h
    • how to subscribe to the service If in the future, Spin supports multiple triggers, Spin itself could become a service that implements Discovery API for external services to address.
  2. Subscription API, as @radu-matei already put it, Spin could become a susbcription manager.

  3. Schema Registry this allow Spin to understand different schema documents for serialization and data validation.

The concepts of the Subscription API, the sink address, and the underlying runtime sending events are very closely related to a few ideas we have been discussing in the past around "output bindings" — the ability to simply return values from executing Wasm components and have Spin automatically send those results to useful places (i.e. publish on queues, store as blobs).

Totally agree! This could unlock the ability to easily chain multiple Spin applications together in a serverless manner. Imagine 2 Spin application running on the cloud, and one's sink is an adress of another Spin application. This creates a pipeline of handling workloads to wasm modules.

Check out the CloudEvents extensions, which could be very useful in this scenario:

  1. The distributed tracing extension
  2. Sequence basically defined the order of how events should arrive.

This worth a few SIP proposals to fully spec out. I am happy to help bringing CloudEvents support to Spin!

Mossaka commented 2 years ago

See an disussion here on the use cases of CloudEvents subscription spec: https://github.com/cloudevents/spec/issues/767