Analytics Engine support in workerd

cloudflare / workerd

The JavaScript / Wasm runtime that powers Cloudflare Workers

https://blog.cloudflare.com/workerd-open-source-workers-runtime/

Apache License 2.0

6.28k stars 303 forks source link

Analytics Engine support in workerd #536

Closed jspspike closed 1 year ago

jspspike commented 1 year ago

I'm looking to add analytics engine support in workerd. Other than adding the binding to workerd.capnp I will also need to add an implementation for writeLogfwdr. The edgeworker implementation creates a socket connection to logfwdr and writes capnp messages to it directly with the logfwdr channel written inside the capnp message.

I was wondering if it would be okay for the workerd implementation to be provided a ServiceDesignator in the configuration and then make http requests to the provided service with the capnp message as the body of the request. This would differ from the edgeworker implementation but seems like it would be consistent with other workerd bindings that forward information to services.

kentonv commented 1 year ago

That would work, but I wonder if it's really the right fit for these log messages. The system is really designed to push logs to some Kafka-esque bulk aggregator where they can be processed efficiently en mass.

Maybe we should simply allow workerd to be configured with a logging socket, and then it just pushes these messages to that? Miniflare could serve that socket for testing purposes. That would be much more similar to what we do in production.

But I dunno, I have mixed feelings about this.

jspspike commented 1 year ago

@kentonv The logging socket would be make sense, I originally elected for an http request because it would be similar to the other workerd bindings and miniflare currently uses a worker as the service for each "plugin". It doesn't need to be done this way though.

So something like?


struct Config {
  # Top-level configuration for a workerd instance.

  services @0 :List(Service);
  sockets @1 :List(Socket);

  v8Flags @2 :List(Text);

  extensions @3 :List(Extension);

  logging @4 :ServiceDesignator;
}

struct Worker {
...
  struct Binding {
  ...
    union {
    ...
      analyticsEngine @15: AnalyticsEngine;
    }

    struct AnalyticsEngine {
      dataset @0 : Text;
      schemaVersion @1 : Int64;
    }
  }
}

where logging is a service that will accept a socket connection from workerd to forward AnalyticsEngineEvent capnps to

kentonv commented 1 year ago

Sorry, to be clear, I was imagining specifying the path of a unix socket which is handled by some external service, not another Worker. That's what happens in production -- logfwdr is a separate service that aggregates every kind of log to be sent back to the data pipeline in core where they are all processed.

If we want to direct these events to another Worker, then the binding itself should specify the target service name. I think we should also consider defining a custom event type for this, similar to trace events. Delivering it as an HTTP event is weird since there's no one waiting for a response, and you'd have to give the request dummy headers, a dummy URL, and a streaming body, all of which seems like a lot of unnecessary overhead.

Again, I'm sort of ambivalent between these two options...

jspspike commented 1 year ago

@kentonv I think I wasn't very clear in my last comment. What I meant by the second half of my previous comment (after "So something like?") was a proposal for the solution you stated. Where logging @4 is an external service that workerd would connect to via a unix socket. workerd would create a logfwdr channel for each AnalyticsEngine binding and send those messages to this external service

kentonv commented 1 year ago

Ah. I was confused because ServiceDesignator is specifically used to refer to other workers in the same workerd instance. So I want expecting Text, with a comment saying it specifies an address to connect to.

But I guess your intent was that the ServiceDesignator would designate a service of type ExternalServer? Actually, maybe that is best. Then it can be overridden on the command line and all that. And actually, this could simultaneously support connecting to an external socket or connecting to an in-process Worker, as soon as we add support for Workers to receive TCP connections.

So maybe you had it right after all. :)

kentonv commented 1 year ago

Though... I guess now that I think of it, I'd rather this be a property of the AnalyticsEngine binding and not a global property.