Closed by jspspike 1 year ago
That would work, but I wonder if it's really the right fit for these log messages. The system is really designed to push logs to some Kafka-esque bulk aggregator where they can be processed efficiently en masse.
Maybe we should simply allow workerd to be configured with a logging socket, and then it just pushes these messages to that? Miniflare could serve that socket for testing purposes. That would be much more similar to what we do in production.
But I dunno, I have mixed feelings about this.
@kentonv The logging socket would make sense. I originally elected for an HTTP request because it would be similar to the other workerd bindings, and Miniflare currently uses a worker as the service for each "plugin". It doesn't need to be done this way, though.
So something like?
struct Config {
  # Top-level configuration for a workerd instance.
  services @0 :List(Service);
  sockets @1 :List(Socket);
  v8Flags @2 :List(Text);
  extensions @3 :List(Extension);
  logging @4 :ServiceDesignator;
}

struct Worker {
  ...
  struct Binding {
    ...
    union {
      ...
      analyticsEngine @15 :AnalyticsEngine;
    }

    struct AnalyticsEngine {
      dataset @0 :Text;
      schemaVersion @1 :Int64;
    }
  }
}
where logging is a service that accepts a socket connection from workerd, to which workerd forwards AnalyticsEngineEvent capnp messages.
Sorry, to be clear, I was imagining specifying the path of a unix socket which is handled by some external service, not another Worker. That's what happens in production -- logfwdr is a separate service that aggregates every kind of log to be sent back to the data pipeline in core where they are all processed.
If we want to direct these events to another Worker, then the binding itself should specify the target service name. I think we should also consider defining a custom event type for this, similar to trace events. Delivering it as an HTTP event is weird since there's no one waiting for a response, and you'd have to give the request dummy headers, a dummy URL, and a streaming body, all of which seems like a lot of unnecessary overhead.
Again, I'm sort of ambivalent between these two options...
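For illustration, the first option (a plain socket address rather than a ServiceDesignator) might look roughly like the schema below. This is only a sketch: the struct name ConfigSketch, the field name loggingSocketPath, and the file ID are hypothetical and do not come from workerd.capnp.

```capnp
@0xd8b9f3c2a1e47f65;
# Arbitrary file ID so that this sketch compiles on its own.

struct ConfigSketch {
  # Hypothetical stand-in for the relevant part of Config.

  loggingSocketPath @0 :Text;
  # Assumed field: address of a unix socket served by an external process
  # (logfwdr in production, or a socket that Miniflare serves in tests).
  # workerd would connect to it and push log messages.
}
```

With this shape the target is always an external process, so it cannot name another Worker in the same workerd instance.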
@kentonv
I think I wasn't very clear in my last comment. The second half of it (after "So something like?") was meant as a proposal for the solution you described, where logging @4 is an external service that workerd would connect to via a unix socket. workerd would create a logfwdr channel for each AnalyticsEngine binding and send those messages to this external service.
Ah. I was confused because ServiceDesignator is specifically used to refer to other workers in the same workerd instance. So I was expecting Text, with a comment saying it specifies an address to connect to.
But I guess your intent was that the ServiceDesignator would designate a service of type ExternalServer? Actually, maybe that is best. Then it can be overridden on the command line and all that. And actually, this could simultaneously support connecting to an external socket or connecting to an in-process Worker, as soon as we add support for Workers to receive TCP connections.
So maybe you had it right after all. :)
Though... I guess now that I think of it, I'd rather this be a property of the AnalyticsEngine binding and not a global property.
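A rough sketch of that per-binding variant follows, assuming the designator is simply added as a new field on the AnalyticsEngine struct proposed earlier. The field name logging inside the binding, the simplified ServiceDesignator, ExternalServer, and Service stand-ins, and the file ID are all assumptions made for illustration; the real definitions in workerd.capnp have more fields.

```capnp
@0xc4f0e6a97b3d2185;
# Arbitrary file ID so that this sketch compiles on its own.

struct ServiceDesignator {
  # Simplified stand-in: refers to a service by name.
  name @0 :Text;
}

struct ExternalServer {
  # Simplified stand-in: a server outside this workerd process.
  address @0 :Text;
  # For example, the path of a unix socket.
}

struct Service {
  # Simplified stand-in: a named service that a ServiceDesignator can target.
  name @0 :Text;
  union {
    unspecified @1 :Void;
    external @2 :ExternalServer;
  }
}

struct AnalyticsEngine {
  dataset @0 :Text;
  schemaVersion @1 :Int64;

  logging @2 :ServiceDesignator;
  # Assumed field: the service that receives this binding's events,
  # replacing the global logging @4 field on Config.
}
```

Because each binding carries its own designator, two datasets could be routed to different services, and the designated service could still be an ExternalServer whose address is overridden on the command line.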
I'm looking to add Analytics Engine support in workerd. Other than adding the binding to workerd.capnp, I will also need to add an implementation for writeLogfwdr. The edgeworker implementation creates a socket connection to logfwdr and writes capnp messages to it directly, with the logfwdr channel written inside the capnp message.
I was wondering if it would be okay for the workerd implementation to be provided a ServiceDesignator in the configuration and then make HTTP requests to the provided service with the capnp message as the body of the request. This would differ from the edgeworker implementation but seems like it would be consistent with other workerd bindings that forward information to services.