A few questions:
If I have both `processEvent` and `processEventBatch`, what happens? From my understanding, the server first runs all available `processEvent` functions and then moves on to all `processEventBatch` functions. So is `processEvent` going to modify the event, which is then dumped into `processEventBatch`? Just trying to understand this specific case a bit more.
Asking before I go deep into it: How's the plugin-server structured in the deployments currently? e.g. how come I can run plugins on Heroku without the plugin server?
Normally, you just write a `processEvent` function and that's that. If you need more control with async operations, change this to `processEventBatch`. If you don't supply your own batch function, we use our own that just asynchronously calls `processEvent` for each event (or synchronously if no promise is returned). If you have both defined, only one of them will be called. For now that's just `processEvent`, but this will change once we get Kafka running.
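To make the shapes concrete, here's a minimal sketch of both functions (the property tweak inside is made up for illustration):

```js
// Per-event processing: take one event, return it (possibly modified).
function processEvent(event, meta) {
    event.properties = { ...event.properties, processed: true } // illustrative property
    return event
}

// Batch processing: take an array of events, return an array.
// If you only define processEvent, the server wraps it roughly like this for you.
async function processEventBatch(events, meta) {
    return events.map((event) => processEvent(event, meta))
}
```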
On Heroku, the `worker` dyno starts both the `pluginworker` and `celeryworker` processes. However, if you need to scale, you should add more of either one or the other. Together in one instance they are breaching the per-dyno memory limits, and if you have a bigger app with many plugins, it's wise to launch separate dynos and shut down the default `worker`.

Great, thanks! This saves me some time.
Some last-minute changes on Friday that got merged Monday added three very cool features to plugins in 1.19. Would be awesome to document them.
Scheduled tasks. These are already documented! Thanks! I left some feedback on the docs update PR regarding them.
`posthog.capture(event, properties)` -- does what it says it'll do. It bypasses the JS libraries and the Django HTTP server and puts an event directly into Celery. There is no other `posthog.*` function (e.g. `identify`) right now. This `capture` can be called anywhere, including in `processEvent`, but there it will probably lead to an endless loop, as every event will emit a new event... or two. Thus it's wise to only call it within the `setupPlugin` and `runEveryX` functions. There's a chance I'll disable it altogether for `processEvent` at some point. Not sure.
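For example, a minimal sketch of the safe pattern (the `runEveryMinute` schedule, event name and properties are illustrative):

```js
// Scheduled task: runs on the plugin server's schedule, so posthog.capture
// here won't re-trigger itself the way it would inside processEvent.
function runEveryMinute(meta) {
    posthog.capture('plugin_heartbeat', { plugin: 'my-plugin' })
}
```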
Plugin Editor. This is the main thing that necessitates an overhaul of the docs. All the information inside the docs is still valid, yet the spirit has changed a lot. Now you can just copy/paste a bit of JavaScript and you have a plugin. No longer must you create a repository on GitHub, upload a package to npmjs, or have a localhost PostHog environment running. Click "new plugin", enter a name, and write JavaScript (running in a VM in Node.js 14, so everything that's supported in Node 14 works, including `?.`). The editor is still extremely raw, yet it completely changes the way you would approach plugins. We will still have the plugin repository with a bunch of whitelisted plugins, so all of that will remain as it is.
A few things will change for 1.19 that need to be documented:
1) The addition of `processEventBatch(events: PluginEvent[], meta: PluginMeta)`: https://github.com/PostHog/posthog-plugin-server/pull/39

You can define either `processEvent`, `processEventBatch`, or both in your plugin - the other one is created automatically. Currently `processEventBatch` is not really in use - it will only get batches of one event, since that's how we talk to Celery. In the future, and especially on EE or cloud where we use Kafka to get events in batches, we will also pass the received events to this function as a batch. We might also add some kind of batching for Celery. The idea is that if you're sending events to S3, you don't want to make 100 requests (e.g. per second), one for every event. It would be better to make one request and send 100 events at once.
To prevent any data leakage, the raw Kafka events are further split into batches per team before reaching `processEventBatch`. This makes sense, since plugins are also enabled on a team-by-team basis now. We haven't defined a limit for the batch size yet. Currently node-rdkafka is configured to get events in batches of 100, but this might change as we perform more benchmarks. I don't expect the batch size to go over 1000 though. It should remain within the realm of "can submit in a POST request", even if it'll be a ~500 kB request (1000 events of 500 bytes each?).
2) `meta.cache.incr` and `meta.cache.expire`, plus adding `ttlSeconds` to `meta.cache.set` and returning a promise from it: https://github.com/PostHog/posthog-plugin-server/pull/42/files#diff-375753e4853c3395064f0dd9469cd7995477be5f2f20f881ef930c7594fb674e
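A minimal sketch of how these might fit together (call signatures assumed from the PR above; keys and TTLs are made up):

```js
async function processEvent(event, meta) {
    const { cache } = meta
    const count = await cache.incr('events_seen') // increment and get the new value
    if (count === 1) {
        await cache.expire('events_seen', 60) // start a 60-second counting window
    }
    await cache.set('last_event', event.event, 60) // new ttlSeconds argument, returns a promise
    return event
}
```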
3) The plugin server can now be configured with ENV variables. If you run `posthog-plugin-server --help` (`cd plugins && yarn start --help` in the posthog app), you'll see the full list of config options. All of these are configurable with ENV variables: just convert the config key to uppercase and replace "-" with "_". For example, `--database-url` becomes `DATABASE_URL`.

When running the plugin server via `bin/plugin-server` (set in most scripts), we fetch these keys from Django and pass them along. The others could be set via env variables in your cloud of choice.
The important ones that you might want to tweak are `WORKER_CONCURRENCY` and `TASKS_PER_WORKER`. While worker concurrency is taken from the number of CPUs available, you might want to fine-tune it. The `TASKS_PER_WORKER` env variable specifies how many "async" tasks each worker will run in parallel. I'm not yet sure what the best value here is. 10 seems safe. 100 seems fine too, though it might not be if every async `processEvent` makes an HTTP request to the same server. This all needs to be tested, so the param is here to be tuned.