PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Plugin Source Testing #6890

Closed: mariusandra closed this issue 9 months ago

mariusandra commented 3 years ago

There are two things that could be fixed at the same time:

  1. The "Plugin Source Editor" will eventually need a "test mode", which would let us run arbitrary JS code and a) verify it's not broken and b) see how it processes events. This untrusted code should be run on the plugin server, in a separate VM and possibly in a separate worker thread. How do we ask the plugin server to run these one-off tests? Celery? Moved to https://github.com/PostHog/plugin-server/issues/379

  2. We need to cache what each plugin supports: 1) scheduled functions (runEveryX), 2) event processing functions (processEvent / processEventBatch). This lets us separate VMs better in the future, for example by having a separate pool of piscina workers for scheduled tasks and starting only the relevant VMs there. This caching could run when a plugin is installed.
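To make the caching idea in point 2 concrete, here's a minimal sketch (in Python, with hypothetical capability-group names) of detecting which known functions a plugin's source defines. A real implementation on the plugin server would more likely load the source in a VM once and inspect its actual exports rather than text-scan:

```python
# Hypothetical mapping from capability groups to function names a plugin may export.
CAPABILITIES = {
    "scheduled": ("runEveryMinute", "runEveryHour", "runEveryDay"),
    "process_event": ("processEvent", "processEventBatch"),
}

def detect_capabilities(source: str) -> dict:
    """Naive text scan of plugin source for known function names.

    The result could be cached (e.g. on the plugin's DB row) at install time,
    so scheduled-task workers only start VMs for plugins that need them.
    """
    found = {}
    for capability, names in CAPABILITIES.items():
        hits = [name for name in names if name in source]
        if hits:
            found[capability] = hits
    return found
```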

macobo commented 3 years ago

Been thinking about this.

Web servers

It would seem reasonable to solve it like this:

  1. We set up a web server for plugins which can respond to these arbitrary code execution requests. Specifically, a user can upload source, test(s) and a task name and get results back. Proposal: POST /api/test?
  2. In the app, we expose the plugins API by proxying requests. Proposal: POST /api/plugin_server/X gets forwarded to the plugin server's /api/X.

This web API solution can later be extended to e.g. allow extending posthog.js via plugins or UI plugins, both of which could request javascript files from the plugin server.
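A minimal sketch of the proposed path rewrite (in Python; the plugin server address and the proxy prefix are assumptions for illustration, not anything that exists yet):

```python
from urllib.parse import urljoin

# Assumed internal address of the plugin server's web API.
PLUGIN_SERVER_URL = "http://localhost:3008"
PROXY_PREFIX = "/api/plugin_server/"

def plugin_server_target(path: str) -> str:
    """Map an app-side /api/plugin_server/X path to the plugin server's /api/X."""
    if not path.startswith(PROXY_PREFIX):
        raise ValueError("not a plugin server proxy path: " + path)
    return urljoin(PLUGIN_SERVER_URL, "/api/" + path[len(PROXY_PREFIX):])
```

A Django view would then forward the method, body and auth context to this target URL and stream the response back.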

The tricky part for me is whether we can reliably set up and load balance the web server across all of our deployment options:

  1. For cloud, we'd need to set up an ELB and expose its URI to posthog core web. I couldn't find any terraform/other files that set up the current ELB, though.
  2. For docker-compose, updating should be easy: the web server will be started on localhost.
  3. For heroku, ??? since workers can't receive web traffic.
  4. For cloudformation, we could also set up an internal ELB and route traffic there. We'd need everyone to update their cloudformation config though, which is a PITA.
  5. Didn't investigate digitalocean or any other options.

Because of this, the web server approach seems like it would be operationally hard to get this live and working everywhere.

cc @fuziontech @mariusandra @Twixes - am I missing something simple here?


Alternative: celery-based RPC

An alternative is to implement a celery-based RPC system, similar to how funnels are calculated in the main codebase.

This is easier to roll out but more finicky to implement, due to details like needing to make the endpoint poll for results and needing two services (Celery + Redis) to do the back-and-forth.

For the longer-term ideas (posthog-js, UI plugins), latency might be an issue, but we can make heavy use of caching and/or implement e.g. gRPC or something else then.
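The celery-based pseudo-RPC flow could look roughly like this. This is a sketch only: an in-memory dict stands in for the Celery result backend (Redis in production), and the task and field names are made up:

```python
import time
import uuid

# Stand-in for the Celery result backend (Redis in production).
results = {}

def submit_plugin_test(source: str, task_name: str) -> str:
    """Enqueue a one-off plugin test run and return an id to poll on.

    In production this would be something like
    run_plugin_test.delay(task_id, source, task_name), with the plugin
    server consuming the queue and writing the result back.
    """
    task_id = str(uuid.uuid4())
    results[task_id] = {"status": "pending"}
    return task_id

def poll_plugin_test(task_id: str, timeout: float = 10.0, interval: float = 0.1):
    """Poll until the plugin server has written a result, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = results[task_id]
        if result["status"] != "pending":
            return result
        time.sleep(interval)
    raise TimeoutError("plugin test %s did not finish in %ss" % (task_id, timeout))
```

The "needing to make the endpoint polling" finickiness is exactly the poll loop above: the HTTP endpoint (or the frontend) has to keep asking until the worker writes a result.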

Twixes commented 3 years ago

You're on point with the web server routing problem, we can't really do it due to Heroku. On every other deployment option we can do extensive routing one way or another (Traefik would be amazing for docker-compose for example), but Heroku is important AND limiting… Hence async communication between the main server and plugins via Celery seems like the best bet.

mariusandra commented 3 years ago

We can run the plugin server in the web dyno, possibly in some lite mode where it just responds to HTTP requests. I don't see why this can't work. There should be enough memory in any Heroku dyno for this... and in the worst case we could proxy directly in django.

macobo commented 3 years ago

Re heroku, do you mean putting plugin server instance running together with web? Wouldn't that result in us not being able to scale plugin server independently there?

mariusandra commented 3 years ago

also together with web, just for the "web plugins".

macobo commented 3 years ago

@mariusandra thanks for the idea, but this feedback doesn't address the larger question. Do you have input on whether we should use a web server or not? The original comment lists some other issues that would arise with that approach as well.

mariusandra commented 3 years ago

I do imagine we want to talk to the plugin server directly, and not through celery.

Technically, we can use a web server. There are just a few places to modify, like ./bin/docker-server (used in ECS, Helm, etc.) and the Procfile (calls gunicorn directly, needs to be changed to use docker-server?), and perhaps something else. We see from Heroku that running the plugin server alongside the celery worker in the same worker dyno works fine, so it should probably also work fine if we add it to the web dynos (tasks, containers, whatever the terminology). Any incoming request can be routed from django to the plugin server (via localhost HTTP), or via a load balancer directly, bypassing django if possible. From there the request would be sent on to some piscina task.

Finally, we should add some flag to the plugin server, to not load any plugins that are not "web" compatible to save some resources.

Hence, I think we should. The celery-based approach is an alternative, but introduces a lot of latency and open connections that django will surely run out of while waiting for the response to make its way back.

Imagine some service that sends a lot of webhooks. For example a "mailchimp events plugin" (creates a "sent" or "delivered" event every time a webhook fires). A user sending a million emails every day (and getting a million webhook requests back) can easily clog their redis this way.

mariusandra commented 3 years ago

Just for reference, here's a bunch of webhook plugins aka source functions for segment: https://github.com/segmentio/functions-library/tree/72c80c8d86f179389fcfc59555e06acec3f11775/sources

mariusandra commented 3 years ago

Actually, there are a few use cases here that might warrant a customised approach. Just thinking out loud, not claiming to have answers:

1) Webhooks to plugins (a POST to /api/plugins/ID/hash/webhook) - we could potentially just reply with 200 OK and push the request onto another kafka topic, which will later be sent to the onRequest (or whatever) function on a plugin. We can't respond to webhooks with custom JSON/HTML this way, but a 200 status code is usually all most webhooks want.

2) Frontend plugins - we need to send some .js from the plugin server to the frontend. We could do this by just storing the JS in postgres and loading it on the frontend? No API requests required. I guess posthog-js plugins could be handled in a similar way.

3) Running tests in the editor - these might even make sense to store in the main DB just to have a good log of what test has been run on what code, when and by whom. The plugin server could just poll every few seconds for unfinished tasks and run them... or we can just use celery (or graphile worker) to communicate here. The frontend might just poll the plugin tests table for a completed status.
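For idea 1, the webhook endpoint could ack immediately and hand the request off to Kafka. A sketch with a stub producer (the topic name, payload shape, and stub class are all made up for illustration; production would use a real Kafka client):

```python
import json

class ProducerStub:
    """Stand-in for a real Kafka producer (e.g. kafka-python's KafkaProducer)."""
    def __init__(self):
        self.sent = []

    def send(self, topic: str, value: bytes):
        self.sent.append((topic, value))

producer = ProducerStub()
WEBHOOK_TOPIC = "plugin_webhooks"  # hypothetical topic name

def handle_plugin_webhook(plugin_id: int, secret_hash: str, body: bytes):
    """Reply 200 OK right away; the plugin server consumes the topic later
    and invokes the plugin's onRequest (or equivalent) function."""
    payload = json.dumps({
        "plugin_id": plugin_id,
        "hash": secret_hash,
        "body": body.decode("utf-8", "replace"),
    }).encode("utf-8")
    producer.send(WEBHOOK_TOPIC, payload)
    return 200, "OK"
```

This keeps the hot path cheap (one produce call, no plugin VM involved), which matters for the million-webhooks-a-day scenario described above.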

fuziontech commented 3 years ago

If what we are trying to do is just throw some JS at the plugin server and get a response I would set it up like this:

Frontend => Django => Plugin server

The Django => Plugin server hop can be just normal RPC over HTTP, or we could get fancy and use gRPC.

Eventually I imagine that we'll have a number of services that handle special requests, but all of them should be fronted by the Django app as a router, since it manages auth(n|z) and is where the LB is set up. From there it's easy enough to route from one task to another through service discovery, so no extra ELBs are required. Same thing on k8s. This becomes tough on heroku though... We could just docker compose on heroku?

macobo commented 3 years ago

How would you set up the RPC over HTTP?

In an ideal world I agree, but my worry is that introducing a web server to all of our deployment options is tricky. In addition, we saw/are seeing with plugins how hard it is to get existing users to update their cloudformation template or heroku config.

Are these misguided assumptions?

mariusandra commented 3 years ago

  1. I guess fetch(`http://localhost:${pluginIp}`) or something similar :)
  2. That's why we try to pipe all services via bin/docker-* style scripts. We probably won't have to change things in many places... and things can just "progressively enhance" support for webhook and other plugins until the updates are made.
macobo commented 3 years ago

That wouldn't work on cloudformation given that web and plugin-server are running in different containers. To set this up we'd need to set up a load balancer which in turn requires users to update the cloudformation config as they update (which they might not do by default).

A similar situation might occur on heroku as well as I haven't tested whether you can start http services on there.

Progressive enhancement is a strategy but still increases the operational complexity a lot - we now need to handle N things not existing.

I feel like I'm repeating myself a lot in this thread. :( Celery as pseudo-RPC still seems like the best option, even though it's the worst one in a vacuum.

mariusandra commented 3 years ago

I've been saying for a while now that we should just start the plugin server in the same task/container/dyno as web, but just for webhook and other web* plugins.

posthog-bot commented 10 months ago

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

posthog-bot commented 9 months ago

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.