I really like it!
One early 'internal' consumer for this could be session recording, but this would require:
However, now that I'm thinking about this more deeply, I think point 2 would cause issues - on cloud, we would need to do a lot of work to limit access to only your own organization's data.
I think instead of relying on plugins, let's roll it into main repo and extract as a plugin as it evolves. Thoughts?
I believe scheduled tasks and access to models will be easy to add. I just wanted to get all the rest solid first.
Regarding API access, you can literally now inside a plugin do:
from posthog.models import Event
events = Event.objects.filter(team_id=config.team)
... and then do whatever you need.
It's just not wise to do these queries inside the process_event block though. So we need scheduled tasks. This was basically already handled during the hackathon by just passing the celery object to the plugin, giving the plugin the opportunity to register tasks, but I removed that for now.
Obviously plugin authors will need to be careful to scope their queries to a team, like we do now in the main codebase. This will be up to plugin authors to handle though... :/
And we won't run unknown plugins on app, so this shouldn't really be an issue.
No custom plugins on cloud, right?
If we're exposing our models then I think we should do another refactoring first as well: rename our root module in this repo due to the conflict with https://github.com/PostHog/posthog-python. It's more than conceivable that users would love access to both without needing to hack around it.
I think this could be a great use case for a plugin and a nice example for others to follow when making their own retention style plugins. That said, feel free to start coding this inside app and we can extract later.
I'm so excited by this, but I think we need to think about ensuring adoption.
The range of what you can do is severely limited at the moment. Opening up all models would make plugins far more versatile.
It's pretty simple
Whilst making a plugin is simple, for someone outside our core team who isn't already doing local development, I don't think it's trivial - they would need to deploy PostHog locally, manually, which is roughly 12 commands.
The advantage of making this entire process end to end trivial is that we'll get more people in the community building plugins. This would be a strategic benefit as it'll make us achieve platform status.
A few thoughts on improving this - although I am very open to alternative ideas, as I'm not really the target audience:
Could we automatically filter all queries by team for any plugin, somehow? It feels like relying on people to add their own appropriate team filters is unrealistic.
Here's another thing to consider.
Plugins are currently exported as a class with the following signature:
# exampleplugin/__init__.py
from posthog.plugins import PluginBaseClass, PosthogEvent, TeamPlugin

class ExamplePlugin(PluginBaseClass):
    def __init__(self, team_plugin_config: TeamPlugin):
        super().__init__(team_plugin_config)
        # other per-team init code

    def process_event(self, event: PosthogEvent):
        event.properties["hello"] = "world"
        return event

    def process_identify(self, event: PosthogEvent):
        pass
The classes for these plugins are loaded into python when the app starts (or a reload is triggered).
These classes are initialized (plugin = ExamplePlugin(config)) also on app start (or reload), but per team, and only if there's a team-specific config stored for this plugin.
This means that in a large app with multiple teams, we can have thousands if not more copies of the same object loaded in memory. For example, if we load a 62MB IP database with every initialization of the maxmind plugin for each team, with a thousand teams we'll need 62GB of RAM.
Thus it must be possible for plugins to share state per app instance, and so they need some per_instance and per_team init hooks.
Here are two ideas to solve this.
Option 1:

# maxmindplugin/__init__.py
import geoip2.database
from posthog.plugins import PluginBaseClass, PosthogEvent, TeamPluginConfig
from typing import Dict, Any

def instance_init(global_config: Dict[str, Any]):
    geoip_path = global_config.get("geoip_path", None)
    reader = None
    if geoip_path:
        reader = geoip2.database.Reader(geoip_path)
    else:
        print("Running posthog-maxmind-plugin without the 'geoip_path' config variable")
        print("No GeoIP data will be ingested!")
    return {
        "config": global_config,
        "reader": reader
    }

# # Not used for this plugin
# def team_init(team_config: TeamPluginConfig, instance: Dict[str, Any]):
#     return {
#         "config": team_config.config,
#         "cache": team_config.cache,
#         "team": team_config.team,
#     }

def process_event(event: PosthogEvent, team_config: TeamPluginConfig, instance_config: Dict[str, Any]):
    if instance_config.get('reader', None) and event.ip:
        try:
            response = instance_config['reader'].city(event.ip)
            event.properties['$country_name'] = response.country.name
        except:
            # ip not in the database
            pass
    return event

def process_identify(event: PosthogEvent, team_config: TeamPluginConfig, instance_config: Dict[str, Any]):
    pass
I'm not set on the naming of things, nor on the exact shape of the dicts/objects returned from each function, so please ignore that (and share feedback if you have it). The point is that this is a "serverless" or "functional" shared-nothing style approach: we would call the instance_init or team_init functions as needed and pass the objects returned to each process_* method.

Option 2:
class MaxmindPlugin(PluginBaseClass):
    @staticmethod
    def init_instance(global_config: Dict[str, Any]):
        geoip_path = global_config.get("geoip_path", None)
        if geoip_path:
            MaxmindPlugin.reader = geoip2.database.Reader(geoip_path)
        else:
            print("Running posthog-maxmind-plugin without the 'geoip_path' config variable")
            print("No GeoIP data will be ingested!")
            MaxmindPlugin.reader = None

    def init_team(self, team_config):
        pass

    def process_event(self, event: PosthogEvent):
        if MaxmindPlugin.reader and event.ip:
            try:
                response = MaxmindPlugin.reader.city(event.ip)
                event.properties['$country_name'] = response.country.name
            except:
                # ip not in the database
                pass
        return event
Here the same class would have two methods: one static init_instance that sets properties on the class itself, and one instance method init_team that is called from __init__(self) when the class is initialized.
In this scenario, we would still init a new class per team per plugin, but with a much smaller payload.
Which option do you prefer? 1 or 2?
I went with option 2 for now.
Also, I made a small TODO list.
For those following along, experimenting with plugins on Heroku, I have run across a new and unexpected issue!
The PUBSUB worker reload code creates too many connections to Redis, making the app unusable on Heroku with the free redis instance. Celery is consistently running into "redis.exceptions.ConnectionError: max number of clients reached" errors and won't process tasks.
Unrelated, the worker is also constantly running out of memory and starts using swap:
The explanation is that celery forks a new worker for each CPU core it finds. In the $7/mo heroku hobby dynos, 8 CPUs are reported:
... thus taking up (1+8) * 70MB of RAM and an additional 1+8 celery connections for the plugin reload PUBSUB.
On another branch preview, without the plugin reload pubsub, 12-19 redis connections are already used, making the extra 9 clearly exceed the limit:
Bumping the redis addon to one with 40 connections, I see that 28 are used.
In addition to all of this, there seems to be some issue reloading plugins in the web dynos:
I'll keep investigating, though it seems it might be smart to ditch the pubsub for plugin reloads and just use a regular polling mechanism... though I need to test this.
Alternatively, it might be wiser to hoist the reload up from per-fork to per-worker, putting it basically into ./bin/start-worker and reloading the entire process once a reload takes place.
Hello!
Since I last posted, the following has happened:
- There's no pip --dry-run, so we can either manually parse the requirements, or fetch all deps and parse their requirements, to make sure no installed package conflicts with what posthog itself requires. Even then, we'll have edge cases and a lot of issues.
- Running plugin code in raw v8 proved too limited (no fetch or whatever, and a very limited standard library - it's not node, it's raw v8), so I had to drop that implementation.
- Since we're already using celery, it just made a lot of sense to use the existing infrastructure and pipe all events through celery. It works beautifully!
To enable, set PLUGINS_ENABLED=1 and run the app. That's all you need. This might be enabled by default in the next version?
You might also need to run bin/plugins-server, depending on your setup. The scripts bin/start-worker and bin/docker-worker now call bin/plugins-server automatically. The command runs a nodejs package called posthog-plugins, which starts a nodejs celery process that listens to tasks with the name process_event_with_plugins, runs plugins on the event and then dispatches another process_event task that django picks up to continue the work.
In case the plugins server is down, events will just queue up and hopefully nothing is lost. Plugin reloads are done via a redis pubsub system, triggered by the app.
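For illustration, the node side of that reload pubsub could look roughly like this (a sketch with assumed channel and helper names, not the actual posthog-plugins code):

const Redis = require('ioredis')

// hypothetical helper - in the real server this would re-read plugin configs from the db
async function reloadAllPlugins() {
    console.log('reloading plugins...')
}

async function listenForReloads() {
    const sub = new Redis(process.env.REDIS_URL)
    await sub.subscribe('reload-plugins')   // channel name is an assumption
    sub.on('message', (channel) => {
        if (channel === 'reload-plugins') {
            reloadAllPlugins()
        }
    })
}

listenForReloads()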
To install a plugin all you need is a github repo with an index.js file. Ideally though you'd also have a plugin.json file that contains some metadata. Here's the example for the helloworldplugin (updated for JS):
// plugin.json
{
    "name": "helloworldplugin",
    "url": "https://github.com/PostHog/helloworldplugin",
    "description": "Greet the World and Foo a Bar, JS edition!",
    "main": "index.js",
    "lib": "lib.js",
    "config": {
        "bar": {
            "name": "What's in the bar?",
            "type": "string",
            "default": "baz",
            "required": false
        }
    }
}
The index.js file contains the main plugin code. The lib.js file contains other library code. This could even be a bunch of stuff rolled up with rollup or another bundler, kept away from the main plugin code. The config part specifies config parameters that will be asked for in the interface.
The lib.js file can be as extensive as you want. Here's the helloworldplugin example:
// lib.js
function lib_function (number) {
    return number * 2;
}
This function is now available in index.js for the app code to use. The currency normalization plugin makes better use of this by putting functions like fetchRates in there.
Here's what you can do in the plugin's index.js:
// index.js
async function setupTeam({ config }) {
    console.log("Setting up the team!")
    console.log(config)
}

async function processEvent(event, { config }) {
    const counter = await cache.get('counter', 0)
    cache.set('counter', counter + 1)

    if (event.properties) {
        event.properties['hello'] = 'world'
        event.properties['bar'] = config.bar
        event.properties['$counter'] = counter
        event.properties['lib_number'] = lib_function(3)
    }

    return event
}
The setupTeam function is run when plugins are reloaded and the team config is read from the db. The only thing you can really do there is use fetch for things and use cache to store data.
The processEvent function runs for each event. Since everything goes through celery directly before hitting the django app, I removed the previous processIdentify and other calls. Thus you should make sure that event.properties exists before changing anything. For example, it doesn't exist for $identify calls and some others.
Inside these JS files you can run the following (a small combined example follows below):
- cache.set(key, value) - store something in Redis, scoped to the team (no expiration yet)
- await cache.get(key) - get something from Redis, scoped to the team - NB! returns a promise, so you must use await
- posthog.capture() - capture an event; still requires some work and only available in processEvent for now. Passes along the distinct_id, site_url and api_key of the originally received event.
- fetch - works as expected!

There's still a lot of work to do to clean this up even further, though what is now in the plugin-v8 branch works. Unless you enable the PLUGINS_ENABLED key, the only thing that will happen is that we will start the node plugin server anyway in the bin/*-worker scripts, but it just won't do anything. That will take up 2 redis connections though - one for the cache, one for pubsub.
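To show how these APIs fit together, here's a small illustrative sketch (not a real plugin - the geo lookup endpoint is made up) combining cache, fetch and posthog.capture:

async function processEvent(event, { config }) {
    const counter = (await cache.get('counter', 0)) + 1
    cache.set('counter', counter)

    if (event.properties && event.properties.ip) {
        // made-up endpoint, purely for illustration
        const response = await fetch(`https://geo.example.com/${event.properties.ip}`)
        const geo = await response.json()
        event.properties['$country_name'] = geo.country
    }

    // emit a summary event every 1000 processed events
    if (counter % 1000 === 0) {
        posthog.capture('milestone reached', { events_processed: counter })
    }

    return event
}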
Here are some todo items as well; one of them concerns setupTeam.
Noting down another plugin idea: tracking how many times a library has been installed. This should again help make product decisions (e.g. which to add autocapture to: flutter vs react-native vs ios).
New stuff!
On all self-hosted installations (no feature flag needed & multi tenancy excluded), when you load plugins from "project -> plugins", you're greeted with this page:
It has two features:
Once enabled per team, in api/process_event, we just change the name and the queue of the dispatched celery task from process_event to process_event_with_plugins.
This task will be picked up by the node worker via celery. After running the event through all relevant plugins for the team, it sends a new process_event task with the modified payload. This is then picked up by the regular python celery task, which never knew its payload had been tampered with! Sneaky.
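Conceptually, the node worker's side of that hand-off looks something like the sketch below (pluginsForTeam and celeryClient are assumed stand-ins, not the actual implementation):

// assumed stand-ins for the real plugin registry and celery client
const pluginsForTeam = (teamId) => []
const celeryClient = { sendTask: (name, args) => console.log('dispatching', name) }

// runs when a process_event_with_plugins task is received from celery
async function onProcessEventWithPlugins(event, teamId) {
    let processedEvent = event
    for (const plugin of pluginsForTeam(teamId)) {
        processedEvent = await plugin.processEvent(processedEvent)
        if (!processedEvent) break                      // a plugin may drop the event entirely
    }
    if (processedEvent) {
        // dispatch the regular task; django picks it up as if nothing happened
        celeryClient.sendTask('process_event', [processedEvent])
    }
}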
There's also a much much much nicer interface to install and configure the plugins (thank you @paolodamico !!):
There are a few rough edges (no upgrades, only string fields), but as a first beta it gets the job done.
If there's an error in any plugin, either during initialisation or when processing an event, you can also see the error together with the event that broke it:
And when you decide you have had enough, just disable the plugin system and all events pass through celery as normal:
Jotting down some recommendations for the next iteration. The error thing is pretty cool; some suggestions to improve this:
Hey @paolodamico , totally agree with the suggestions and we should make this much better. For now, there's at least something. The error message itself (your third point) is due to the currency plugin. The API actually replies that the key is incorrect, but that's swallowed by the plugin.
Master plan with plugins:
- Running the plugin server via pm2-runtime makes sense
- Scheduled tasks inside a plugin's index.js, possibly initialised via registerScheduledTask('*/4 * * * *', pollForEvents) in posthog-plugin-server
- UI extension points like registerSidepanelItem(), registerScene(), registerGraphType()

I'm sure I forgot some things, but this is basically what we're looking at.
This is turning out to be a long hackathon.
Tasks regarding plugins are now tracked in this project
A few thoughts on stuff that would help these launch successfully:
Depending on your reaction to the above, perhaps we should clarify on the project what is a blocker to launching?
Over the last few days plugins have gotten decidedly more exciting.
When PR #2743 lands (and https://github.com/PostHog/posthog-plugin-server/pull/67), we will support:
Both features have their gotchas and are still very much beta, yet, excitingly, they work well enough for a lot of use cases.
Check it while it lasts. The Heroku Review App for this branch contains a few fun plugins.
1. The "github metric sync" plugin.
Not yet the full stargazers sync, but just syncing the number of stars/issues/forks/watchers as a property every minute:
Screenshot:
Code:
async function runEveryMinute({ config }) {
    const url = `https://api.github.com/repos/PostHog/posthog`
    const response = await fetch(url)
    const metrics = await response.json()

    posthog.capture('github metrics', {
        stars: metrics.stargazers_count,
        open_issues: metrics.open_issues_count,
        forks: metrics.forks_count,
        subscribers: metrics.subscribers_count
    })
}
All events captured in a plugin via posthog.capture are sent directly into celery (bypassing the Django HTTP API overhead) and come from a unique user.
We can graph this. Our star count is steady!
2. The "Queue Latency Plugin"
This is a pretty quirky use case.
// scheduled task that is called once per minute
function runEveryMinute() {
    posthog.capture('latency test', {
        emit_time: new Date().getTime()
    })
}

// run on every incoming event
function processEvent(event) {
    if (event.event === 'latency test') {
        event.properties.latency_ms = new Date().getTime() - event.properties.emit_time
    }
    return event
}
Since the event is lpushed into a list in redis (sent to celery) and only later read again via redis from the back of the queue, we can use this to measure the queue latency:
Using PostHog to measure PostHog.
Github star sync plugin
I started making a true github star sync plugin, but still have two blockers that need to be solved separately.
Even with these blockers, the plugin is currently possible.
Snowflake/BigQuery plugin
Segment in their functions exposes a bunch of node packages to the user:
With the maxmind package, I already had a bit of trouble including the .mmdb reader inside the final compiled plugin index.js file. I'm now afraid compiling all of @google-cloud/bigquery, which probably includes some protobuf files that are read through the filesystem via some compiled C code, into one index.js will prove hard. We'll probably need to expose some of these APIs directly to the user as well.
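For example, exposing a pre-built client on the plugin's context, the same way fetch is exposed today, could look roughly like this (the bigquery object and its insertRows method are hypothetical):

// hypothetical: a BigQuery client injected by the plugin server, like `fetch` is today
async function processEvent(event, { config, bigquery }) {
    await bigquery.insertRows(config.table, [{
        event: event.event,
        distinct_id: event.distinct_id,
        properties: JSON.stringify(event.properties),
    }])
    return event
}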
Other things to improve
There are so many things that can be improved. Browse the Heroku app and write down the first 5 you find. Here are some random ones:
- Running the runEveryMinute task right away, instead of waiting for the next :00 to hit.

This is BETA
Plugins, while legitimately powerful, are still legitimately beta.
The next step is to get this running on cloud and get the snowflake and bigquery plugins out.
Here it is - the github star sync plugin:
async function runEveryMinute({ cache }) {
    // if github gave us a rate limit error, wait a few minutes
    const rateLimitWait = await cache.get('rateLimitWait', false)
    if (rateLimitWait) {
        return
    }

    const perPage = 100
    const page = await cache.get('page', 1)

    // I had to specify the URL like this, since I couldn't read the headers of the original request to get
    // the "next" link, in which `posthog/posthog` is replaced with a numeric `id`.
    const url = `https://api.github.com/repositories/235901813/stargazers?page=${page}&per_page=${perPage}`
    const response = await fetch(url, {
        headers: { 'Accept': 'application/vnd.github.v3.star+json' }
    })
    const results = await response.json();

    if (results?.message?.includes("rate limit")) {
        await cache.set('rateLimitWait', true, 600) // timeout for 10min
        return
    }

    const lastCapturedTime = await cache.get('lastCapturedTime', null)
    const dateValue = (dateString) => new Date(dateString).valueOf()

    const validResults = lastCapturedTime
        ? results.filter(r => dateValue(r.starred_at) > dateValue(lastCapturedTime))
        : results

    const sortedResults = validResults.map(r => r.starred_at).sort()
    const newLastCaptureTime = sortedResults[sortedResults.length - 1]

    for (const star of validResults) {
        posthog.capture('github star!', {
            starred_at: star.starred_at,
            ...star.user,
        })
    }

    if (newLastCaptureTime) {
        await cache.set('lastCapturedTime', newLastCaptureTime)
    }

    if (results.length === perPage) {
        await cache.set('page', page + 1)
    }
}
I would like an option to specify a custom timestamp for my event. Other than that, it works! What's more, it makes only 60 requests per minute, keeping below GitHub's free API rate limits :).
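For illustration, the kind of call I have in mind inside the loop above would be something like this (hypothetical - no such option exists yet):

posthog.capture('github star!', {
    ...star.user,
}, {
    timestamp: star.starred_at,   // hypothetical third "options" argument
})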
Pretty exciting updates @mariusandra, thanks for sharing it in such detail! Would like to start writing out a plugin really soon. In the meantime let me know if I can help with the UI/UX to better communicate the new functionality/workflow.
Cool, looks nice, which external node modules are supported? I assume you need to preinstall and/or white list them?
There are two ways to include external modules.
index.js
like the posthog-maxmind-plugin does. This should be the preferred option, as in general the fewer external dependencies the better.posthog-plugin-server
directly. Right now only fetch
(maps to node-fetch
) is available, though I think we'll already have a few other things for the next release.
For reference, segment does something similar as well.
Memory benchmarks!
As it is built now, posthog-plugin-server isolates each plugin inside a VM (via vm2). All plugins are also isolated per team (per project). This means 100 projects using the "helloworldplugin" spin up 100 VMs, even if it's exactly the same code they're all running.
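Roughly speaking, creating one of these per-team VMs amounts to something like the sketch below (simplified; the exposed globals and option values are assumptions, not the actual plugin-server code):

const { NodeVM } = require('vm2')

// build a sandboxed VM for one team's copy of a plugin
function createPluginVm(pluginSource, exposedGlobals) {
    const vm = new NodeVM({
        sandbox: exposedGlobals,   // e.g. { cache, fetch, posthog } scoped to the team
        require: false,            // plugins can't require arbitrary node modules
    })
    // returns whatever the plugin source exports (processEvent, setupTeam, ...)
    return vm.run(pluginSource, 'index.js')
}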
So how heavy is a VM? (Un?)Surprisingly, not at all! A simple plugin VM takes about 200KB of memory. A more complicated plugin (100kb of posthog-maxmind-plugin/dist/index.js) takes about 250KB. Thus running 1000 VMs in parallel consumes an extra 250MB of RAM. Said differently, if 1000 customers on cloud enable one plugin, the server's memory footprint will grow 250MB per worker.
Obviously a VM that loads a 70MB database and keeps it in memory throughout its lifetime will consume more memory, but for all intents and purposes VMs are very light.
Originally I had imagined a "shared plugin" system for "multi tenancy" (cloud), where we spin up a bunch of shared VMs that can just be enabled/disabled per team. However I could never get over the danger of leaking data - for example, when one processEvent stores something about the event on the global object and reads it again the next time it's run, but it's an event for a different team. I thought the best way around this was to just whitelist a bunch of trusted plugins that cloud users can run, greatly eliminating this threat.
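To make the danger concrete, here's the kind of (contrived) plugin code that would leak data between teams if VMs were shared:

// contrived example: state stashed on `global` by one team's event
// could be read back while processing another team's event
function processEvent(event) {
    if (global.lastEmail) {
        event.properties.previous_email = global.lastEmail   // might belong to a different team!
    }
    global.lastEmail = event.properties && event.properties.email
    return event
}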
Now I'm thinking differently. With such a light footprint, we can spin up a new VM for each team that wants to use a plugin, thus completely separating the data in memory. If the number gets more than one CPU core can handle (so over 10k plugins in use?), we can split the work and scale horizontally as needed.
For enterprise customers using PostHog Cloud, we could provide additional worker-level or process-level isolation. This is what cloudflare does - they split the free workers and the paid clients' workers into separate clusters. In our case, with thread-level isolation on cloud, each paying customer could get their own worker (aka CPU thread) that runs all their plugins. These workers could be automatically spun up and down by the plugin server as the load changes, protecting paying customers from broken and runaway plugins made by other customers. With something like this, we could even enable the plugin editor for all paying customers.
We're really making a lambda here :).
It's been 1.5+ months (including the Christmas break) since the last update, so time for a refresher!
The big big change that has happened since then is that event ingestion is now handled by the plugin server! This is still beta and disabled by default, but when enabled, events, after being processed by the plugins, are ingested (stored in postgres or clickhouse) directly inside the plugin server. For Postgres/Celery (OSS) installations, this avoids one extra step. For ClickHouse/Kafka (EE) installations, this makes using plugins possible, as with this setup we have nowhere to send the event after the plugins have finished their work.
The work in the next weeks will be to stabilise this ingestion pipeline and enable it for all customers on PostHog Cloud. Currently we're bottlenecked to ~100 events/sec per server instance (even less for long-waiting plugins) and this needs to be bumped significantly. Only after that can we enable plugin support for all cloud users. Hopefully next week :).
Other notable changes in the last month or so:
- storage.get & storage.set API for postgres-backed persistent data in plugins (quick sketch below)
- plugin.json for all metadata (as opposed to package.json)

All that said, with the launch of plugins on cloud (already enabled for some teams to test), we're entering a new era for the plugin server. From now on we must be really careful not to trip anything up with any change and religiously test all new code!
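As a rough illustration of that storage API (exact signatures may differ), a plugin can persist a counter across restarts like this:

// storage is backed by postgres, unlike cache which lives in redis
export async function processEvent(event, { storage }) {
    const seen = (await storage.get('events_seen', 0)) + 1
    await storage.set('events_seen', seen)
    event.properties.$events_seen = seen
    return event
}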
We also introduced quite a bit of technical debt with the 4000 changed lines of the ingestion PR (all the magic SQL functions, database & type mapping, etc). This needs to be cleaned up eventually.
While we've gotten very far already, there are many exciting changes and challenges still to come. For example:
- handling plugins that call delay(10 years) or for (ever) { noop; }
- handling posthog.capture calls

And then we'll get to the big stuff:
I think this has evolved in a bunch of different places and can now be closed? @mariusandra
@paolodamico I think this can indeed be closed, but not before one last update!
It's been 3.5 months since the last update. Let's check in on our contestants.
What we have been building with plugins is something unique... something that in its importance and its value to the bottom line has a legit opportunity to overtake all other parts of PostHog (though it won't be our defining feature since it's already built).
The Plugin Server has turned PostHog into a self-hosted and seriously scalable IFTTT / Zapier / Lambda hybrid, with RDS, ElastiCache, SQS and other higher abstractions baked right in.
It has become a serious application platform in its own right.
(Seriously, it has. Check out this 45min talk LTAPSI - Let's Talk About Plugin Server Internals for more)
Combine this with a scalable event pipeline, and you can build some really cool shit. Web and product analytics? So 2020. Here are some more exotic ideas:
Oh and PostHog can still do web and app analytics, session recording, heatmaps, feature flags, data lakes, etc, etc ad nauseam :).
Plugins now power the entire ingestion pipeline. On PostHog cloud, one plugin server can ingest at most a thousand events per second.
Plugins are now used by many self-hosted and cloud customers to augment their data and to export it to various data lakes. We have had several high quality community plugins come in, such as sendgrid and salesforce (should be added to repo?). We've had enterprise customers write their own 700-line plugins to streamline data ingestion.
You just need to write the following to have an automatic, batched and retry-supporting data export plugin:
import { RetryError } from '@posthog/plugin-scaffold'

export async function exportEvents (events, { global, config }) {
    try {
        await fetch(`https://${config.host}/e`, {
            method: 'POST',
            body: JSON.stringify(events),
            headers: { 'Content-Type': 'application/json' },
        })
    } catch (error) {
        throw new RetryError() // ask to retry
    }
}
If you throw the error, we will try running this function again (with exponential backoff) for around 48h before giving up.
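For intuition, that retry behaviour amounts to roughly the loop below (a simplified sketch, not the actual plugin-server scheduler - the base delay and cap are made-up numbers):

import { RetryError } from '@posthog/plugin-scaffold'

const BASE_DELAY_S = 30                  // assumed starting delay
const MAX_TOTAL_S = 48 * 60 * 60         // give up after ~48 hours of backoff

async function runWithRetries(fn) {
    let attempt = 0
    let elapsed = 0
    while (true) {
        try {
            return await fn()
        } catch (error) {
            if (!(error instanceof RetryError)) throw error   // only RetryError triggers a retry
            const delay = BASE_DELAY_S * 2 ** attempt++       // exponential backoff
            elapsed += delay
            if (elapsed > MAX_TOTAL_S) throw error
            await new Promise((resolve) => setTimeout(resolve, delay * 1000))
        }
    }
}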
We now have a bunch of functions you can write: onEvent, onSnapshot, processEvent, exportEvents, runEveryMinute, runEveryHour, runEveryDay.
You can export jobs to have background tasks, which you can soon even run from the UI: https://github.com/PostHog/plugin-server/pull/414
export const jobs = {
    performMiracleWithEvent (event, meta) {
        console.log('running async! possibly on a different server, YOLO')
    }
}

export function processEvent (event, { jobs }) {
    jobs.performMiracleWithEvent(event).runIn(3, 'minutes')
    event.properties.hello = "world"
    return event
}
Here's real feedback from a customer that we received (name omitted just in case):
"The power of being able to write a little plugin in 100 lines of JS is just amazing. Can't wait to break out of all our Amplitude/GA/FullStory stack"
Since the last update 3.5 months ago we have built: injecting loop timeouts via babel, polished CD, implemented a bump patch GH action releasing system, put ingestion live, did a lot of debugging to find the need to add redis connection pools, implemented lazy vms, implemented plugin access control for cloud, added built-in geoip support, added the snowflake sdk, the AWS sdk, console logging, job queues, onEvent & onSnapshot, plugin capabilities, and the 185 other PRs that go under "keep the lights on" work.
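To give a feel for the loop-timeout injection, the transform conceptually does something like the following (a simplified illustration with a made-up time budget, not the actual babel plugin output):

const noop = () => {}

// what a plugin author writes:
// while (true) { noop() }

// what conceptually runs after the transform:
const _loopStart = Date.now()
while (true) {
    if (Date.now() - _loopStart > 30 * 1000) {   // assumed 30s budget per invocation
        throw new Error('Script execution timed out')
    }
    noop()
}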
We're only getting started :).
Look at the team extensibility project board for what we're working on now.
There's a lot of ongoing "keep the lights on" work, which will continue to take up most of the time going forward. This work is not exciting enough to mention here (90% of closed PRs the last 4 months for example), but absolutely important to get through.
From the big things, there are a few directions we should tackle in parallel:
- onEvent, sanitize timestamps, formalize an official event spec, job queue recovery mode, more export destinations.

Only when that's done could we also look at UI plugins. Let's hold back here for now, as the frontend is changing so rapidly. Instead let's take an Apple-ish approach where we only expose certain elements that are ready, starting with the buttons to trigger jobs and displaying the output in the console logs.
I believe the biggest challenge for the plugin server will come in the form of flow control. The job queue next steps issue briefly talks about it.
The plugin server has just a limited number of workers (40 parallel tasks on cloud). Imagine Kafka sending us a huge batch of events while we're also receiving a lot of background jobs and running a few long-running processEveryHour tasks. If in this scenario we ask piscina to run another 200 tasks, and keep adding more and more faster than the old ones complete, we're going to run out of memory and crash with a lot of in-flight tasks.
To prevent this, there's a "pause/drain" system in place with most queues. We periodically check if piscina is busy, and if so, stop all incoming events/jobs/etc.
If we add more features and are not careful regarding flow control, we can run into all sorts of bottlenecks, deadlocks, and lost data. We must be terrified of issues with flow control if we're to build a project for the ages.
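In its simplest form, that pause/drain loop looks something like the sketch below (assumed names and thresholds - the real implementation checks piscina's utilization and wraps the various queue consumers):

const CHECK_INTERVAL_MS = 1000
const MAX_QUEUED_TASKS = 100      // assumed threshold for "busy"

// `taskPool` and `eventQueue` are stand-ins for piscina and a pausable consumer
function startFlowControl(taskPool, eventQueue) {
    setInterval(() => {
        const busy = taskPool.queueSize > MAX_QUEUED_TASKS
        if (busy && !eventQueue.isPaused()) {
            eventQueue.pause()     // stop pulling new work while we drain
        } else if (!busy && eventQueue.isPaused()) {
            eventQueue.resume()    // safe to accept more work again
        }
    }, CHECK_INTERVAL_MS)
}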
Related, the redlocked services (schedule, job queue consumer) are now bound to just running on one server. This will not scale either. There must be an intelligent way to share load and service maps between plugin server instances... without re-implementing zookeeper in TypeScript.
I'll close this issue now, as work on the plugin server is too varied to continue keeping track of in just one place.
I sincerely believe that what we have built with the PostHog Plugin Server is something unique, with limitless usecases, for personal and business needs alike. It's especially unique given it's an open source project.
Somehow it feels like giving everyone a new car for free.
I'm super excited to see what road trips the community will take with it :).
In order to not pollute the PR with discussion that will be hidden by a thousand commits, I'll describe here what is currently implemented and where we go from here.
Plugins in PostHog
One of the coolest ideas that came from the PostHog Hackathon was the idea of Plugins: small pieces of code that can be installed inside posthog, providing additional or custom features not found in the main repo.
Two examples of plugins that are already built:
Currently plugins can only modify events as they pass through posthog. Support for scheduled tasks, API access, etc is coming. More on this later.
Installing plugins via the interface
Assuming the following settings are set:
... the following page will show up:
Plugins are installed per-installation and configured per-team. There is currently no fine-grained access control: either every user on every team will be able to install/configure plugins, or nobody will.
When installing plugins or saving the configuration, plugins are automatically reloaded in every copy of the app that's currently running. This is orchestrated with a redis pubsub listener.
Installing plugins via the CLI
Alternatively, you may set the INSTALL_PLUGINS_FROM_WEB setting to False and use the posthog-cli to install plugins. Plugins can be installed from a git repository or from a local folder:
Plugins installed via the CLI will be loaded if you restart your posthog instance. They will then be saved in the database just like the plugins installed via the web interface. Removing the plugins from posthog.json uninstalls the plugins the next time the server is restarted.
In case you use both web and CLI plugins, the settings in posthog.json will take precedence and it will not be possible to uninstall these plugins in the web interface.
As it stands now, it's not possible to configure installed plugins via the CLI. The configuration is still done per team in the web interface.
Creating plugins
It's pretty simple. Just fork helloworldplugin or use the CLI:
Todo for this iteration
- Release the PluginBaseClass as a new posthog-plugins pip package

Future ideas
Feedback
All feedback for the present and the future of plugin support in posthog is extremely welcome!