PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Option to Rate limit events from single users #17586

Open tholor opened 1 year ago

tholor commented 1 year ago

Is your feature request related to a problem?

We use PostHog to track usage & product insights of our developer framework with thousands of users. While the average user sends 10-100 events per day, every now and then we experience extreme peaks from single users sending 100k+ events per day, causing extreme costs without any value on our side. However, there doesn't seem to be an option right now in PostHog to rate-limit those users / drop events for a user after a certain threshold (e.g. > 10k events per day). It's becoming such an annoying problem for us that we are considering migrating to another analytics stack.

Describe the solution you'd like

An option / plugin to limit the events per user in a certain timeframe (e.g. day). Events above the threshold should be dropped right away and not be considered for billing.

Describe alternatives you've considered

Migrating to another product.

Additional context

We are on a paid plan :)

pauldambra commented 1 year ago

Hey @tholor,

This overlaps with a feature I've been thinking about for session replay so I wondered if I could ask a few questions...

  1. Is this only in a single PostHog SDK?
  2. Are these custom events or autocapture?
  3. When a customer peaks, does it tend to be a single event that peaks or an increase across all/multiple events?
  4. Does this need to apply across multiple devices/windows? E.g. if this was in the browser and we held local state to count the events and sample/rate limit, then a user could log in on another session and wouldn't be rate limited. Would that work for you?

Thanks 💖

tholor commented 1 year ago

Hey @pauldambra

Sure:

  1. Yes, only Python SDK
  2. We track events via the SDK's capture() method here
  3. We saw both: peaks on individual events and a group of 2-4 related events (e.g. originating all from one connected workflow). Solving it for a single event would already be extremely helpful. Solving for multiple ones even better :)
  4. No, it's just a single device and a pure Python process running (no UI / browser)
camerondeleone commented 1 year ago

See also: https://posthoghelp.zendesk.com/agent/tickets/6866

lsmith77 commented 1 year ago

Basically it means that right now PostHog is a financial "attack vector". We just had a bug (we don't suspect malicious intent at this point) that caused a single user to generate 1.5M events in a single day. A malicious hacker could send even more and basically force us to turn off PostHog or go broke.

pauldambra commented 1 year ago

@lsmith77 which SDK was this on for you?

(ultimately if we implement something here then it likely has to be on all SDKs but....)

pauldambra commented 1 year ago

actually for you @lsmith77 same questions as above

  1. Is this only in a single PostHog SDK?
  2. Are these custom events or autocapture?
  3. When a customer peaks, does it tend to be a single event that peaks or an increase across all/multiple events?
  4. Does this need to apply across multiple devices/windows? E.g. if this was in the browser and we held local state to count the events and sample/rate limit, then a user could log in on another session and wouldn't be rate limited. Would that work for you?
lsmith77 commented 1 year ago
  1. we are using the Posthog SDK in 3 separate places
    • user dashboard (posthog-js)
    • browser extension (posthog-js-lite)
    • word add-in (posthog-js-lite)
  2. on the dashboard we use autocapture, in the other 2 custom events
  3. in the two cases we had it was a single custom event in the browser extension but I can see it being multiple (especially in a malicious case)
  4. for the non malicious case, a single device at a time (ie. on the SDK level) would be fine, but for the malicious case this would not be sufficient

Note for 4.: we are considering building a rate limit into our browser extension / Word add-in to prevent the non-malicious cases. But having it in the official SDK (in the next days/weeks) would be very much appreciated.

lsmith77 commented 1 year ago

BTW ideally the rate limits could be configured based on some property/cohort. Like we might want different rate limits for our freemium and paid customers.

lsmith77 commented 1 year ago

Oh a nice to have would be to have a way to rate limit specific events differently.
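
To make the two wishes above concrete (different limits per cohort and per event), here is a minimal sketch of how such a configuration could be resolved. Nothing like this exists in PostHog today as far as this thread shows; the plan names, the event name, the numbers and the `resolve_limit` helper are all hypothetical.

```python
# Hypothetical daily limits; every name and number here is an assumption.
DEFAULT_LIMIT = 10_000
PLAN_LIMITS = {"freemium": 1_000, "paid": 50_000}
# Overrides for specific (plan, event) pairs take precedence over the plan limit.
EVENT_LIMITS = {("freemium", "document_checked"): 200}


def resolve_limit(plan, event):
    """Return the daily event limit for a user on `plan` sending `event`.

    Lookup order: (plan, event) override, then plan limit, then the default.
    """
    if (plan, event) in EVENT_LIMITS:
        return EVENT_LIMITS[(plan, event)]
    return PLAN_LIMITS.get(plan, DEFAULT_LIMIT)
```

A counter keyed by user (or by user and event) would then be compared against the resolved value before an event is sent.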

tholor commented 8 months ago

@pauldambra Do you have any update on this? Is this on the current roadmap? It's becoming a bigger and bigger pain for us. We just had a similar incident again this month where usage skyrocketed because of a few "abusive" open-source users. Without a solution for this on the closer horizon, I am afraid we'll have to move to another product...

mariusandra commented 8 months ago

I don't believe this is something on our roadmap. Just building team-level rate limits for billing has proven to be challenging enough.

The solution I can propose is to use a reverse proxy and implement your own rate limiting in it. It should work fine with a small Redis cache for sites with limited volume... but it's really tricky to handle reliably at the scale we're dealing with.
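
As a rough illustration of the proxy-side idea, the sketch below counts events per user in fixed time windows and rejects anything over a limit. It is not PostHog code: the `FixedWindowLimiter` class is hypothetical, and an in-memory dict stands in for the Redis cache so the example runs on its own. With Redis, the same logic would typically be an `INCR` on a per-user, per-window key plus an `EXPIRE`.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` events per user in each fixed time window."""

    def __init__(self, limit, window_seconds=86_400, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        # Stand-in for Redis. Stale windows are never evicted here;
        # a Redis key with an expiry would take care of that.
        self.counts = defaultdict(int)

    def allow(self, distinct_id):
        """Return True and count the event, or False if the user is over the limit."""
        window = int(self.clock() // self.window_seconds)
        key = (distinct_id, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

The proxy would call `allow()` with the user's ID for each incoming event and forward only the events for which it returns `True`. A fixed window is the simplest variant; it lets a user send up to twice the limit across a window boundary, which is usually acceptable for cost protection.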

Curious to hear which are the other products that implement this directly.

lsmith77 commented 8 months ago

@mariusandra but did you ever implement rate limiting inside the client?

mariusandra commented 8 months ago

We batch requests, but I don't believe we have code in any of the clients that starts to reject events after any threshold.

lsmith77 commented 8 months ago

I think client-side throttling should be easy enough to implement (it would not come with the "at scale" issues you mentioned above) and could already stop a significant number of issues.

mariusandra commented 8 months ago

Perhaps. As a standalone feature, it's definitely not more than a few dozen lines of code. However this makes quite some assumptions about what constitutes abuse, might trigger false positives for unsuspecting users causing other issues, etc. Plus as a client side solution, there will be someone who'll get past it (abuse via incognito, etc), taking us back to square one.

I'm not saying it's a bad idea... I'm saying we're happy to accept PRs 😅.

At some point it might be easier to just implement such filtering inside your own app, either client side or via a proxy.
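
For the client-side variant inside your own app, a minimal sketch could look like the following. It assumes a Python process like the one described earlier in the thread. `DailyCapGuard` and its `send` callable are hypothetical: `send` is whatever function actually forwards the event, for example a thin wrapper around the SDK's capture() call, whose exact signature is deliberately not assumed here.

```python
import datetime
from collections import Counter


class DailyCapGuard:
    """Drop events for a user once they exceed `max_per_day` within one day."""

    def __init__(self, send, max_per_day=10_000, today=datetime.date.today):
        self.send = send  # hypothetical: callable(distinct_id, event, properties)
        self.max_per_day = max_per_day
        self.today = today  # injectable so the day rollover can be tested
        self.day = today()
        self.counts = Counter()
        self.dropped = 0

    def capture(self, distinct_id, event, properties=None):
        """Forward the event and return True, or drop it and return False."""
        current = self.today()
        if current != self.day:
            # New day: reset all per-user counters.
            self.day = current
            self.counts.clear()
        if self.counts[distinct_id] >= self.max_per_day:
            self.dropped += 1
            return False
        self.counts[distinct_id] += 1
        self.send(distinct_id, event, properties or {})
        return True
```

Because the counters live in process memory, they reset on restart and do not cover a user running several processes or devices, which matches the limitation discussed above: this catches runaway loops and bugs, not determined abuse.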