PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Minimize ingestion delay impact on onboarding and experimenting #21024

Open tiina303 opened 5 months ago

tiina303 commented 5 months ago

The Growth team would love for us to give users a <30 sec feedback loop on ingestion during onboarding.

During onboarding, ingestion delays can really hurt the new user experience and be super frustrating. Options to consider:

  1. show information about the lag
    • a problem here is that even when the lag is low, it depends on which partition a message ends up on, so it varies between partitions, and showing the max across partitions might make us look worse.
  2. have a priority queue for new orgs
    • it seems quite complex to identify who belongs there and when to move orgs in and out, on top of the complexity of running a separate queue
  3. build live tailing (Xavier has mentioned this idea in the past)
    • potentially spin up an ad-hoc Kafka consumer that streams messages to us as they come in from capture
  4. separate basic events and person processing, which would allow basic events to be processed much faster
    • we could have a basic events table in CH with a short TTL
    • the pipeline would write to that table, and person processing would happen separately, taking events from there and adding them to the real events table with person_id and person_properties
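
To make option 4 a bit more concrete, here is a rough in-memory sketch of the two-stage idea. Everything in it is hypothetical: `BasicEventsBuffer` stands in for the short-TTL table in CH, `process_persons` stands in for the separate person-processing step, and the `persons` lookup (with its anonymous fallback) is made up for illustration. It is not how the ingestion pipeline is written today.

```python
from dataclasses import dataclass


@dataclass
class BasicEvent:
    uuid: str
    distinct_id: str
    event: str
    timestamp: float  # seconds since epoch


@dataclass
class EnrichedEvent:
    uuid: str
    distinct_id: str
    event: str
    timestamp: float
    person_id: str
    person_properties: dict


class BasicEventsBuffer:
    """Hypothetical stand-in for a short-TTL basic events table."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._rows = {}  # uuid -> BasicEvent

    def write(self, ev):
        # Fast path: no person lookup, just store the raw event.
        self._rows[ev.uuid] = ev

    def recent(self, now):
        # Rows older than the TTL are treated as already expired.
        return [e for e in self._rows.values() if now - e.timestamp <= self.ttl_seconds]


def process_persons(buffer, persons, now):
    """Slow path: read basic events and attach person data.

    `persons` maps distinct_id -> (person_id, person_properties).
    The "anon-" fallback is an assumption made for this sketch only.
    """
    enriched = []
    for ev in buffer.recent(now):
        person_id, props = persons.get(ev.distinct_id, ("anon-" + ev.distinct_id, {}))
        enriched.append(
            EnrichedEvent(ev.uuid, ev.distinct_id, ev.event, ev.timestamp, person_id, props)
        )
    return enriched
```

The point of the split is that onboarding could read from the buffer right after `write`, without waiting for `process_persons` to run.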

raquelmsmith commented 5 months ago

For the lag variation, we could use an average of the different partitions? Or a median?
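
A quick sketch of what that comparison could look like, assuming we already have a per-partition lag in seconds from somewhere. The `summarize_lag` helper and the sample numbers are made up for illustration.

```python
from statistics import mean, median


def summarize_lag(lag_by_partition):
    """Summarize per-partition lag (seconds) as mean, median and max."""
    if not lag_by_partition:
        raise ValueError("need at least one partition")
    lags = list(lag_by_partition.values())
    return {"mean": mean(lags), "median": median(lags), "max": max(lags)}


# Hypothetical sample: three healthy partitions and one that is far behind.
sample = {0: 2.0, 1: 3.0, 2: 4.0, 3: 55.0}
summary = summarize_lag(sample)
print(summary)  # {'mean': 16.0, 'median': 3.5, 'max': 55.0}
```

With one slow partition, the mean still gets pulled up a lot while the median stays close to what most users would see, so the median may be the more representative number to show.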

xrdt commented 5 months ago

3 would be particularly neat. We had spoken a number of times about exposing a live tail in onboarding.
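
As a rough sketch of the filtering side of a live tail: the helper below takes any iterable of raw message bytes (in practice that could be fed by an ad-hoc Kafka consumer, which is left out here) and yields only the events for one project. The `live_tail` name and the assumption that each message is a JSON object with a `token` field are hypothetical, not taken from the capture format.

```python
import json


def live_tail(raw_messages, token, limit=10):
    """Yield up to `limit` decoded events whose `token` matches.

    `raw_messages` is any iterable of bytes/str payloads. Messages that
    are not valid JSON objects are skipped rather than raising.
    """
    if limit <= 0:
        return
    seen = 0
    for raw in raw_messages:
        try:
            payload = json.loads(raw)
        except ValueError:  # covers JSONDecodeError and bad encodings
            continue
        if not isinstance(payload, dict) or payload.get("token") != token:
            continue
        yield payload
        seen += 1
        if seen >= limit:
            return


# Hypothetical messages, standing in for what a consumer would hand us.
messages = [
    b'{"token": "abc", "event": "$pageview"}',
    b'not json',
    b'{"token": "xyz", "event": "$pageview"}',
    b'{"token": "abc", "event": "signup"}',
]
tailed = [m["event"] for m in live_tail(messages, token="abc")]
print(tailed)  # ['$pageview', 'signup']
```

Because it is a generator with a `limit`, the tail stops consuming as soon as onboarding has seen enough events to confirm ingestion works.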

MarconLP commented 4 months ago

related: https://github.com/PostHog/posthog/issues/17273