PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com

Sprint 1.30.0 1/2 - Oct 18 to Oct 29 #6360

Closed marcushyett-ph closed 2 years ago

marcushyett-ph commented 3 years ago

Global Sprint Planning

Retro: Status of Outcomes from Previous Sprint

  1. Quantitative Analysis (Owner: @neilkakkar) - Basic MVP behind a feature flag by the end of sprint - in front of some specific users (users from paths engagement) - design works in parallel
    • 🎯Goal: _Be able to understand the top 3 properties or events that signal failure in the activation flow of PostHog through a production-ready MVP._
    • Status: It doesn't look great, but this capability now exists. We'll be talking to users & gathering feedback before we iterate further on it.
  2. Paths User Experience (Owner: @EDsCODE ) - Get people to use it and more feedback and iterate on it
    • 🎯Goal: Get 5 users to use new features for more than 1 day.
    • Status: Released. Sitting tight waiting for users to flow through. We released to 50% of all users and got a few dozen visits the first day. We'll continue to monitor follow-up users. Goal is 5. What's paid and what's not paid?
  3. Ingestion Data Integrity (Owner: @yakkomajuri) - Ensuring we can easily track and recover dropped events
    • Status: We're on track to meet the goal by end of week
  4. 🔁 Groups - Interview customers, verifying our assumptions/showing proof of value + understand feature flag + groups story better. Group Analytics Questions #6189
    • Start implementing, build an MVP with 1-2 client libraries accessible for our team behind a feature flag during the sprint
    • Status: Interviews done, MVP implementation started - too much for this sprint to finish goal
  5. 🔁 Session Recording? (Owner: @rcmarron ) - Ship a complete "session recordings" page (and make it easy to find sessions) and an updated playback experience (close open issues - solve the epic)
    • 🎯Goal: Two main ingestion errors completely gone (1 & 2). Note: Consider sentry2 too.
    • 🔁 🎯Goal: 80% of the time I should be able to find the part of the session I care about within 5 seconds. Measured via internal tests.
    • Status: Ingestion bug 1 is fixed + ingestion bug 2 is on track to be fixed. We made progress toward the 5-second goal (buffering of recordings is on track), but have more work to do on the UX for ‘finding the right moment’.
  6. Refactor and test insights logic? (Owner: @mariusandra) - Our additive and testless development style has resulted in various insights and dashboards behaving erratically (borked reloads, white screens, etc). It's time to clean house.
    • Status: Did two huge refactors, which cleaned house a lot and also fixed a few bugs. Still the "url too long" issue to fix Thursday, with related refactors. Follow-up work can be rolled into other tasks, so on track for completion this sprint.

Retro: What can we do better next sprint?

Focus on what went well and can improve for Quantitative Analysis? @neilkakkar @hazzadous @clarkus @EDsCODE @paolodamico feel free to update this comment in advance with your thoughts.

  1. Doing design & MVP in parallel: We haven't yet figured out the final design, but we still have something up on Production to play around with. This was great - it allowed me to gather feedback quickly and fix data issues while the design is coming along. General learning: When you need production data to test how a feature is doing, bias towards getting a crap version ready ASAP.
  2. Using the product more aggressively - everyone should do that
  3. We left the usability tests towards the end - next time: schedule usability tests early on - no reason to wait

Sprint priorities

  1. We did not meet all the goals we set out to do:
    • Groups: Intention was to have a moonshot - as opposed to a rooftop
    • Recordings: Bit off more than we could chew - we did not account for Alex being support hero - Michal was out - we weren't explicit enough within the team when we set them (more explicit about resources for the team) - We should assume ZERO product work for support hero @liyiy @timgl
    • The owner should ask for more resources or share we're not on track - we should be more explicit about that

Plan: Proposed Goals for Next Sprint

Each goal should have a single owner. Owner can only be an engineer.

  1. Correlation Analysis (Owner: @neilkakkar) - Ship the best Quant analysis tool in the industry (50:50) (@paolodamico @neilkakkar)
    • Why? We've already made significant progress here and are close to what our competitors are offering - we should identify the gaps in our competitors' offerings and ship a tool without those gaps, plus something no-one else has
  2. Data Integrity Querying (Owner: @EDsCODE) - Resolve and build robust tests for all known query inconsistency issues affecting persons modal (@marcushyett-ph @EDsCODE) (+ cohorts are accurate)
    • Why? Identified in the data integrity strategy as critical and from recent customer feedback causing users to lose trust
  3. Data Integrity Ingestion (Owner: @yakkomajuri) - Nail ingestion data integrity (50:50) (@marcushyett-ph @yakkomajuri)
    • (Owner: @yakkomajuri ): "Ensuring Postgres and CH persons are fully in sync"
    • (Owner: @tiina303 ): "Allowing events to be ingested in any order"
    • Why? Following incidents with data ingestion, ingestion reliability remains a top priority for our customers
  4. Recordings (Owner: @rcmarron)
    • Why? The session recording experience is unreliable and underused - and key to diagnosing causes.
    • 🎯 Goal: 99% of session recordings have a full snapshot (and are therefore not missing)
    • 🎯 Goal: 80% of the time I should be able to find the part of the session I care about within 5 seconds. Measured via internal tests. (continued)
  5. Groups (Owner: @macobo) - Full group analytics support enabled for 5 alpha customers (+ release plan) (@marcushyett-ph @macobo)
    • Why? We have validated this with 5+ of our customers and there's a definite need for the capability to analyze based on groups

Team sprint planning

For your team sprint planning copy this template into a comment below for each team.

Team ___

## Retro

<!-- Talk about what went well, what didn't go well and any actions to improve next time -->

- 

## Hang over items from previous sprint

<!-- For each item, decide to re-prioritise (and add below) or deprioritise -->

- Item 1. prioritise/deprioritise

## Planning

<!-- Each item should have a single owner. Owner can only be an engineer. -->

### High priority

-

### Low priority / side quests

-

paolodamico commented 3 years ago

In anticipation of the planning session tomorrow, I prepared this doc for Correlation Analysis on industry benchmarking. @neilkakkar & @marcushyett-ph in particular I think it's worth you checking it out.

mariusandra commented 3 years ago

I'm off today and might not make it to the sprint planning, so sharing thoughts in advance.

I basically agree with the proposed goals. However, as they stand above, in team Core Experience, Rick and Alex will most likely continue work on session recordings, leaving Paul, Michael and myself free to take up other issues.

For what these other issues should be, I have three themes that I'd like to divide between Paul, Michael and myself:

"Nail Fix Work In Progress Flags, Feature"

We have a few flags in active development:

And a few that have been pending for weeks:

A lot of these fall under core experience. I'd like to bring all of them over the line. This would make for a nice new release!

"Nail Trends"

"Nail API"

Just nail it already.

Twixes commented 3 years ago

What does "Nail API" entail? @mariusandra

mariusandra commented 3 years ago

Feel free to edit if you have examples of API tasks that should get done.

I basically meant whatever still needs doing so that we can open multiple projects in different tabs, if anything. Then there's a thing with toolbar API access that's flaky (separate temporary_token system, probably should replace with expiring personal API keys), we should do personal API key encryption, I'd be interested in stronger typings for API responses (mostly for insight queries I guess), and so on.

paolodamico commented 3 years ago

Quick retro from my perspective:

EDsCODE commented 3 years ago

Retro:

Planning

Data Integrity Querying - Eric Goal:?

Paths - Eric Goal: Get X more recurring users

Quantitative Analysis - Neil (Harry, Li) Goal: Get 3 users to LOVE correlation analysis

Group Analytics - Karl Goal: Group analytics MVP enabled for 5 alpha customers

yakkomajuri commented 3 years ago

Platform Team Retro

Progress on Goals

Data integrity

TL;DR

Very good progress. Everything is in place for achieving: "Ensuring we can easily track and recover dropped events" and we also gained some good ground on other issues.

Highlights

We merged https://github.com/PostHog/posthog/pull/6193, https://github.com/PostHog/posthog/pull/6230, and https://github.com/PostHog/plugin-server/pull/596 is ready to deploy.

The final step here is adding alerting for the dead letter queue size (the statsd metric already exists).
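
As a rough illustration of that alerting step, the check could look like the sketch below. It assumes the dead letter queue size is already being sampled periodically from the existing statsd metric; the class name, threshold and sample count are hypothetical and not taken from the PostHog codebase.

```python
from collections import deque


class DeadLetterQueueAlert:
    """Hypothetical helper: alert when the DLQ size stays above a threshold.

    Requiring several consecutive samples above the threshold avoids
    alerting on a single short-lived spike.
    """

    def __init__(self, threshold: int, consecutive_samples: int = 3) -> None:
        if threshold < 0 or consecutive_samples < 1:
            raise ValueError("threshold must be >= 0 and consecutive_samples >= 1")
        self.threshold = threshold
        # Keep only the most recent `consecutive_samples` readings.
        self.recent = deque(maxlen=consecutive_samples)

    def observe(self, queue_size: int) -> bool:
        """Record one reading of the queue size; return True if the alert should fire."""
        self.recent.append(queue_size)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(size > self.threshold for size in self.recent)
```

Each call to `observe` takes one reading of the queue size, so a scheduled job could feed it the latest value of the metric and notify whoever is on call when it returns `True`.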

We also made progress on other fronts regarding data integrity. Most notably, we identified 2 more significant issues. One was inconsistent processing of event batches (fixed in #6230) and the other one is person rows being collapsed incorrectly in ClickHouse, leading to wrong cohorts (work on this has started, see https://github.com/PostHog/plugin-server/pull/597).

Finally, we also made progress on https://github.com/PostHog/plugin-server/issues/371. We merged https://github.com/PostHog/posthog/pull/6259 and the most important PR for this is coming tomorrow. https://github.com/PostHog/product-internal/pull/191 was also created.

What could have gone better

I personally ended up being pulled in a lot of different directions (demos, support, etc.). A lot of this was circumstantial, so not necessarily anything actionable here.

Other Priorities

We also:

yakkomajuri commented 3 years ago

For the next sprint, there are 2 data integrity goals to consider:

mariusandra commented 3 years ago

In addition to the recordings goals above, team Core Experience has two more goals for this sprint:


Footnotes:

Team Retro & raw notes

## Retro

Rick:
- good to work on related yet separate things with Alex, didn’t conflict
- speeding up session recordings: should have had a high-level discussion and discussed priorities more before starting
- improvement: breaking work down before doing it
- improvement: should have asked for feedback from core analytics (e.g. Karl with all the experience in building recordings) about how to do filters/queries/etc. sooner in the process

Alex:
- planned too much work for the first week
- a lot of things to do in the playback experience, many edge cases (e.g. adding a caching layer), still things to fix
- improvement: higher-level discussion on the strategy would help

Michael:
- off 1 week, did a lot of refactoring and bugfixes
- wanted to get to the end of the projects api/frontend task
- was good to talk through approaches today and figure out

Chris:
- still navigating being allocated between two teams

Marius:
- glad the team is functioning well and everyone was delivering

## Next sprint goals and priorities

Alex, Rick:
- Session recordings
- Why are recordings missing? @rcmarron has found interesting things... 10k/day.
- distinct_id issue (merging recordings by person_id)
- Playback experience 5 second goal
- Custom seekbar with highlights, events, thumbnail previews?, etc.
- Session recording list filter by cohorts and a few other things to get parity with the old solution

Paul:
- Breakdown by multiple properties

Michael, Marius:
- “Make it so that when you go from a dashboard to an insight it loads in 0.1 seconds... and back the same way with also the same impressive speed”
- Saved insights
- Dashboards
- IA beginnings
- Settings page

Chris:
- Do things

yakkomajuri commented 3 years ago

Team Platform

Retro
- Yakko - very good sprint. Pushed a lot of data integrity related stuff. Found even more issues which is ✅. Had some unexpected thrash because of external factors.
- Guido - L3 Autonomous! Caught up with other people on the team. Continued quest to document and clean up infrastructure on AWS. Dove into the deep end of Helm charts. Identified a lot of things to fix in the helm chart. Need to think about how we do PR reviews across time zones. The feedback loop can be up to 3 days for something that should be done in a day.
- Tiina - Pretty good. Shipped Java library. Excited to get out of YAML land and ship more code. Code is way nicer to ship and way less hitting your head against the wall. 'It's great'.
- James - A lot of async work around events schema migration. I actually enjoyed picking up the monorepo task.

## Hang over items from previous sprint

- @fuziontech: Events table migration to unblock upgrades **prioritize**
- ~~@yakkomajuri~~ @tiina303: properties_last_updated_at (carried over) (originally owned by @yakkomajuri, @tiina303 taking over the last mile) **prioritize**
- @fuziontech: Monorepo with plugin-server #6095 **prioritize**

Planning

Goals for next sprint

High priority

  1. Data Integrity Ingestion (Owner: @yakkomajuri) - Nail ingestion data integrity (50:50) (@marcushyett-ph @yakkomajuri)
    • (Owner: @yakkomajuri ): "Ensuring Postgres and CH persons are fully in sync"
    • (Owner: @tiina303 ): "Allowing events to be ingested in any order"
    • Why? Following incidents with data ingestion, ingestion reliability remains a top priority for our customers
  2. Infra work @guidoiaquinti
    • Kafka resizable PVC in chart
    • Helm chart - upgrade dependent charts to stable versions
      • cert-manager
      • kafka
    • AWS - cost and usage auditing - reserved instances
    • Container security
    • Ansible the configs for ClickHouse @fuziontech
  3. Migration script/docs for CH -> CH setups @yakkomajuri
    • This may get a bump in priority because of a customer wanting to migrate from GCP to AWS

Low priority / side quests

https://github.com/orgs/PostHog/projects/10