getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Migrate self-hosted kafka clusters to KRaft #2501

Open hubertdeng123 opened 10 months ago

hubertdeng123 commented 10 months ago

This removes the need for ZooKeeper in self-hosted. We attempted this previously, but it resulted in data loss in Kafka, so we are holding off until a later date when it is safe to perform.

Relevant PRs: https://github.com/getsentry/self-hosted/pull/2445 https://github.com/getsentry/self-hosted/pull/2500

aldy505 commented 10 months ago

I'd vote for Redpanda instead. It should be compatible enough with the Kafka API, unless Sentry is using unusual features that only exist in versions after Kafka v3.1, since Redpanda's compatibility covers v0.11.0 to v3.1 (see docs). However, we'd need to create a migration from the existing Kafka to Redpanda. The steps I can think of are:

  1. Create a new volume for sentry-redpanda.
  2. Consume every message on Kafka and re-publish it on Redpanda.
  3. When finished, don't delete the sentry-kafka volume. Leave it as is until it's clear there are no further issues.
  4. Stop the Kafka container and replace the Kafka Docker image with Redpanda (so the hostname is still "kafka").
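
Step 2 could look roughly like the sketch below. It is only an illustration of the consume-and-republish loop: `Record`, `Source`, `Sink`, `InMemoryLog`, and `mirror_topics` are hypothetical names, not part of Sentry, Kafka, or Redpanda. In a real migration the two interfaces would wrap an actual Kafka client pointed at the old and new brokers, and consumer group offsets would need separate handling, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class Record:
    """One message, reduced to the fields that matter for a copy."""
    topic: str
    partition: int
    key: Optional[bytes]
    value: bytes


class Source(Protocol):
    """Hypothetical read side (would wrap a consumer on the old Kafka)."""
    def read_all(self, topic: str) -> Iterable[Record]: ...


class Sink(Protocol):
    """Hypothetical write side (would wrap a producer on Redpanda)."""
    def publish(self, record: Record) -> None: ...
    def flush(self) -> None: ...


def mirror_topics(source: Source, sink: Sink, topics: Iterable[str]) -> Dict[str, int]:
    """Copy every record of each topic from source to sink.

    Returns the number of records copied per topic, so the result can be
    compared against the source before the old volume is ever removed.
    """
    copied: Dict[str, int] = {}
    for topic in topics:
        count = 0
        for record in source.read_all(topic):
            sink.publish(record)
            count += 1
        sink.flush()  # make sure the topic is fully written before moving on
        copied[topic] = count
    return copied


class InMemoryLog:
    """Stand-in broker used only to exercise the loop without a cluster."""
    def __init__(self) -> None:
        self.records: List[Record] = []

    def read_all(self, topic: str) -> List[Record]:
        return [r for r in self.records if r.topic == topic]

    def publish(self, record: Record) -> None:
        self.records.append(record)

    def flush(self) -> None:
        pass  # nothing is buffered in memory


if __name__ == "__main__":
    old_kafka = InMemoryLog()
    old_kafka.publish(Record("events", 0, b"k1", b"v1"))
    old_kafka.publish(Record("events", 1, None, b"v2"))
    new_redpanda = InMemoryLog()
    print(mirror_topics(old_kafka, new_redpanda, ["events"]))
```

Returning per-topic counts is a deliberate choice here: it gives a cheap sanity check that fits step 3, where the old volume is kept until the copy is known to be good.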

Or, as an alternative, they have this one: https://docs.redpanda.com/current/upgrade/migrate/data-migration/

The reason for using Redpanda is to reduce the heavy resource consumption of the JVM. Redpanda is far more lightweight than Kafka; I've been using it in production (3-node cluster) for around 18 months.

hubertdeng123 commented 10 months ago

We'd like to be as similar to SaaS as we can be. Right now, Clickhouse versions are way behind and are introducing issues in self-hosted that are not seen in SaaS (one here!). I fear that with introducing Redpanda, there will be an additional burden of maintenance placed on us since other Sentry developers will be on a different platform.

aldy505 commented 8 months ago

@hubertdeng123 So, is this still on the timeline? And is there any way the community can know which versions of ClickHouse / Postgres / Kafka the SaaS instance is running, in order to keep self-hosted more or less the same?

hd-deman commented 7 months ago

I can confirm that Sentry can work with Redpanda. Connected without any issues; everything is working.

Codel1417 commented 3 months ago

> I fear that with introducing Redpanda, there will be an additional burden of maintenance placed on us since other Sentry developers will be on a different platform.

While I am unaware of the technical requirements, what about migrating both SaaS and self-hosted to Redpanda? Wouldn't there be cost savings in SaaS from reducing system resources while increasing throughput?

williamdes commented 4 weeks ago

Since https://github.com/getsentry/self-hosted/pull/3263 got merged, there is no more need for ZooKeeper. @aldy505 can you open a PR with your existing Redpanda work, please? It has been working for me for months.

aldy505 commented 4 weeks ago

> @aldy505 can you open a PR with your existing Redpanda work, please? It has been working for me for months.

I'm gonna get back to you later. I'm planning to do some kind of A/B testing of using Redpanda vs Kafka KRaft.

I'm on 24.8.0 with Redpanda and it still works fine.