allegro / hermes

Fast and reliable message broker built on top of Kafka.
http://hermes.allegro.tech

Safe recreation of topics #1739

Open szczygiel-m opened 11 months ago

szczygiel-m commented 11 months ago

Currently hermes-management has a bug that is triggered by deleting a topic and then quickly recreating it.

The story is similar every time:

  1. someone deletes a topic
  2. the topic is recreated under the same name
  3. the Kafka producer still holds stale metadata for that topic
  4. the Kafka producer fails to send messages to the brokers
  5. messages are buffered in the hermes frontend instances
  6. we need to restart the frontend instances for the messages to be retransmitted

The issue was thought to be solved by the upgrade to Kafka client 2.8.2, but it appeared again recently. We would like to have a workaround for this.

One of the proposed solutions is to introduce a "grace period" for deleted topics: if someone deletes a topic, we should block the creation of a topic with the same name for long enough that the cluster and the Kafka producers reach a consistent state. Anything longer than 5 minutes is probably enough, because producer metadata is refreshed every 5 minutes.
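
To make the proposal more concrete, here is a minimal sketch of how such a grace period check could look. Everything in it is an assumption for illustration: `TopicRecreationGuard`, `recordDeletion` and `canCreate` are hypothetical names and not existing hermes-management code, and the 6 minute value in the usage example is just an arbitrary choice above the 5 minute metadata refresh interval mentioned above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of hermes-management: remembers when a topic
// was deleted and rejects recreation under the same name until the grace
// period has elapsed.
final class TopicRecreationGuard {

    private final Duration gracePeriod;
    // Topic name -> time of the most recent deletion.
    private final Map<String, Instant> deletedAt = new ConcurrentHashMap<>();

    TopicRecreationGuard(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    // Call when a topic is removed.
    void recordDeletion(String topicName, Instant now) {
        deletedAt.put(topicName, now);
    }

    // Returns true if a topic with this name may be created at time 'now'.
    boolean canCreate(String topicName, Instant now) {
        Instant deleted = deletedAt.get(topicName);
        if (deleted == null) {
            return true; // never deleted, or the grace period already expired
        }
        if (now.isBefore(deleted.plus(gracePeriod))) {
            return false; // still inside the grace period
        }
        // Grace period is over; forget the entry unless it was replaced meanwhile.
        deletedAt.remove(topicName, deleted);
        return true;
    }
}
```

Usage could look like `new TopicRecreationGuard(Duration.ofMinutes(6))`, with `recordDeletion` called on topic removal and `canCreate` checked before topic creation. Note that this sketch keeps state in memory only, so with more than one management instance the deletion timestamps would have to live in shared storage instead.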

debanjanc01 commented 11 months ago

Hey @szczygiel-m, is this up for grabs?

szczygiel-m commented 10 months ago

Hi, sure 😄 Assigned