Aiven-Open / karapace

Karapace - Your Apache Kafka® essentials in one tool
https://karapace.io
Apache License 2.0

schema-reader: Shutdown service if corrupt entries in `_schemas` topic #936

Closed nosahama closed 2 months ago

nosahama commented 2 months ago

About this change - What it does

These breaking changes are guarded by environment variables.

Users can toggle them as needed; the default values are shown below:

KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE: false
KARAPACE_KAFKA_RETRIABLE_ERRORS_SILENCED: true
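
As a rough illustration of how two boolean flags with these defaults could be read from the environment, here is a minimal sketch. The helper names `env_flag` and `load_flags` and the set of accepted truthy strings are assumptions made for this example; they are not taken from Karapace's actual configuration code.

```python
import os


def env_flag(name, default, environ=None):
    """Parse a boolean environment variable, falling back to a default.

    Hypothetical helper for illustration; the accepted truthy strings
    are an assumption, not Karapace's parsing rules.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_flags(environ=None):
    """Return both flags, using the defaults listed in this PR."""
    return {
        "strict_mode": env_flag(
            "KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE", False, environ
        ),
        "retriable_errors_silenced": env_flag(
            "KARAPACE_KAFKA_RETRIABLE_ERRORS_SILENCED", True, environ
        ),
    }
```

Passing a plain dict as `environ` makes the sketch easy to exercise without touching the real process environment.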

The logic below applies only when KARAPACE_KAFKA_SCHEMA_READER_STRICT_MODE is set to true. Everything else remains the same, so no test changes were needed.

Previously, when we encountered errors within the `_schemas` topic, we would skip the problematic schema and continue loading messages. This is not ideal, as it might leave the application with corrupt schema data, and the side effects could be grave.

Now we log the errors and stop the service, allowing a graceful shutdown. We will follow up on this work by adding metrics for such cases.

This will also stop the service after a backup-v1 restore if the backup log file contains any corrupt schemas.
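
The difference between the previous skip-and-continue behaviour and the strict mode described above can be sketched roughly as follows. This is an illustration only: `load_entries`, `CorruptSchemaEntry`, and the use of JSON parsing as the corruption check are hypothetical stand-ins, not the actual logic in `schema_reader.py`.

```python
import json
import logging

LOG = logging.getLogger("schema_reader_sketch")


class CorruptSchemaEntry(Exception):
    """Hypothetical error signalling an unreadable entry in the topic."""


def load_entries(raw_messages, strict_mode=False):
    """Parse messages; skip bad ones, or raise on the first when strict."""
    loaded = []
    for offset, raw in enumerate(raw_messages):
        try:
            loaded.append(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            if strict_mode:
                LOG.error("Corrupt entry at offset %d, stopping", offset)
                raise CorruptSchemaEntry(
                    f"corrupt entry at offset {offset}"
                ) from exc
            # Non-strict mode: keep loading and drop the bad entry.
            LOG.warning("Skipping corrupt entry at offset %d", offset)
    return loaded
```

In a sketch like this, the caller would catch `CorruptSchemaEntry` at the top level and trigger the service's normal shutdown path rather than exiting abruptly.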

We can see below that the shutdown is graceful:

[Image: Screenshot 2024-08-21 at 15 15 24]

Adding `restart: always` to Docker Compose shows the behaviour: the service never proceeds past that stage. Given the restart behaviour for systemd, I think we might need to rely on alerts and then intervene; otherwise the service would be left in a crash loop. We need to verify whether there are SLOs or metrics set up somewhere to track at least service uptime, which will be affected by this.

github-actions[bot] commented 2 months ago

Coverage report

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
  karapace
  config.py
  errors.py
  in_memory_database.py
  karapace_all.py
  messaging.py
  rapu.py
  schema_reader.py (lines missing: 193, 203-207, 370, 519-521, 535-537, 551-556, 566-567, 603)
  schema_registry.py
  schema_registry_apis.py
  statsd.py
  utils.py
  karapace/compatibility
  __init__.py
Project Total  

This report was generated by python-coverage-comment-action