Aiven-Open / karapace

Karapace - Your Apache Kafka® essentials in one tool
https://karapace.io
Apache License 2.0
445 stars 68 forks source link

Schema reader crashes when replaying protobuf #913

Open davegreen opened 1 month ago

davegreen commented 1 month ago

What happened?

Adding a protobuf schema causes an exception and schema registry becomes unresponsive.

What did you expect to happen?

Schema to be added and replay to continue. API to continue to be available.

What else do we need to know?

Tried with 3.14.0 and 3.15.0

Schema appears to be added OK, but replay fails in the same way, restarting the container results in the below error:

Starting Karapace Schema Registry
Cannot enable Sentry.io sending: importing 'sentry_sdk' failed
karapace                MainThread  INFO        Karapace initialized
root                    MainThread  INFO        

Starting karapace schema registry

karapace.instrumentation.prometheus MainThread  INFO        Setting up prometheus metrics
karapace.coordinator.schema_coordinator MainThread  INFO        Metadata for topic has changed from [] to [Assignment(member_id='v0', metadata=b'{"version":2,"karapace_version":"3.15.0","host":"karapace-schema-registry","port":8081,"scheme":"http","master_eligibility":true}')]. 
karapace.coordinator.schema_coordinator MainThread  INFO        Discovered coordinator 0 for group karapace-schema-registry
karapace.coordinator.schema_coordinator MainThread  INFO        (Re-)joining group karapace-schema-registry
karapace.coordinator.schema_coordinator MainThread  INFO        Joined group 'karapace-schema-registry' (generation 16) with member_id karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889
karapace.coordinator.schema_coordinator MainThread  INFO        Joined as follower.
karapace.coordinator.schema_coordinator MainThread  INFO        Successfully synced group karapace-schema-registry with generation 16
karapace.coordinator.schema_coordinator MainThread  INFO        Join complete, generation 16, member_id: 'karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889', protocol: 'v0', member_assignment_bytes: b'{"master":"karapace-schema-registry-ad1354cc-1653-4808-8b57-6aeb6b1ec925","master_identity":{"version":2,"karapace_version":"3.14.0","host":"karapace-schema-registry","port":8081,"scheme":"http","master_eligibility":true},"error":0}'
karapace.kafka.consumer.KafkaConsumer   schema-reader   INFO        Subscribing to topics ['_schemas'] and patterns []
karapace.schema_reader  schema-reader   INFO        [Schema Topic] Creating '_schemas'
karapace.kafka.admin.KafkaAdminClient   schema-reader   INFO        Creating new topic NewTopic(topic=_schemas,num_partitions=1) with replication factor 1
karapace.schema_reader  schema-reader   WARNING     [Schema Topic] Already exists '_schemas'
aiohttp.access          MainThread  INFO        0.000669s - "GET /_health HTTP/1.1" 200 "kube-probe/1.26" response=570b request_body=-b
aiohttp.access          MainThread  INFO        0.001064s - "GET /_health HTTP/1.1" 200 "kube-probe/1.26" response=570b request_body=-b
karapace.schema_reader  schema-reader   INFO        Replay progress (3.27): -1/0 (0 %) (recs/s -0.30561052807405376)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.coordinator.schema_coordinator MainThread  WARNING     Heartbeat failed for group karapace-schema-registry because it is rebalancing
karapace.coordinator.schema_coordinator MainThread  INFO        (Re-)joining group karapace-schema-registry
karapace.coordinator.schema_coordinator MainThread  INFO        Joined group 'karapace-schema-registry' (generation 17) with member_id karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889
karapace.coordinator.schema_coordinator MainThread  INFO        Elected group leader -- performing partition assignments using v0
karapace.coordinator.schema_coordinator MainThread  INFO        Creating assignment: 'karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889', protocol: 'v0', members: [JoinGroupMemberData(member_id='karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889', member_data=b'{"version":2,"karapace_version":"3.15.0","host":"karapace-schema-registry","port":8081,"scheme":"http","master_eligibility":true}')]
karapace.coordinator.schema_coordinator MainThread  INFO        Chose: 'karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889' with url: 'http://karapace-schema-registry:8081' as the master
karapace.coordinator.schema_coordinator MainThread  INFO        Successfully synced group karapace-schema-registry with generation 17
karapace.coordinator.schema_coordinator MainThread  INFO        Join complete, generation 17, member_id: 'karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889', protocol: 'v0', member_assignment_bytes: b'{"master":"karapace-schema-registry-73c87c16-78ce-4997-b000-456cd1111889","master_identity":{"version":2,"karapace_version":"3.15.0","host":"karapace-schema-registry","port":8081,"scheme":"http","master_eligibility":true},"error":0}'
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.23): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   ERROR       Unexpected exception in schema reader loop
Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/schema.py", line 267, in __init__
    self.proto_file_element = deserialize(schema)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/serialization.py", line 157, in deserialize
    serialized_pb = base64.b64decode(schema_b64, validate=True)
  File "/usr/local/lib/python3.10/base64.py", line 86, in b64decode
    raise binascii.Error('Non-base64 digit found')
binascii.Error: Non-base64 digit found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 244, in run
    self.handle_messages()
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 390, in handle_messages
    self.handle_msg(key, value)
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 586, in handle_msg
    self._handle_msg_schema(key, value)
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 539, in _handle_msg_schema
    parsed_schema = parse_protobuf_schema_definition(
  File "/venv/lib/python3.10/site-packages/karapace/schema_models.py", line 75, in parse_protobuf_schema_definition
    ProtobufSchema(schema_definition, references, dependencies)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/schema.py", line 269, in __init__
    self.proto_file_element = ProtoParser.parse(DEFAULT_LOCATION, schema)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 629, in parse
    return proto_parser.read_proto_file()
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 104, in read_proto_file
    declaration = self.read_declaration(documentation, Context.FILE)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 205, in read_declaration
    result = self.read_message(location, documentation)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 245, in read_message
    declared = self.read_declaration(nested_documentation, Context.MESSAGE)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 215, in read_declaration
    result = self.read_one_of(documentation)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 453, in read_one_of
    fields.append(self.read_field_with_label(location, nested_documentation, None, atype))
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 389, in read_field_with_label
    self.reader.require("=")
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 35, in require
    self.expect(self.read_char() == c, f"expected '{c}'")
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 361, in expect
    self.unexpected(message, location)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 370, in unexpected
    raise IllegalStateException(f"Syntax error in {str(location)}: {message}")
karapace.protobuf.exception.IllegalStateException: Syntax error in :14:54: expected '='
karapace.schema_reader  schema-reader   INFO        Replay progress (0.25): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.22): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.22): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.23): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.22): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.25): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.2): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %) (recs/s 0.0)

This log seems to continue repeating forever, with various similar values in replay progress.

eliax1996 commented 1 month ago

Its a regression or something that was broken even before? Have you tried with for e.g. the version 3.10.0?

davegreen commented 1 month ago

It looks like it's been broken before (unless it's something we are doing).

ghcr.io/aiven-open/karapace:3.10.0 looks the same to me:

karapace.schema_reader  schema-reader   INFO        Replay progress (0.28): -1/0 (0 %)
karapace.schema_reader  schema-reader   ERROR       Unexpected exception in schema reader loop
Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/schema.py", line 267, in __init__
    self.proto_file_element = deserialize(schema)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/serialization.py", line 156, in deserialize
    serialized_pb = base64.b64decode(schema_b64, validate=True)
  File "/usr/local/lib/python3.10/base64.py", line 86, in b64decode
    raise binascii.Error('Non-base64 digit found')
binascii.Error: Non-base64 digit found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 228, in run
    self.handle_messages()
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 337, in handle_messages
    self.handle_msg(key, value)
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 532, in handle_msg
    self._handle_msg_schema(key, value)
  File "/venv/lib/python3.10/site-packages/karapace/schema_reader.py", line 486, in _handle_msg_schema
    parsed_schema = parse_protobuf_schema_definition(
  File "/venv/lib/python3.10/site-packages/karapace/schema_models.py", line 72, in parse_protobuf_schema_definition
    protobuf_schema = ProtobufSchema(schema_definition, references, dependencies)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/schema.py", line 269, in __init__
    self.proto_file_element = ProtoParser.parse(DEFAULT_LOCATION, schema)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 629, in parse
    return proto_parser.read_proto_file()
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 104, in read_proto_file
    declaration = self.read_declaration(documentation, Context.FILE)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 205, in read_declaration
    result = self.read_message(location, documentation)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 245, in read_message
    declared = self.read_declaration(nested_documentation, Context.MESSAGE)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 215, in read_declaration
    result = self.read_one_of(documentation)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 453, in read_one_of
    fields.append(self.read_field_with_label(location, nested_documentation, None, atype))
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/proto_parser.py", line 389, in read_field_with_label
    self.reader.require("=")
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 35, in require
    self.expect(self.read_char() == c, f"expected '{c}'")
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 361, in expect
    self.unexpected(message, location)
  File "/venv/lib/python3.10/site-packages/karapace/protobuf/syntax_reader.py", line 370, in unexpected
    raise IllegalStateException(f"Syntax error in {str(location)}: {message}")
karapace.protobuf.exception.IllegalStateException: Syntax error in :14:54: expected '='
kafka.coordinator       Thread-1    INFO        (Re-)joining group karapace-schema-registry
karapace.schema_reader  schema-reader   INFO        Replay progress (0.22): -1/0 (0 %)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.22): -1/0 (0 %)
karapace.schema_reader  schema-reader   INFO        Replay progress (0.21): -1/0 (0 %)
eliax1996 commented 1 month ago

It would be useful to solve/understand the problem to have the schema or an equivalent anonymized schema that once produced by karapace create this problem at startup. Without it its very hard to debug and create a unit test to replicate (and consequently fix) the problem

davegreen commented 1 month ago

OK, I don't have much experience with protobuf, but the message in the schemas topic that seems to cause the issue is this:

Key:

{
    "keytype": "SCHEMA",
    "subject": "Tests.Integration-value",
    "version": 1,
    "magic": 1
}

Value:

{
    "subject": "Tests.Integration-value",
    "version": 1,
    "id": 1,
    "schema": "syntax = \"proto3\";\npackage live.schemas;\n\nmessage ChannelMetadata {\n  string session_guid = 1;\n  string identifier = 2;\n  string description = 3;\n  string format_string = 4;\n  string unit = 5;\n  double minimum_value = 7;\n  double maximum_value = 8;\n\n  oneof _text_conversion {\n    optional .live.schemas.TextConversion text_conversion = 6;\n  }\n}\nmessage TextConversion {\n  repeated double numerical_values = 1;\n  repeated string text_values = 2;\n  string default_text_value = 3;\n}\n",
    "deleted": false,
    "schemaType": "PROTOBUF"
}

Headers:

{
    "X-Origin-Host": "0.0.0.0",
    "X-Registry-Version": "karapace-3.15.0"
}
eliax1996 commented 1 month ago

Seems like an issue with the current parser. The schema:

syntax = "proto3";
package live.schemas;

message ChannelMetadata {
  string session_guid = 1;
  string identifier = 2;
  string description = 3;
  string format_string = 4;
  string unit = 5;
  double minimum_value = 7;
  double maximum_value = 8;

  oneof _text_conversion {
    optional .live.schemas.TextConversion text_conversion = 6;
  }
}
message TextConversion {
  repeated double numerical_values = 1;
  repeated string text_values = 2;
  string default_text_value = 3;
}

Is not considered a valid protobuf file definition by the current parser.

But, if you run:

SCHEMA="""\
syntax = "proto3";
package live.schemas;

message ChannelMetadata {
  string session_guid = 1;
  string identifier = 2;
  string description = 3;
  string format_string = 4;
  string unit = 5;
  double minimum_value = 7;
  double maximum_value = 8;

  oneof _text_conversion {
    .live.schemas.TextConversion text_conversion = 6;
  }
}
message TextConversion {
  repeated double numerical_values = 1;
  repeated string text_values = 2;
  string default_text_value = 3;
}
\
"""
karapace.schema_models.parse_protobuf_schema_definition(parse_protobuf_schema_definition(
        schema_definition=SCHEMA,
        references=None,
        dependencies=None,
        validate_references=True,
        normalize=True,
    )

It works, the problem seems to be the optional in the oneof field not interpreted as a valid protobuf definition.

We are working on getting rid of this parser, this issue will be resolved by the removal of the current parser implementation

davegreen commented 1 month ago

That's awesome. Thanks for helping demystify it a bit!

eliax1996 commented 1 month ago

np, I think its a valuable issue to have. It shows an example of what we don't have supported in the parser now :) re-opening it! Thanks for the contribution, we will close the issue as soon as we replace the parser