Aiven-Open / karapace

Karapace - Your Apache Kafka® essentials in one tool
https://karapace.io
Apache License 2.0

Corrupt schema, different schemas with the same id found #876

Open javierarrieta opened 3 months ago

javierarrieta commented 3 months ago

What happened?

Incompatible schemas are quite frequently being registered under the same schema id.

What did you expect to happen?

Incompatible schemas should get different ids, or fail to register if they are for the same subject (which is not the case here; these are different subjects).

What else do we need to know?

Version of karapace: 3.10.4

We frequently see corrupted schemas in our deployments and have not yet been able to pinpoint a root cause. At some point an incompatible schema is assigned a schema id that is already in use, which breaks deserialization of the existing records written with that id. For instance, we have a record schema with id 11, and then after a few months a message with the schema "string" gets registered as schema 11, making the messages written with the previous schema unrecoverable without manual intervention.

Today I tried to dig deeper into the issue and recover the schemas by deleting the subject that corrupted the schema.

I read the _schemas topic and listed all the entries with schema id 11 (the one that got corrupted). Names and values are redacted:

| key | value |
| --- | --- |
| {"keytype":"SCHEMA","subject":"corrupt-subject","version":1,"magic":1} | {"subject": "corrupt-subject","version": 1, "id": 11, "schema": "\"string\"", "deleted": false} |
| {"keytype":"SCHEMA","subject":"subject-1","version":1,"magic":1} | {"subject": "subject-1","version": 1,"id": 11,"schema": "","deleted": false} |
| {"keytype":"SCHEMA","subject":"subject-2","version":1,"magic":1} | {"subject": "subject-2","version": 1,"id": 11,"schema": "","deleted": false} |
| {"keytype":"SCHEMA","subject":"subject-3","version":1,"magic":1} | {"subject": "subject-3", "version": 1,"id": 11,"schema": "","deleted": false} |
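
For anyone who wants to run the same check across every id rather than just 11, here is a minimal sketch. It assumes the _schemas topic has already been dumped into (key, value) pairs of JSON strings by whatever consumer you use; `find_id_conflicts` and `SAMPLE` are names made up for this illustration and are not part of Karapace. The sample rows are the redacted entries listed above, so the empty schema strings stand in for the real (redacted) record schema.

```python
import json
from collections import defaultdict

# Redacted entries from the _schemas topic, as (key, value) JSON strings.
SAMPLE = [
    (r'{"keytype":"SCHEMA","subject":"corrupt-subject","version":1,"magic":1}',
     r'{"subject": "corrupt-subject","version": 1, "id": 11, "schema": "\"string\"", "deleted": false}'),
    (r'{"keytype":"SCHEMA","subject":"subject-1","version":1,"magic":1}',
     r'{"subject": "subject-1","version": 1,"id": 11,"schema": "","deleted": false}'),
    (r'{"keytype":"SCHEMA","subject":"subject-2","version":1,"magic":1}',
     r'{"subject": "subject-2","version": 1,"id": 11,"schema": "","deleted": false}'),
    (r'{"keytype":"SCHEMA","subject":"subject-3","version":1,"magic":1}',
     r'{"subject": "subject-3", "version": 1,"id": 11,"schema": "","deleted": false}'),
]


def find_id_conflicts(records):
    """Return {schema_id: {schema_string: [subjects]}} for ids with more than one distinct schema."""
    schemas_by_id = defaultdict(dict)
    for raw_key, raw_value in records:
        key = json.loads(raw_key)
        # Only SCHEMA records carry an id; skip other key types and empty values.
        if key.get("keytype") != "SCHEMA" or raw_value is None:
            continue
        value = json.loads(raw_value)
        subjects = schemas_by_id[value["id"]].setdefault(value["schema"], [])
        subjects.append(value["subject"])
    return {sid: dict(group) for sid, group in schemas_by_id.items() if len(group) > 1}


if __name__ == "__main__":
    for schema_id, group in find_id_conflicts(SAMPLE).items():
        print(f"id {schema_id} maps to {len(group)} distinct schemas:")
        for schema, subjects in group.items():
            print(f"  {schema!r} <- {subjects}")
```

The comparison is on the exact schema string, which is an assumption; two textually different strings could still describe the same schema.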

Then we deleted the corrupt entry with:

❯ curl -X DELETE 'http://karapace-url/subjects/corrupt-subject'

And we got another entry in the _schemas topic, I presume to mark the subject as deleted:

| key | value |
| --- | --- |
| {"keytype":"DELETE_SUBJECT","subject":"corrupt-subject","magic":0} | {"subject": "corrupt-subject","version": 1} |

So we checked the versions for schema 11 using the API and all looks good:

❯ curl 'http://karapace-url/schemas/ids/11/versions' | jq -r
[
  {
    "subject": "subject-1",
    "version": 1
  },
  {
    "subject": "subject-2",
    "version": 1
  },
  {
    "subject": "subject-3",
    "version": 1
  }
]

❯ curl 'http://karapace-url/subjects/subject-1/versions/1'
{"id":11,"schema":"<complex schema redacted>","subject":"subject-1","version":1}

❯ curl 'http://karapace-url/subjects/subject-2/versions/1'
{"id":11,"schema":"<complex schema redacted>","subject":"subject-2","version":1}

❯ curl 'http://karapace-url/subjects/subject-3/versions/1'
{"id":11,"schema":"<complex schema redacted>","subject":"subject-3","version":1}

But when I try to retrieve schema 11 I get the corrupt entry again:

❯ curl 'http://karapace-url/schemas/ids/11' | jq -r
{
  "schema": "\"string\""
}

And indeed if I try to obtain the subject that I deleted I get an error:

❯ curl 'http://karapace-url/subjects/corrupt-subject/versions/1'
{"error_code":40401,"message":"Subject 'corrupt-subject' not found."}
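
In case it helps with reproducing this, the manual checks above can be scripted. The sketch below uses only the three endpoints already shown in this report (`/schemas/ids/{id}`, `/schemas/ids/{id}/versions` and `/subjects/{subject}/versions/{version}`); `fetch_json` and `check_id_consistency` are hypothetical helper names, and `http://karapace-url` is the same placeholder as in the curl commands. It compares schema strings for exact equality, which is an assumption.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def check_id_consistency(base_url, schema_id, fetch=fetch_json):
    """Return (subject, version) pairs whose schema differs from /schemas/ids/{id}."""
    by_id = fetch(f"{base_url}/schemas/ids/{schema_id}")["schema"]
    mismatches = []
    for ref in fetch(f"{base_url}/schemas/ids/{schema_id}/versions"):
        subject = quote(ref["subject"], safe="")
        url = f"{base_url}/subjects/{subject}/versions/{ref['version']}"
        # Exact string comparison: any difference at all is reported.
        if fetch(url)["schema"] != by_id:
            mismatches.append((ref["subject"], ref["version"]))
    return mismatches


if __name__ == "__main__":
    # Placeholder host, as in the curl commands above.
    print(check_id_consistency("http://karapace-url", 11))
```

With the state described above, every remaining subject would be reported as a mismatch, since `/schemas/ids/11` still returns the "string" schema.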

I am not sure whether we are following an unsupported workflow. Any help on how to remove the corrupt schema would be appreciated (I know that the entries for corrupt-subject will no longer be deserializable), as would advice on how to debug the creation of these corrupt entries, which happens very frequently in our setup.

Thanks!