Denovo1998 opened this issue 1 year ago
@poorbarcode @codelipenghui @rdhabalia PTAL!
I came up with a new idea to solve the problem of schema ledger loss.
I tested tryCompleteTheLostSchemaLedger() in org.apache.pulsar.broker.service.schema.SchemaServiceTest#testSchemaLedgerLost, and it works: new producers and consumers function normally.
But first we need to discuss how to obtain the SchemaVersion and SchemaData in the SchemaRegistry. Are the two approaches, under Solution and Alternatives, feasible, or do you have any other good suggestions?
The code is in #20415 (some work is not done yet).
The issue had no activity for 30 days; marked with the Stale label.
Waiting for discussion on whether this plan is feasible. I will send an email to discuss it later.
The implementation has been updated to follow the Alternatives section. It still needs to be discussed.
Search before asking
Motivation
#17221 describes a situation in which multiple bookie replicas are corrupted or a ledger has been deleted. The loss of a schema ledger means that new producers and consumers cannot even be created, let alone work properly.
With the solution from PR #18010, enabling autoSkipNonRecoverableData and skipping the lost schema leaves the schema information incomplete. Moreover, in the existing code, a corrupted schema causes its metadata to be deleted: https://github.com/apache/pulsar/blob/a953027aad38c9f54e952133949280ec2f4c04e8/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryServiceImpl.java#L564-L570 The schema is deleted only if the error is not recoverable, but PRs #18010 and #19882 have marked NoSuchLedgerExistsOnMetadataServerException as a recoverable exception as well. So we need a solution that does not just skip the schema whose ledger is missing, but actually rebuilds the broken schema ledger.
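To make the difference between skipping and rebuilding concrete, here is a minimal, self-contained sketch of the rebuild path. It is an in-memory model, not Pulsar code: LedgerStore, LedgerLostException, register, getSchema and the version-to-ledger map are hypothetical stand-ins for BookKeeper, the broker's "failed to open" error and the schema metadata. The tryCompleteTheLostSchemaLedger shown here is only a simplified model of the method proposed in this issue (its real signature in #20415 may differ), and it assumes a copy of the SchemaData is still available on the broker side.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SchemaLedgerRecoverySketch {

    /** Hypothetical stand-in for the error raised when a schema ledger can no longer be opened. */
    static class LedgerLostException extends RuntimeException {
        LedgerLostException(long ledgerId) {
            super("Failed to open ledger " + ledgerId);
        }
    }

    /** Hypothetical in-memory ledger store standing in for BookKeeper. */
    static class LedgerStore {
        private final Map<Long, byte[]> ledgers = new HashMap<>();
        private long nextLedgerId = 1;

        long createAndWrite(byte[] data) {
            long id = nextLedgerId++;
            ledgers.put(id, data.clone());
            return id;
        }

        byte[] read(long ledgerId) {
            byte[] data = ledgers.get(ledgerId);
            if (data == null) {
                throw new LedgerLostException(ledgerId);
            }
            return data.clone();
        }

        void delete(long ledgerId) {
            ledgers.remove(ledgerId);
        }
    }

    final LedgerStore store = new LedgerStore();
    // Hypothetical metadata: schema version -> ID of the ledger holding that version's SchemaData.
    final Map<Long, Long> versionToLedger = new HashMap<>();
    // Hypothetical broker-side copy of SchemaData (e.g. kept by the topic) for versions it has seen.
    final Map<Long, byte[]> cachedSchemaByVersion = new HashMap<>();

    long register(long version, byte[] schemaData) {
        long ledgerId = store.createAndWrite(schemaData);
        versionToLedger.put(version, ledgerId);
        cachedSchemaByVersion.put(version, schemaData.clone());
        return ledgerId;
    }

    byte[] getSchema(long version) {
        Long ledgerId = versionToLedger.get(version);
        if (ledgerId == null) {
            throw new IllegalArgumentException("unknown schema version " + version);
        }
        try {
            return store.read(ledgerId);
        } catch (LedgerLostException e) {
            // Rebuild the lost ledger from the cached copy instead of skipping the schema.
            if (!tryCompleteTheLostSchemaLedger(version)) {
                throw e;
            }
            return store.read(versionToLedger.get(version));
        }
    }

    /** Writes the cached SchemaData to a new ledger and points the metadata at it. */
    boolean tryCompleteTheLostSchemaLedger(long version) {
        byte[] cached = cachedSchemaByVersion.get(version);
        if (cached == null) {
            return false; // nothing to rebuild from
        }
        long newLedgerId = store.createAndWrite(cached);
        versionToLedger.put(version, newLedgerId);
        return true;
    }

    public static void main(String[] args) {
        SchemaLedgerRecoverySketch registry = new SchemaLedgerRecoverySketch();
        byte[] schema = "{\"type\":\"string\"}".getBytes(StandardCharsets.UTF_8);
        long oldLedger = registry.register(0L, schema);
        registry.store.delete(oldLedger); // simulate the schema ledger being lost
        byte[] recovered = registry.getSchema(0L);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
        System.out.println("ledger " + oldLedger + " -> " + registry.versionToLedger.get(0L));
    }
}
```

Running main prints the recovered schema followed by "ledger 1 -> 2": the metadata now points at a freshly written ledger instead of the lost one. If no cached copy exists, the original error is rethrown rather than silently skipped, which is the key difference from relying on autoSkipNonRecoverableData.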
Solution
Add a new method called tryCompleteTheLostSchemaLedger. When a schema ledger has been lost and a new consumer subscribes or a new producer is created, an error such as "Failed to open gotten" occurs; in that case, call the tryCompleteTheLostSchemaLedger method. This method attempts to create a new ledger, save the SchemaData to it, and then update the metadata with the new ledger ID. With this, producers and consumers can connect and work normally even if the schema ledger has been deleted. To obtain the SchemaData, we need to store the SchemaData and SchemaVersion information in the topic (org.apache.pulsar.broker.service.AbstractTopic) and pass them in when calling tryCompleteTheLostSchemaLedger.
Alternatives
org.apache.pulsar.broker.service.Producer and org.apache.pulsar.broker.service.Consumer do not save SchemaData and SchemaVersion, and tryCompleteTheLostSchemaLedger is called only through the admin API. Perhaps we should implement this directly in the upload schema function (https://pulsar.apache.org/docs/3.2.x/admin-api-schemas/#upload-a-schema). In that case we need to pass an additional flag indicating whether to register a new schema or to fill in the missing one. For compatibility, the default behavior should remain registering a new schema.

Store the SchemaData and SchemaVersion information in the org.apache.pulsar.broker.service.Producer and org.apache.pulsar.broker.service.Consumer instances that are connected to or subscribed to the topic on the broker side. (This is not a complete alternative; it only covers how to keep the SchemaData and SchemaVersion whose ledger has been lost.)
Anything else?
Please review the alternatives and share your ideas for discussion. I will update the implementation in the PR accordingly.
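For the admin API alternative, the compatibility requirement (an extra flag on schema upload whose default keeps registering a new schema) can be sketched as follows. This is a hypothetical in-memory model: UploadMode, COMPLETE_LOST_LEDGER and the upload signatures are illustrative names, not the existing Pulsar admin API, and a null entry stands in for a schema version whose ledger has been lost.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaUploadModeSketch {

    /** Hypothetical flag for the upload schema call; REGISTER keeps today's behavior. */
    enum UploadMode { REGISTER, COMPLETE_LOST_LEDGER }

    // Hypothetical registry state: list index = schema version, null entry = schema whose ledger was lost.
    final List<String> schemaByVersion = new ArrayList<>();

    /** Existing call shape: without the flag, a new schema version is registered (backward compatible). */
    long upload(String schemaData) {
        return upload(schemaData, UploadMode.REGISTER, -1);
    }

    long upload(String schemaData, UploadMode mode, int lostVersion) {
        switch (mode) {
            case REGISTER:
                schemaByVersion.add(schemaData);
                return schemaByVersion.size() - 1;
            case COMPLETE_LOST_LEDGER:
                // Only refill a version that actually exists and has lost its data.
                if (lostVersion < 0 || lostVersion >= schemaByVersion.size()
                        || schemaByVersion.get(lostVersion) != null) {
                    throw new IllegalArgumentException("version " + lostVersion + " is not a lost schema");
                }
                // Re-store the payload under the existing version instead of creating a new one.
                schemaByVersion.set(lostVersion, schemaData);
                return lostVersion;
            default:
                throw new IllegalStateException("unknown mode " + mode);
        }
    }

    public static void main(String[] args) {
        SchemaUploadModeSketch registry = new SchemaUploadModeSketch();
        registry.upload("schema-v0");
        registry.upload("schema-v1");
        registry.schemaByVersion.set(0, null); // simulate the ledger for version 0 being lost
        long repaired = registry.upload("schema-v0", UploadMode.COMPLETE_LOST_LEDGER, 0);
        System.out.println(repaired + " " + registry.schemaByVersion);
    }
}
```

Running main prints "0 [schema-v0, schema-v1]": the lost version 0 is refilled in place and no new version is created. Rejecting a target version that is not actually lost keeps the new flag from overwriting healthy schemas.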
Are you willing to submit a PR?