Open 1559924775 opened 2 years ago
@1559924775 - are you able to make this plain markdown? It'd be easier to review, searchable within github, and wouldn't require downloading a pdf. Thanks.
@michaeljmarshall OK.
The issue had no activity for 30 days, mark with Stale label.
Motivation
In a scenario where Pulsar is used to publish data, how can we ensure that, after a client crash, publishing resumes from where it left off? To achieve effectively-once semantics, the data source must first be replayable. For example, suppose we are reading records from a file and publishing a message for each record we read. If the application crashes and restarts, we want to resume publishing from the record after the last successfully published one. Pulsar provides a deduplication mechanism based on the sequence ID. We can therefore use the record's offset in the file as the sequence ID and, from that ID, recover which offset to read from after the crash:
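To make the idea concrete, here is a minimal self-contained sketch in plain Java (not the real Pulsar client API; the class and method names are illustrative). In the actual client, the offset would be attached with the message's sequence ID and recovered via the producer's last sequence ID.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: use the record's offset in the file as the message sequence ID, so
// that after a crash the producer can resume from lastPublishedSequenceId + 1.
// Broker-side deduplication then drops any replayed duplicates.
public class OffsetAsSequenceId {

    // Simulated broker-side state: highest sequence ID acknowledged so far.
    static long lastPublishedSequenceId = -1L;

    static final List<String> published = new ArrayList<>();

    // "Publish" one record, deduplicating by sequence ID as the broker would.
    static void publish(long sequenceId, String record) {
        if (sequenceId <= lastPublishedSequenceId) {
            return; // duplicate, dropped by broker-side deduplication
        }
        published.add(record);
        lastPublishedSequenceId = sequenceId;
    }

    // After a restart, resume from the record after the last published one.
    static long resumeOffset() {
        return lastPublishedSequenceId + 1;
    }

    public static void main(String[] args) {
        String[] records = {"r0", "r1", "r2", "r3"};
        // First run: publish records 0 and 1, then "crash".
        for (long offset = 0; offset < 2; offset++) {
            publish(offset, records[(int) offset]);
        }
        // Restart: recover the offset to resume from and continue publishing.
        for (long offset = resumeOffset(); offset < records.length; offset++) {
            publish(offset, records[(int) offset]);
        }
        System.out.println(published); // [r0, r1, r2, r3] - each exactly once
    }
}
```

With a single source and a single partition this works well; the rest of the proposal deals with the cases where it breaks down.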
Now consider multiple files: we need to read the records in these files and publish them evenly to a multi-partition topic. When the program crashes, we need to know which file and which record we last published, so we must map these two data attributes onto the sequence ID. Going further, when users need even more data attributes to describe a record's location, they must map all of those attributes onto the sequence ID. Several difficulties make this impractical:
1. Users must map multiple data attributes onto a sequence ID on every publish. This mapping can be time-consuming and may cause performance problems, and it is generally difficult to map multiple data attributes onto a single monotonically increasing 64-bit ID.
2. The current producer.getLastSequenceId() only returns the highest sequence ID successfully published on each partition. With multiple partitions and asynchronous publishing, the fact that a given file offset was published successfully does not guarantee that all earlier data was published successfully too. Suppose a record with a smaller file offset on one partition failed to send, while a record with a larger file offset on another partition succeeded; if, after a restart, we resume publishing from the file offset corresponding to producer.getLastSequenceId(), data will be lost.
Goal
To solve the above problems, we propose a solution: publish with progress. The following changes will be made:
We define a Progress interface for users to implement. In the example above, the user's Progress implementation would include filePath and fileOffset. Each piece of data the user publishes carries a Progress object that identifies it, so the new publishing interface looks like sendSync(T msg, Progress progress). Inside the SDK, we maintain the mapping between the latest successfully published sequenceId and its Progress object, and periodically checkpoint this mapping for use in recovery. The user thus only needs to attach a Progress when sending; on recovery, the minimum published progress of the partitions is read from the checkpoint and sending continues from there.
API Changes
Let's introduce some concepts first:
Progress:
The Progress interface is implemented by the user. Through the fields defined in the implementing class, the user can locate a unique piece of data, and thereby the next piece of data to be sent. To better understand the Progress interface, we give two examples. First, suppose a log collection service collects data from multiple log files and sends it to topics. Progress could be implemented as follows:
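A possible implementation is sketched below. The interface shape and the three field names (filePath, fileCreateTime, fileOffset) are illustrative assumptions, not the proposal's exact code; fileCreateTime is one way to disambiguate rotated files that reuse the same path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Assumed shape of the Progress interface: comparable, and serializable for
// checkpointing (per the proposal, implementations must provide compareTo,
// serialize, and deserialize).
interface Progress extends Comparable<Progress> {
    byte[] serialize();
}

public class FileProgress implements Progress {
    final String filePath;     // which log file the record came from
    final long fileCreateTime; // distinguishes rotated files with the same path
    final long fileOffset;     // byte offset of the record within the file

    FileProgress(String filePath, long fileCreateTime, long fileOffset) {
        this.filePath = filePath;
        this.fileCreateTime = fileCreateTime;
        this.fileOffset = fileOffset;
    }

    @Override
    public int compareTo(Progress other) {
        FileProgress o = (FileProgress) other;
        int c = filePath.compareTo(o.filePath);
        if (c != 0) return c;
        c = Long.compare(fileCreateTime, o.fileCreateTime);
        if (c != 0) return c;
        return Long.compare(fileOffset, o.fileOffset);
    }

    @Override
    public byte[] serialize() {
        byte[] path = filePath.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + path.length + 16);
        buf.putInt(path.length).put(path)
           .putLong(fileCreateTime).putLong(fileOffset);
        return buf.array();
    }

    static FileProgress deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] path = new byte[buf.getInt()];
        buf.get(path);
        return new FileProgress(new String(path, StandardCharsets.UTF_8),
                buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        FileProgress p = new FileProgress("/var/log/app.log", 1700000000000L, 4096L);
        FileProgress q = deserialize(p.serialize());
        System.out.println(p.compareTo(q)); // 0: round trip preserves the key
    }
}
```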
These three fields uniquely identify the location of a piece of data. After a restart, this information and the corresponding sequenceId can be obtained from the checkpoint (see below), and sending can continue.
Second, consider a streaming-computing scenario: subscribing to data from topic A and publishing it to topic B. Progress could be implemented as follows:
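A possible implementation for this case is sketched below. The field names (sourcePartition, ledgerId, entryId) are illustrative assumptions chosen to mirror the components of a Pulsar MessageId; together they identify the source message in topic A whose result is being published to topic B.

```java
// Hypothetical Progress implementation for the streaming case: the "location"
// of a piece of data is the identity of the source message it was derived from.
public class TopicProgress implements Comparable<TopicProgress> {
    final int sourcePartition; // partition of topic A the message came from
    final long ledgerId;       // BookKeeper ledger holding the message
    final long entryId;        // entry within that ledger

    TopicProgress(int sourcePartition, long ledgerId, long entryId) {
        this.sourcePartition = sourcePartition;
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    @Override
    public int compareTo(TopicProgress o) {
        int c = Integer.compare(sourcePartition, o.sourcePartition);
        if (c != 0) return c;
        c = Long.compare(ledgerId, o.ledgerId);
        if (c != 0) return c;
        return Long.compare(entryId, o.entryId);
    }

    public static void main(String[] args) {
        TopicProgress earlier = new TopicProgress(0, 5L, 10L);
        TopicProgress later = new TopicProgress(0, 5L, 11L);
        System.out.println(earlier.compareTo(later) < 0); // true
    }
}
```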
ProgressInfo:
It is not exposed to users. It contains a Progress and a sequenceId, and associates the two.
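A minimal sketch of what ProgressInfo could look like; the layout and the length-prefixed encoding are assumptions for illustration.

```java
import java.nio.ByteBuffer;

// Sketch of ProgressInfo: an internal pairing of a Progress (held here in its
// serialized form) with the sequenceId it was published under. This is the
// object written out at checkpoint time.
public class ProgressInfo {
    final byte[] progressBytes; // the user's Progress, already serialized
    final long sequenceId;      // the sequence ID it was published with

    ProgressInfo(byte[] progressBytes, long sequenceId) {
        this.progressBytes = progressBytes;
        this.sequenceId = sequenceId;
    }

    // Length-prefixed encoding used when checkpointing.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(4 + progressBytes.length + 8);
        buf.putInt(progressBytes.length).put(progressBytes).putLong(sequenceId);
        return buf.array();
    }

    static ProgressInfo fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] progress = new byte[buf.getInt()];
        buf.get(progress);
        return new ProgressInfo(progress, buf.getLong());
    }

    public static void main(String[] args) {
        ProgressInfo pi = new ProgressInfo(new byte[]{7, 8}, 42L);
        System.out.println(ProgressInfo.fromBytes(pi.toBytes()).sequenceId); // 42
    }
}
```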
Progress persistence:
A checkpoint thread in the producer periodically saves each partition's latest published Progress and sequenceId through the saveProgressInfo method of the ProgressInfoStore interface. By default, we provide persistence to files, and users can implement the ProgressInfoStore interface to customize the persistence method. In addition, users can take the binary ProgressInfo data returned by the producer's saveProgressInfo interface and save it however they wish.
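A sketch of a file-based store in the spirit of FileProgressInfoStore follows. The method names follow the description above, but the signatures are assumptions; the write-to-temp-file-then-atomic-rename detail is an assumption added to make crash-safety explicit (a crash mid-checkpoint must never corrupt the previous checkpoint).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of a ProgressInfoStore persisted to a local file. The checkpoint
// thread would call saveProgressInfo periodically; recovery calls
// loadProgressInfo once at startup.
public class FileProgressInfoStore {
    private final Path file;

    public FileProgressInfoStore(Path file) {
        this.file = file;
    }

    public void saveProgressInfo(byte[] progressInfoBytes) {
        try {
            // Write to a sibling temp file, then atomically replace the old
            // checkpoint, so readers never observe a partially written file.
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, progressInfoBytes);
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] loadProgressInfo() {
        try {
            return Files.exists(file) ? Files.readAllBytes(file) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempDirectory("ckpt").resolve("progress.ckpt");
        FileProgressInfoStore store = new FileProgressInfoStore(p);
        store.saveProgressInfo(new byte[]{1, 2, 3});
        System.out.println(store.loadProgressInfo().length); // 3
    }
}
```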
Progress recovery:
Progress information is loaded from the persistent store into memory, including each partition's last saved Progress and sequenceId, which are then set as each partition's initial sequence ID. Users can implement the ProgressInfoStore interface to customize the loading method, or directly pass binary data through the producer's loadProgress(ProgressInfo progressInfoByte) to restore.
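The recovery subtlety raised in the Motivation section, namely that partitions may be at different progress and the source must restart from the minimum (cf. getMinProgress below), can be sketched as follows. Names are illustrative, and progress is simplified to a single file offset.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of recovery: given the last checkpointed progress of each partition,
// the source resumes reading from the MINIMUM progress across partitions.
// Records between that minimum and a partition's own progress are replayed and
// then dropped by that partition's sequence-ID deduplication.
public class ProgressRecovery {

    static long minProgress(Map<Integer, Long> lastProgressPerPartition) {
        return lastProgressPerPartition.values().stream()
                .min(Long::compare)
                .orElse(0L); // no checkpoint yet: start from the beginning
    }

    public static void main(String[] args) {
        Map<Integer, Long> checkpoint = new HashMap<>();
        checkpoint.put(0, 120L); // partition 0 last published offset 120
        checkpoint.put(1, 87L);  // partition 1 lags behind
        // Resuming from 120 would lose any record in 88..119 that was routed
        // to partition 1, so recovery must restart the source at offset 87.
        System.out.println(minProgress(checkpoint)); // 87
    }
}
```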
1. ProducerConfigurationData changes. Add a configuration item, needProgress, to indicate whether the publish-with-progress mode is enabled, and progressCheckpointIntervalSeconds to configure the persistence period of progress information. Progress is persisted to a local file by default; users can implement ProgressInfoStore for a custom persistence method.
2. Add the Progress interface. Users implement it by adding the fields they need to represent the progress of their data, and must implement the compareTo, deserialize, and serialize methods.
3. Add ProgressInfo. It has two members, Progress and sequenceId, and is the object stored at checkpoint time.
4. Add ProgressManager. A class for managing progress-related operations, where lastProgressPublished holds the progress and sequenceId of the most recently successfully published message, and lastProgressPushed holds those of the most recent message that has been sent but not yet acknowledged.
5. Add the ProgressInfoStore interface. Users implement persistence of the binary data obtained by calling producer.saveProgress(). We provide a default implementation of ProgressInfoStore, FileProgressInfoStore, which persists to local files.
6. Add ProgressMessageImpl. Based on MessageImpl, with an added progress member variable.
7. Producer interface changes.
8. PartitionedProducerImpl and ProducerImpl changes. New methods implementing the Producer interface. ProducerImpl gains a ProgressManager member. PartitionedProducerImpl implements its own loadProgress, saveProgress, and getMinProgress by iterating over each partition's producer and calling its corresponding method, which ultimately delegates to ProgressManager. ProducerBase:
PartitionedProducerImpl:
ProducerImpl:
9. TypedMessageBuilderImpl and TypedMessageBuilder interface changes. Add a progress member. Modify getMessage to return a ProgressMessageImpl object when publishing with progress is configured. TypedMessageBuilder:
TypedMessageBuilderImpl:
10. Add ProgressSendCallback. Adds a progress member on top of SendCallback.
11. OpSendMsg class changes. Add a lastCallback field, so that when an ack is received it is easy to get the sendCallback of the last piece of data in the entry.
12. Add ProgressPartitionMessageRouterImpl. Since each partition maintains its publishing progress independently, the same piece of data must be routed to the same partition every time.
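Why the routing in item 12 must be deterministic can be sketched as follows; choosePartition is a hypothetical stand-in for ProgressPartitionMessageRouterImpl's routing, not its actual code. If a replayed record landed on a different partition than before, its sequence ID would be checked against the wrong partition's deduplication state.

```java
// Sketch of deterministic routing: a stable hash of the record's progress key
// guarantees the same data is sent to the same partition on every replay.
public class ProgressRouter {

    static int choosePartition(String progressKey, int numPartitions) {
        // String.hashCode is specified by the Java language, so it is stable
        // across JVMs and restarts; floorMod keeps the result non-negative.
        return Math.floorMod(progressKey.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String key = "/var/log/app.log:4096"; // illustrative progress key
        int first = choosePartition(key, 8);
        int replayed = choosePartition(key, 8);
        System.out.println(first == replayed); // true: stable across restarts
    }
}
```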
Implementation
To make the principle and implementation details clearer, let's walk through a specific example covering publishing, a client crash, and restarting to continue publishing.
Demo
If you need the PDF, it is here: A.plan.for.client.to.realize.effective-once.pdf