superhx opened this issue 1 month ago
@superhx Hey, after careful consideration of the current write model on my end, I've found that to retain the current parallel write model while implementing sequential callback semantics on top of it, it might be more appropriate to handle this at the stage after the block has been written.

After the block is written to the WAL, consider serializing the process of writing the record to the logCache, which reduces the complexity at the upper layer.

For example: `com.automq.stream.s3.S3Storage#append0`

This approach avoids the concurrency control in the callbackSequencer, while also decoupling business-related operations from I/O operations: the current I/O thread is responsible not only for writing the block but also for handling the request callback chain.

Potential issue: under a large number of requests, processing callbacks may create significant pressure, but for now these are all pure in-memory operations. If delays in ack responses are a concern, the callbackExecutor could be designed as a multi-threaded model, routing requests to threads in the pool by streamId, thereby preserving the ordering of each stream. Alternatively, a dedicated ack-response thread pool could handle ack responses to the upper layer.
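The multi-threaded callbackExecutor idea above can be sketched as follows. This is a hypothetical illustration, not AutoMQ code: the class name and hashing scheme are assumptions. Routing each streamId to a fixed single-threaded worker keeps callbacks of one stream ordered while spreading different streams across threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: callbacks for the same streamId always run on the same
// single-threaded worker (so they stay ordered), while different streams can
// be processed on different threads in parallel.
public class StreamCallbackExecutor {
    private final ExecutorService[] workers;

    public StreamCallbackExecutor(int threads) {
        workers = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void execute(long streamId, Runnable callback) {
        // Same streamId -> same worker index -> per-stream callback order.
        int idx = (int) ((streamId & Long.MAX_VALUE) % workers.length);
        workers[idx].execute(callback);
    }

    public void shutdownAndWait() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        for (ExecutorService w : workers) {
            try {
                w.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Because each worker is single-threaded, no locks are needed for per-stream ordering; the trade-off is that one slow callback delays later callbacks of streams hashed to the same worker.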
@CLFutureX Hi, I think you might have misunderstood @superhx's idea. In his description,

> it shifts the responsibility of ensuring sequentiality to the upper layer, making the upper layer logic more complex

the "upper layer" mentioned here refers to `S3Storage`, rather than the caller of `S3Storage`.

Specifically, you can see that due to the current non-sequential callbacks, we have added a lot of complex logic in `S3Storage`, such as `WALCallbackSequencer` and `WALConfirmOffsetCalculator`. This has made the `S3Storage` code overly complex and difficult to maintain.

What we want to do now is to thoroughly refactor `BlockWALService` to make it call back sequentially, thereby reducing the complexity of `S3Storage`.
Yes, this will be a big project.
@superhx @Chillax-0v0 hey, please take a look
Continuing with the current block design: the block remains unchanged externally, but internally it builds logical partitions, routing records to different blockPartitions by streamId, so the change is imperceptible from the outside.

Partition-based distribution: after a block is written, it is routed and distributed to different event-loop threads according to its block partition.
Calculating a blockPartition's startOffset: following the current write logic, writing is based on the block's startOffset, so during the distribution phase the startOffset of each partition must be calculated. For example: blockPartition1.startOffset = block.startOffset; blockPartition2.startOffset = block.startOffset + alignLargeByBlockSize(blockPartition1.size).
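The startOffset calculation above can be illustrated with a small sketch. Note this is hypothetical: `alignLargeByBlockSize` is a stand-in reimplementation (round up to the next block boundary), and the 4 KiB block size is an assumption for illustration.

```java
// Hypothetical sketch of the per-partition startOffset calculation:
// each partition starts where the previous (size-aligned) partition ends.
public class PartitionOffsets {
    static final long BLOCK_SIZE = 4096; // assumed alignment unit for illustration

    // Round size up to the next multiple of BLOCK_SIZE.
    static long alignLargeByBlockSize(long size) {
        return (size + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    }

    // blockPartition[0].startOffset = block.startOffset;
    // blockPartition[i].startOffset = previous startOffset
    //                               + alignLargeByBlockSize(previous size).
    static long[] partitionStartOffsets(long blockStartOffset, long[] partitionSizes) {
        long[] starts = new long[partitionSizes.length];
        long offset = blockStartOffset;
        for (int i = 0; i < partitionSizes.length; i++) {
            starts[i] = offset;
            offset += alignLargeByBlockSize(partitionSizes[i]);
        }
        return starts;
    }
}
```

For example, with block.startOffset = 8192 and partition sizes {100, 5000, 4096}, the partition start offsets come out as {8192, 12288, 20480}: 100 aligns up to 4096 and 5000 aligns up to 8192.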
Parallel writing: the event loops can write in parallel based on the existing write model.

Completion notification: a finished write does not mean the data is actually durable; it must wait for the IO thread to execute the force method. Therefore, after writing, the CompletableFuture object is transferred to an internal queue of completed-but-pending notification items. In this way, ordered parallel writing per stream is achieved at the upper layer.

Free IOPS limit: to keep limiting IOPS, the IO thread executes a force once per 1/3000 s by default. (Further optimization is possible here: when an event loop writes, it updates a globally visible flag, and the IO thread checks this write flag to decide whether to execute force, avoiding empty force executions.)

Notification to upper layers: how do we notify the upper event loop to complete the corresponding CompletableFuture after the force?

Plan 1: lock control. While the IO thread executes force, the event loops are prohibited from continuing to write. After the force finishes, the event loop group is notified, and each event loop traverses its completed-response collection.

Plan 2: special task flag. When the IO thread starts the force, it inserts a special object into the completed-notification collection of each event loop. After the force completes, it notifies the event loops that the write is finished. On receiving the notification, each event loop traverses its completed-notification collection until it encounters the special object. In this way, notification is driven by the task flag.

With this, parallel writing with ordered callbacks for the WAL is complete.
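Plan 2 (the special task flag) can be sketched as follows. This is a hypothetical, single-threaded illustration with invented names; a real implementation would need a thread-safe queue (e.g. an MPSC queue) between the IO thread and the event loop, or would submit `markBatch` as a task to the event loop itself.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of "Plan 2": before force(), the IO thread enqueues a
// sentinel into each event loop's pending queue; after force() completes, the
// event loop completes every future up to (but not including) the sentinel.
public class SentinelNotifier {
    // The "special object" marking a batch boundary; never completed itself.
    static final CompletableFuture<Void> SENTINEL = new CompletableFuture<>();

    private final Queue<CompletableFuture<Void>> pending = new ArrayDeque<>();

    // Event loop: record a write that is waiting for the next force().
    public void onWrite(CompletableFuture<Void> cf) {
        pending.add(cf);
    }

    // IO thread, just before force(): mark the current batch boundary.
    public void markBatch() {
        pending.add(SENTINEL);
    }

    // Event loop, after being told force() finished: complete futures up to
    // the sentinel. Writes enqueued after the sentinel wait for the next force.
    public int completeBatch() {
        int completed = 0;
        CompletableFuture<Void> cf;
        while ((cf = pending.poll()) != null) {
            if (cf == SENTINEL) {
                break; // batch boundary reached
            }
            cf.complete(null);
            completed++;
        }
        return completed;
    }
}
```

The sentinel lets the event loop distinguish "persisted by this force" from "written but not yet forced" without any locking between writers and the IO thread, which matches the lock-free goal stated below.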
- Ordered writing is achieved based on partitioning.
- The entire process is lock-free.
- Clear division of responsibilities and well-defined boundaries: writing operations, event loops, and I/O threads each take on independent responsibilities with clear functional boundaries, without interfering with one another (unlike the current interference between writing and polling operations).
- It is simple and easy to maintain.

Following this, we can then eliminate the current WALCallbackSequencer class.
Problem
Currently, BlockWALService persists data blocks in parallel, responding directly to the upper layer with success as soon as any data block is persisted, even if the previous data block has not completed persistence. Although this method provides "better" write latency, it shifts the responsibility of ensuring sequentiality to the upper layer, making the upper layer logic more complex.
Expectation
Implement a `SequentialBlockWALService`:

- benchmark it against `BlockWALService` on Aliyun ESSD PL1 20GB 120MiB/s 2800 IOPS with write concurrency 8/16/32/64;
- keep the same interface as `BlockWALService` so that it can be directly replaced in the future;