AutoMQ / automq

AutoMQ is a cloud-first alternative to Kafka by decoupling durability to S3 and EBS. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency.
https://www.automq.com/docs

[BUG] Data corruption when reusing an unreleased buffer in S3 WAL #1944

Closed ShadowySpirits closed 2 months ago

ShadowySpirits commented 2 months ago

The FixedSizeByteBufPool introduced in #1874 requires each buffer to be returned to the pool after use. In the S3 WAL, we return the buffer once the append future completes.

https://github.com/AutoMQ/automq/blob/738ae15777a2630a09e03eea2e6c79e0160ecf1d/s3stream/src/main/java/com/automq/stream/s3/wal/impl/object/ObjectWALService.java#L90-L98

However, with fast retry enabled, the object storage layer may still hold a reference to the data buffer after the write future has completed.

https://github.com/AutoMQ/automq/blob/738ae15777a2630a09e03eea2e6c79e0160ecf1d/s3stream/src/main/java/com/automq/stream/s3/operator/AbstractObjectStorage.java#L254-L263

The write future may complete after line 257 and before line 263, so the caller can return the data buffer to the pool before the fast retry runs. If another append operation then reuses this buffer, the fast retry reads the overwritten contents and may write corrupted data.
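
The ordering described above can be reproduced in a small single-threaded sketch. This is not AutoMQ code: `BufferReuseSketch`, `simulate`, and the `retryCopiesData` flag are hypothetical names, `java.nio.ByteBuffer` stands in for the pooled Netty buffer, and an `ArrayDeque` stands in for `FixedSizeByteBufPool`. The private copy taken when `retryCopiesData` is true is only one conceivable mitigation, shown for contrast, and is not necessarily how the issue was fixed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.concurrent.CompletableFuture;

public class BufferReuseSketch {
    static final int SIZE = 8; // fixed buffer size; each record below is 8 ASCII bytes

    // Overwrite the whole buffer with a new record and make it readable.
    static void fill(ByteBuffer buf, String record) {
        buf.clear();
        buf.put(record.getBytes(StandardCharsets.US_ASCII));
        buf.flip();
    }

    // Read the buffer contents without moving its position.
    static String read(ByteBuffer buf) {
        return StandardCharsets.US_ASCII.decode(buf.duplicate()).toString();
    }

    // Returns the bytes that the (hypothetical) fast retry would upload for append A.
    static String simulate(boolean retryCopiesData) {
        ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
        pool.push(ByteBuffer.allocate(SIZE));

        // Append A borrows the only pooled buffer and starts a write.
        ByteBuffer bufA = pool.pop();
        fill(bufA, "record-A");
        CompletableFuture<Void> writeA = new CompletableFuture<>();
        // The caller returns the buffer as soon as the write future completes.
        writeA.thenRun(() -> pool.push(bufA));

        // The storage layer keeps the data around for a pending fast retry.
        ByteBuffer retryData;
        if (retryCopiesData) {
            retryData = ByteBuffer.allocate(SIZE); // private copy (assumed mitigation)
            retryData.put(bufA.duplicate());
            retryData.flip();
        } else {
            retryData = bufA; // shares the pooled buffer, as in the bug report
        }

        // The first attempt succeeds: the future completes and the buffer is released.
        writeA.complete(null);

        // Append B reuses the same pooled buffer before the retry has run.
        ByteBuffer bufB = pool.pop();
        fill(bufB, "record-B");

        // The fast retry fires now and reads whatever its buffer holds.
        return read(retryData);
    }

    public static void main(String[] args) {
        System.out.println("shared buffer: retry would write " + simulate(false));
        System.out.println("private copy:  retry would write " + simulate(true));
    }
}
```

With the shared buffer, the retry for append A ends up holding append B's bytes, which is the corruption reported here. With a reference-counted buffer, retaining it for the lifetime of the retry would be another way to keep the pool from handing it out early.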