Farfetch / kafkaflow

Apache Kafka .NET Framework to create applications simple to use and extend.
https://farfetch.github.io/kafkaflow/
MIT License
638 stars 114 forks source link

[Bug Report]: BatchProduceAsync seems to produce items out or order #594

Open AlexeyRaga opened 3 weeks ago

AlexeyRaga commented 3 weeks ago

Prerequisites

Description

It looks like BatchProduceAsync can produce message out-of-order. It happens from time to time, and the order doesn't seem to be deterministic. Most of the time it does seem to produce messages in the right order, but not always.

Steps to reproduce

I was using this code to reproduce:

private sealed record Batch(Guid Id, List<ISpecificRecord> Messages);

private static Batch CreateBatch()
{
    var id = Guid.NewGuid();
    var messages = new List<ISpecificRecord>
    {
        new UserJoined { id = id, order = 1 },
        new UserDeactivated { id = id, order = 2 },
        new UserActivated { id = id, order = 3 }
    };

    return new(id, messages);
}

[Fact]
public async Task Should_preserve_order_of_messages()
{
    var batchesToSend = Enumerable.Range(0, 10).Select(_ => CreateBatch()).ToList();

    foreach (var batch in batchesToSend)
    {
        var toProduce =
            batch.Messages
                .Select(x => new BatchProduceItem(
                    fixture.TopicName,
                    batch.Id.ToString(),
                    x,
                    new MessageHeaders()))
                .ToList();

        await fixture.Producer.BatchProduceAsync(toProduce);
    }

    // assertion is omitted for brevity 
}

I then look at the topic itself and see that some batches come out of order.

Expected behavior

I expect that the messages order is guaranteed by the batch producer all the time.

Actual behavior

Most often messages are coming in the right order: 1, 2, 3 but sometimes they are written in the wrong order.

KafkaFlow version

v3.0.10

AlexeyRaga commented 3 weeks ago

Is it possible that the problem can be in how middlewares are handled?

I see that BatchProduceAsync executes Produce synchronously for each message, separately.

But it internally it does _middlewareExecutor.Execute which returns Task. This Task is dropped on the floor. This could mean that multiple Tasks are just running in parallel, calling the "real" producer deep inside, and by chance can finish out-of-order.

If I am right, then this execution model is unsound.

massada commented 1 week ago

That is exactly why is not guaranteeing in order producing. For it to guarantee in order producing, the code would need to wait until InternalProduce is called (this method pushes the message to the driver buffer).

It should be very easy to reproduce this issue by adding a producing middleware with a descending timeout and see an inverse order of producing.

Let's say 3 messages with descending timeouts of 1s, 500ms, 0s.