datalust / serilog-sinks-seq

A Serilog sink that writes events to the Seq structured log server
https://datalust.co/seq
Apache License 2.0

Use size-based batch limiting #172

Open nblumhardt opened 2 years ago

nblumhardt commented 2 years ago

On the server-side, Seq imposes two size limits on incoming events: a maximum size for each individual event, and a maximum size for each raw ingestion (batch) payload.

On the client side, this sink currently applies an event size limit, but batches are limited by count (batchPostingLimit) rather than by byte size. This makes it difficult to reliably align the client and server settings, leading either to batch loss when oversize batches are discarded, or to poor batch utilization if the count has to be set to an excessively conservative value.

A future version of this sink should replace batchPostingLimit with a size-based control.

The main complication is the dependency on PeriodicBatchingSink, which implements the current count-based logic. That dependency will likely need to be broken, and the functionality replaced, in order to make the proposed change here.
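As a rough illustration of what a size-based control might look like, here is a sketch; the batchSizeLimitBytes name and the shape of this helper are hypothetical, not part of the sink's API:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: accumulate rendered (CLEF) events until adding the
// next one would push the batch past batchSizeLimitBytes.
class SizeLimitedBatchBuilder
{
    readonly long _batchSizeLimitBytes;
    readonly List<string> _batch = new();
    long _batchBytes;

    public SizeLimitedBatchBuilder(long batchSizeLimitBytes) =>
        _batchSizeLimitBytes = batchSizeLimitBytes;

    // Returns false when the event would overflow the current batch; the
    // caller should flush and retry. An event that is oversize on its own
    // still needs a separate policy (drop, truncate, or send alone).
    public bool TryAdd(string renderedEvent)
    {
        var size = Encoding.UTF8.GetByteCount(renderedEvent) + 1; // newline separator
        if (_batch.Count > 0 && _batchBytes + size > _batchSizeLimitBytes)
            return false;
        _batch.Add(renderedEvent);
        _batchBytes += size;
        return true;
    }

    public IReadOnlyList<string> Flush()
    {
        var result = _batch.ToArray();
        _batch.Clear();
        _batchBytes = 0;
        return result;
    }
}
```

Note this assumes events are already rendered to strings before batching, which is exactly the performance trade-off discussed later in this thread.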

spointeau commented 4 months ago

Yesterday, I ran into this issue of the batch limit being reached on the Seq server side. It is quite easy to restrict the size per message, but impossible to control the batch size, unless we send messages one by one, which I think is completely inefficient.

I saw that version 4 of Serilog removed PeriodicBatchingSink; maybe that makes this change request possible now?

nblumhardt commented 4 months ago

Definitely keen to explore this again; thanks for the nudge.

spointeau commented 2 months ago

Do you by any chance have any feedback on this enhancement? (It is becoming quite important for us.)

nblumhardt commented 2 months ago

Hi!

We've looked back into this a couple of times, but we're not really happy with any of the implementation options at present.

Even if we duplicated the batching infrastructure to add this natively, there would be significant performance considerations to look at: the buffer between the foreground and background threads would need to carry large string allocations that the sink can avoid making as things stand today.

If an additional batchSizeLimitBytes argument were added today, I think the best we could do would be to break any oversize batches into chunks, and to send each chunk in a separate request. Once the first chunk in a batch had been sent successfully, we'd have to ignore failures when sending any later chunks; otherwise we'd risk duplicating logs (the Serilog batching infrastructure doesn't support partial retries).
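The splitting step of that fallback might look something like this sketch (batchSizeLimitBytes is the hypothetical argument discussed above; the failure-swallowing behaviour for later chunks would live in the caller):

```csharp
using System.Collections.Generic;
using System.Text;

static class BatchChunker
{
    // Hypothetical sketch: split an oversize batch of rendered events into
    // chunks whose UTF-8 size stays within batchSizeLimitBytes each.
    public static IEnumerable<IReadOnlyList<string>> Chunk(
        IEnumerable<string> renderedEvents, long batchSizeLimitBytes)
    {
        var chunk = new List<string>();
        long chunkBytes = 0;
        foreach (var evt in renderedEvents)
        {
            var size = Encoding.UTF8.GetByteCount(evt) + 1; // newline separator
            if (chunk.Count > 0 && chunkBytes + size > batchSizeLimitBytes)
            {
                yield return chunk;
                chunk = new List<string>();
                chunkBytes = 0;
            }
            chunk.Add(evt);
            chunkBytes += size;
        }
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```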

This could be a stepping stone to a better solution in the future - would it move things forwards enough for you for this to be worthwhile, @spointeau?

spointeau commented 2 months ago

Hi,

I definitely cannot agree to potentially losing logs. The best approach is to change the logic when building the batches: in addition to the count, the batch's total size should be kept less than or equal to the specified limit. I have not looked at the code specifically, but I don't see any other option. Do you think it is feasible?

nblumhardt commented 2 months ago

Thanks for your reply. It's definitely not out-of-the-question, but I don't think there's a clear/straightforward way to proceed right now. Perhaps forking, trimming down, and modifying the sink to create an experimental/companion implementation would be a possibility?

spointeau commented 2 months ago

Hi, when you propose forking, trimming down, etc., you mean for me to have a look, right? Could you just point me to the source file(s) where I can (try to) understand this logic of batching to the background thread? I am not sure I will be helpful to you, but I will try to better understand the whole change. I will do my best in any case.

nblumhardt commented 1 month ago

Thanks! It's in https://github.com/serilog/serilog/blob/dev/src/Serilog/Core/Sinks/Batching/BatchingSink.cs#L104, which this sink uses via https://github.com/datalust/serilog-sinks-seq/blob/dev/src/Serilog.Sinks.Seq/Sinks/Seq/Batched/BatchedSeqSink.cs

spointeau commented 1 month ago

Thank you very much. I had a look, and I definitely think the change should happen in BatchingSink.cs. I will try to submit a basic prototype to you ASAP. The only thing I am missing (for now) is how to calculate the size of an event (so that the batch does not exceed a specified size).

nblumhardt commented 1 month ago

Hi! That's the tricky part, and why I think it'd need to be built as a separate implementation for the time being :+1:

Since determining the event's size depends on its UTF-8 encoded, Seq-formatted representation, I think the trick would be to:

To make the whole process easier, deleting the other Seq sink variations from the forked repo would save having to update them (the "trimming down" bit).
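Measuring that representation could look something like the following sketch; CompactJsonFormatter is assumed here as a stand-in for the sink's own CLEF-compatible formatter:

```csharp
using System.IO;
using System.Text;
using Serilog.Events;
using Serilog.Formatting;

static class EventMeasurer
{
    // Render the event through the sink's formatter, then measure the
    // UTF-8 byte count of the result. Rendering up-front means the large
    // string travels through the foreground/background buffer -- the
    // performance concern mentioned earlier in this thread.
    public static (string Rendered, int SizeBytes) RenderAndMeasure(
        ITextFormatter formatter, LogEvent logEvent)
    {
        var writer = new StringWriter();
        formatter.Format(logEvent, writer);
        var rendered = writer.ToString();
        return (rendered, Encoding.UTF8.GetByteCount(rendered));
    }
}
```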

spointeau commented 1 month ago

I guess we are in a catch-22, because the batching is done in Serilog, but doing it by size depends on the Seq sink's formatter. I really missed the elephant in the room. So there is no choice other than to do the sub-batching in the Seq sink, as you proposed initially.

I am still missing the following: how do you retry if the ingest fails? Could you explain the foreground and background threads a bit more? Is the batching done on the foreground thread and the Seq sink run on the background thread? I don't see it clearly.

nblumhardt commented 1 month ago

:+1: either that, or copy the batching code from Serilog into the Seq sink and modify it there (using the copy instead of Serilog's native batching). I'm not so keen on maintaining another fork of the code, and the result would have different performance characteristics, but spiking it out and testing/profiling might provide some data to push us one way or the other.

https://github.com/serilog/serilog/blob/dev/src/Serilog/Core/Sinks/Batching/BatchingSink.cs#L120 is the key to how retries work; the _currentBatch collection isn't cleared when a batch fails, so the next time around the loop (after the inter-batch delay) the same events will be tried again.
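In simplified form, that retry behaviour can be paraphrased as follows; this is an illustrative sketch, not the actual BatchingSink.cs source, and the names here are invented:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch of the retry behaviour: the current batch is only cleared after a
// successful emit, so a failed batch is re-sent on the next pass.
class BatchingLoopSketch
{
    readonly List<string> _currentBatch = new();
    readonly Func<IReadOnlyList<string>, Task> _emitBatchAsync;

    public BatchingLoopSketch(Func<IReadOnlyList<string>, Task> emitBatchAsync) =>
        _emitBatchAsync = emitBatchAsync;

    // One pass of the background loop: append newly dequeued events, try to
    // emit, and keep the batch intact on failure.
    public async Task PumpOnceAsync(IEnumerable<string> newlyDequeued, TimeSpan backoff)
    {
        _currentBatch.AddRange(newlyDequeued);
        try
        {
            await _emitBatchAsync(_currentBatch);
            _currentBatch.Clear(); // success: these events are done with
        }
        catch
        {
            // failure: _currentBatch is left as-is, so after the backoff
            // delay the next pass retries exactly the same events
            await Task.Delay(backoff);
        }
    }
}
```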