datalust / serilog-sinks-seq

A Serilog sink that writes events to the Seq structured log server
https://datalust.co/seq
Apache License 2.0

`bufferSizeLimitBytes`'s docs might be misleading/outdated #227

Closed EndOfTheSpline closed 2 months ago

EndOfTheSpline commented 2 months ago

The Seq sink's configuration docs for `bufferSizeLimitBytes` state:

https://github.com/datalust/serilog-sinks-seq/blob/b3069e56634d0cece9a2cd9e798806be5daccd6e/src/Serilog.Sinks.Seq/SeqLoggerConfigurationExtensions.cs#L53-L54

From my own quick testing and a look at the code, this doesn't seem to be the current behaviour, however. The sink hardcodes the maximum buffer file size to 100 MB:

https://github.com/datalust/serilog-sinks-seq/blob/b3069e56634d0cece9a2cd9e798806be5daccd6e/src/Serilog.Sinks.Seq/Sinks/Seq/Durable/DurableSeqSink.cs#L57

and uses the supplied value only for cleanup of the file set:

https://github.com/datalust/serilog-sinks-seq/blob/b3069e56634d0cece9a2cd9e798806be5daccd6e/src/Serilog.Sinks.Seq/Sinks/Seq/Durable/FileSet.cs#L74

which I think means the individual files can still temporarily grow to 100 MB, even if they get removed the next day.
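For context, I'm configuring the sink roughly like this (the URL and paths are placeholders; `bufferSizeLimitBytes` is the parameter in question):

```csharp
// Placeholder server URL and buffer path; not my real deployment values.
Log.Logger = new LoggerConfiguration()
    .WriteTo.Seq(
        "https://seq.example.com",
        bufferBaseFilename: "./logs/seq-buffer",
        // I'd expect this to cap on-disk usage at ~10 MB, but the
        // individual buffer files can still grow to the hardcoded 100 MB.
        bufferSizeLimitBytes: 10_000_000)
    .CreateLogger();
```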

This sadly makes the sink hard to use for my case: little local storage, a device that logs a lot while online during the day but very little when offline, and a need to know what happened while it was offline (though not necessarily every message). I might have to go and tinker with my own offline log-shipping mechanism, which I'd really like to avoid.

nblumhardt commented 2 months ago

Thanks for raising this. For the sink to work, the total retained file size needs to be larger than this "chunk size", since there's no way for data within a file to be cleaned up once it's shipped: the file needs to be rolled in order for any cleanup to occur.

It sounds like the chunk size might be too small for your use case - what kinds of sizes are you targeting, in what sort of deployment environment? Thanks!

EndOfTheSpline commented 2 months ago

I'm deploying the application on remote IoT devices that have only a few GB of available storage at most, and they need that for other tasks. I'm also trying to conserve the flash memory, which is soldered on, by keeping writes to a minimum. Those criteria would mostly disqualify this solution, but as a temporary measure (for the issue described next) it would be okay.

I'm seeing the devices reboot (which is to be expected) but then fail to connect to the internet, for whatever reason, until manually rebooted on-site. Sometimes they do manage to connect to Seq just fine and upload the data as expected; sometimes they stay dead. It's this case that I'm most interested in getting logs from, to try to figure out why the connection fails. I've been considering some kind of monstrosity where my root logger writes to an in-memory logger that I can query on demand and dump to disk (e.g. once I decide a connection likely isn't going to happen); if a connection does succeed, I'd dispose the in-memory logger and somehow send all previously stored chunks to the server. I'm not entirely sold on this idea, though.

As for the expected chunk size... When connected to the internet, the device can generate a lot of traffic, which is stored on Seq's side because of the limitations above (cloud storage is cheaper than local storage in this case). When it can't connect, however, it generally logs very little, so I think a chunk size of a few MB at most would suffice.

What I would really want/need (I think) is something less durable than this option: persisting only the unsent events to disk. If a smaller batch is lost because the application was killed somehow, that would be acceptable in this case; disk usage is the more important constraint.

nblumhardt commented 2 months ago

Thanks for all the extra info 👍

The sink project is actually pretty tiny by library standards, and it's a typical CSPROJ without any special build steps... I wonder if you might get the best result by just grabbing a copy of the source, tweaking the bits you need to adjust, and building it as part of your own solution?

EndOfTheSpline commented 2 months ago

I didn't think of that, but you're right: it's a smaller project, so that could work. I suppose I'd change the durable sink so that, instead of always writing to a file, it tries to ship events to the API first and only writes them to disk if that fails. That way I could probably reuse a decent chunk of the current code.
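Something shaped roughly like this, maybe (just a sketch of the idea, not the sink's actual internals; the class, server URL, and buffer path are all made up):

```csharp
using System.Net.Http;

// Hypothetical ship-first buffer: POST a batch of CLEF-formatted events
// to Seq's raw ingestion endpoint, and only append it to a local buffer
// file when the network call fails. Unsent events survive a restart;
// an in-flight batch lost to a crash is an accepted trade-off.
public class ShipFirstBuffer
{
    private readonly HttpClient _client = new();
    private readonly string _serverUrl;   // e.g. "https://seq.example.com" (placeholder)
    private readonly string _bufferPath;  // e.g. "./logs/unsent.clef" (placeholder)

    public ShipFirstBuffer(string serverUrl, string bufferPath)
    {
        _serverUrl = serverUrl;
        _bufferPath = bufferPath;
    }

    public async Task EmitAsync(string clefBatch)
    {
        try
        {
            var response = await _client.PostAsync(
                _serverUrl + "/api/events/raw?clef",
                new StringContent(clefBatch));
            response.EnsureSuccessStatusCode();
        }
        catch (HttpRequestException)
        {
            // Offline: persist only the unsent events to disk.
            await File.AppendAllTextAsync(_bufferPath, clefBatch);
        }
    }
}
```

On reconnect, the buffer file could be replayed to the same endpoint and then deleted, keeping disk usage proportional to the offline backlog rather than a fixed chunk size.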

I'll give it a try, thanks!

nblumhardt commented 2 months ago

Hope all of this came together nicely! I'll close this issue, as I don't think there's anything to do at our end right now, but please drop us a line if you still need a hand.