FantasticFiasco / serilog-sinks-http

A Serilog sink sending log events over HTTP.
Apache License 2.0

Bookmark file is being used by another process #262

Closed murlidakhare closed 6 months ago

murlidakhare commented 2 years ago

We are frequently getting the error below:

Exception while emitting periodic batch from Serilog.Sinks.Http.Private.Network.HttpLogShipper: System.IO.IOException: The process cannot access the file 'F:\AuditLogs\LogBuffer.bookmark' because it is being used by another process.
   at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
   at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at Serilog.Sinks.Http.Private.Network.BookmarkFile..ctor(String bookmarkFileName)
   at Serilog.Sinks.Http.Private.Network.HttpLogShipper.OnTick()

During our investigation, we found a post about setting bufferFileShared to true. However, this doesn't work. Could you please suggest a fix for this issue?

github-actions[bot] commented 2 years ago

Hi there and welcome to this repository!

A maintainer will be with you shortly, but first and foremost I would like to thank you for taking the time to report this issue. Quality is of the highest priority for us, and we would never release anything with known defects. We aim to do our best, but unfortunately you are here because you encountered something we didn't expect. Let's see if we can figure out what went wrong and provide a remedy for it.

FantasticFiasco commented 2 years ago

I think we need to make your issue reproducible. Could you please create a GitHub repo with a small sample showcasing the issue? Please also provide the steps needed to get the exception.

julichan commented 2 years ago

Hello, this issue is rather problematic. I hoped to do the same and ended up with the same error. I noticed it even prevents storing any logs for the application suffering from the problem. While I don't have a shareable example, I reproduced it and can tell you how.

I think this will happen in any type of project, but I reproduced it using two net4.8 applications hosted by IIS. So yes, two different applications are required. You may not reproduce it with two different threads of the same app, since the HTTP sinks would belong to the same process. You also need to call SelfLog.Enable(), otherwise you won't be able to see the error, since it is the only way to obtain Serilog's own logs, but I guess you already know that. Have both applications use the same buffer directory, otherwise there is no point in a shared buffer. Of course that means both apps target the same URL. The first app will lock the bookmark file and the other application will be stuck with the error. The second app won't even be able to fill its buffers and will lose all data.

Finally, I read some of the code, although not all of it. I noticed the shared buffer option is only passed to the inner sink but does not seem to be used by the HTTP sink itself. You may have to update your code based on this option to handle the bookmark file correctly.
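As a side note for anyone setting up a reproduction, a minimal sketch of the SelfLog configuration mentioned above could look like this; the output target is only an example:

```csharp
using System;
using Serilog.Debugging;

// Route Serilog's internal diagnostics (including the IOException thrown from
// HttpLogShipper.OnTick) somewhere visible. Console.Error is just an example
// target; any TextWriter or string callback works.
SelfLog.Enable(message => Console.Error.WriteLine(message));
```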

FantasticFiasco commented 2 years ago

Hi @julichan! I'll see if I can reproduce the issue after the summer vacations. I hope you can work around the issue until then.

julichan commented 2 years ago

Thanks @FantasticFiasco. To help with this fix, I'd like to provide further information. I believed the RollingFileSink had the same problem as the HttpSink, but here's what I found during the investigation:

On our side, we found a temporary alternative to the HTTP sink: the netstandard RollingFileSink in net48 + Filebeat, but this involves a lot of extra configuration in Filebeat and Logstash. We hope to hear from you soon.

Perhaps you can handle the bookmark file through the sharedfile object too.

julichan commented 2 years ago

Hello, @FantasticFiasco, I went on and investigated further:

If you don't mind, I could provide a merge request myself if I have time before you attempt a fix. My proposal is to fix the PortableTimer lock at dispose and to create an alternative PortableTimer that can be used instead when a cross-app option is enabled in the settings. Or, if you don't want a new option, I can just replace the PortableTimer lock with an OS mutex.
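As a rough illustration of the OS mutex idea (not the sink's actual code; the mutex name and the UpdateBookmark placeholder are invented for this sketch):

```csharp
using System;
using System.Threading;

// Hypothetical cross-process guard around the bookmark file. A named mutex is
// visible to all processes on the machine, unlike an in-process lock.
using var bookmarkMutex = new Mutex(initiallyOwned: false, name: @"Global\serilog-http-bookmark");

// Wait a bounded time so a stuck or recycled process cannot block the other forever.
if (bookmarkMutex.WaitOne(TimeSpan.FromSeconds(5)))
{
    try
    {
        UpdateBookmark(); // placeholder for opening and advancing the bookmark file
    }
    finally
    {
        bookmarkMutex.ReleaseMutex();
    }
}
else
{
    Console.WriteLine("Bookmark is owned by another process; skipping this tick.");
}

static void UpdateBookmark()
{
    // Placeholder: read the bookmark position, ship a batch, write the new position.
}
```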

FantasticFiasco commented 2 years ago

Thanks for the added information. I'll take a look at this issue when the summer vacations are over here in Sweden. I definitely want to make sure that we are using the correct internal dependencies to the file sink.

julichan commented 2 years ago

Hello @FantasticFiasco, how are you? Just wanted to know if you had time to look at this. Just so you know, we couldn't wait for an update, so we've used the RollingFileSink from the file sink repository, which is working flawlessly alongside Filebeat. However, it would be so much better if we could use the HTTP sink. Any idea when you can look at this?

FantasticFiasco commented 2 years ago

I'll try to reproduce this this week or next.

One thing that confuses me is that it requires two different applications to reproduce the issue, is this correct? I don't think this sink has been built to support two different applications logging to the same buffer files. Although you mention that there is a problem here since the bookmark file is locked, another much greater problem is the fact that both applications will think that they are responsible for sending the log events over the network. That's one reason why you are successful using Filebeat: it runs as a single instance and it has a clear responsibility for sending the log events. Having two separate applications sharing the same buffer files and bookmark file is like running two instances of Filebeat; one will surely fail, or you will end up with duplicate log events on the log server.

julichan commented 2 years ago

Hello @FantasticFiasco ,

Haha, I'm quite sure you can also reproduce it with just one application and two instances of the sink sharing the same buffer, although I haven't tested that.

You know, I think that Serilog shouldn't fail where log4net and NLog succeed. The HTTP sink shouldn't fail where the other sinks succeed either. Plus, you should know that even for a single application, IIS will run multiple instances of it, so this should be supported even if only for that case.

It doesn't matter if multiple sources send the logs; if the bookmark file is handled properly, the logs won't be sent as duplicates. :)

I'm trusting you on this one. I know you can do it. 👍 🥇

FantasticFiasco commented 2 years ago

...you can also reproduce it with just one application and two instances of the sink sharing the same buffer, although I haven't tested that.

Again, I don't know why you would do that; the sink hasn't been implemented with that use case in mind.

Regarding the comparison with log4net and NLog, I'm not in competition with either of them, nor do they dictate the requirements of this sink. Regarding usage in IIS, I'm no expert in IIS; most of my experience comes from working in cloud environments. Are you referring to worker threads in the app pool when you mention instances, or is "instances" something else?

janichirag11 commented 1 year ago

I think this could be due to multiple reasons; let me put them down one by one:

  1. Do we have issues with multi-threading, as IIS creates multiple threads trying to access this file?
  2. Are there any special permissions that need to be provided to the folder for reading? If yes, what are they?
  3. Is this because we are not disposing of the file stream for the bookmark file, creating a locked bookmark file?

julichan commented 1 year ago

Hello and happy new year. I visited the code and saw that at least the dependency on the file sink has been fixed. Cool. As you can guess, waiting 6 months for a fix is not an option for a company, so we've set everything up using Filebeat. I don't know if we'll ever go back. Not any time soon anyway.

It's pretty easy to reproduce the bug, but it seems you still need help, so here is the case: IIS has a recycling process which works by creating a second instance of the web application (you can see two w3wp.exe at that time). In our case we noticed that behaviour in prod only, as it probably happens too fast in dev, but you can see the gist of it: two instances of an app mean two instances of the sink with identical settings, even if only for a short time, and there is no way around this. You'll have an easier time reproducing this by starting two instances of the same app rather than trying to reproduce the IIS recycling. There are two problems in this sink:

Now this is way behind me, but I also remember reproducing this using multiple instances of the sink in the same app. Anyway, the sink was embedded in an async sink, in case that helps.

We also have reasons to have multiple apps sharing the same sink: multiple IIS web apps using the same code and configuration, with logs that had the same destination. Splitting the buffers is not an option as it would require reworking a large monolith and a lot of deployment code. We've tried workarounds, but no easy solution would allow resuming sending unsent logs after recycling or app shutdown... The HTTP sink has therefore been deemed unusable for us.

Now you might still not consider this a use case. It's sad to get such an answer when all other sinks support this. I don't mean to start a fight, but @FantasticFiasco refused my offer to provide an MR, and now I have other priorities too...

FantasticFiasco commented 1 year ago

@janichirag11, regarding your questions.

  1. Do we have issues with multi-threading, as IIS creates multiple threads trying to access this file?

Don't know. The test suite is not running in IIS, nor do I have any experience in hosting my applications there. But I would love a GitHub repository with a sample application showcasing the issue, with clear instructions on how to configure and host the application in IIS. With this level of detail I think I would be able to understand the use case a bit more.

  2. Are there any special permissions that need to be provided to the folder for reading? If yes, what are they?

Are we talking about the folder hosting the buffer files and the bookmark file? If yes, then the application process will need write permissions to this folder.

  3. Is this because we are not disposing of the file stream for the bookmark file, creating a locked bookmark file?

Maybe. The locking strategy selected for the buffer files and the one selected for the bookmark file are not the same. Serilog.Sinks.File, the downstream dependency responsible for writing log events to buffer files, has support for multithreading, while the bookmark file implementation in this codebase hasn't. The reasoning behind this is that log events can be written from parallel threads, e.g. a web server handling parallel requests, while the bookmark file is only accessed from the log shipper that is responsible for sending log events over the network. The log shipper acts as a singleton in the codebase and is not designed for a situation where multiple log shipper instances share the workload of sending log events.

@julichan, regarding your comments.

As you can guess, waiting 6 months for a fix is not an option for a company...

My comment from 6 months ago ended with me asking you a very specific question on the subject of IIS. I specifically told you that "I'm no expert in IIS...". I didn't get an answer from you, thus my investment in this issue stopped.

Let me try to explain. I don't know whether you've published open source. I have a couple of repositories here on GitHub that have gotten to a level where developers are asking me questions, filing issues and proposing feature requests. In these engagements I try my best to provide answers. But the problem is that there are always more of you than there is of me 😃 We have an uneven relationship here. We're engaging in this issue, and you've made an investment in this specific issue (thanks!). I'm engaging in all interactions across all my repositories, while still maintaining dependencies and sporadically implementing new features.

You will have to fight for my priority. The best way to do that is to optimize for the time I will have to invest. If you leave something open ended, if you force me to find my way to a reproducible sample, then your priority is downgraded. I don't think this is unfair, not to you and not to me. Especially when you, as you say, work for a company and get paid for doing this. Me? I don't get paid for providing this as open source. We don't share the same incentive for spending time on this. This is open source; you can fork my code and adapt it to fit your needs. The license gives you this right!

...setup everything using filebeat. I don't know if we'll ever go back. Not any time soon anyway.

I am happy for you; you've solved your problem and can continue focusing on the important parts of your business! I don't mind this at all. There is no benefit for me, direct or indirect, from you using this codebase. You should use it only if you find it valuable. Again, this is open source; you are free to use it, and there is no downside for either of us if you don't.

It's pretty easy to reproduce the bug, but it seems you still need help, so here is the case...

Yes, I definitely need help here 😄 Thanks for the explanation regarding recycling in IIS. But I would like to push back on the proposed solution. This sink has never been implemented to support the use case where multiple different applications share the same buffer files. I don't think this is something that we will opt to support. However, IIS is a major player in the .NET ecosystem, and if we have a problem with recycling processes it is something we might have to address. But the solution I would propose is one that handles the recycling problem, where we handle the fact that two processes try to access the same file within a short time. That seems much easier to tackle, using e.g. retries, than a solution where multiple processes share access to the buffer files over a long time. @janichirag11, would this be sufficient for you? If it is, I would still need help setting this up in IIS. A sample application and instructions on how to reproduce the problem in IIS would be the trigger to have me invest more time in this.
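As an illustration of the retry idea (again a sketch, not the sink's implementation; the path, share mode, delay and attempt count are arbitrary):

```csharp
using System;
using System.IO;
using System.Threading;

// Retry opening the bookmark file for a short while, assuming the competing
// process (e.g. the one being recycled by IIS) releases it quickly.
static FileStream OpenBookmarkWithRetry(string path, int attempts = 5)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.Read);
        }
        catch (IOException) when (attempt < attempts)
        {
            // Still locked by the other process; back off briefly and try again.
            Thread.Sleep(TimeSpan.FromMilliseconds(200 * attempt));
        }
    }
}

// Example usage with a placeholder path.
using var bookmark = OpenBookmarkWithRetry(@"F:\AuditLogs\LogBuffer.bookmark");
```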

janichirag11 commented 1 year ago

Multiple retries could be one solution. IMO, using some mutex or lock mechanism so that the resource cannot be reopened would help solve the current problem. If we can add a lock in the BookmarkFile class, it will stop another thread that is trying to access the bookmark file and will reduce the exceptions.

This will give better handling of multi-threading errors and will still serve the purpose of a singleton.

Regarding creating a sample application, let me try that, and I will post once I am done with it.

Happy Coding :)

FantasticFiasco commented 1 year ago

Thanks! Happy coding to you as well! 😄

janichirag11 commented 1 year ago

(Attached diagram: SerilogFlow)

Here is the possible flow we might be dealing with... Serilog is configured with a 2-second period, it takes more than 2 seconds for a batch to get processed on the consumer side, and since we do not have a locking mechanism we are getting exceptions.
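A sketch of the kind of guard that would address this, assuming the goal is simply to skip a tick while the previous shipment is still running (hypothetical code, not the sink's):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shipper where a tick that fires while the previous shipment is
// still running becomes a no-op instead of contending for the bookmark file.
class NonOverlappingShipper
{
    private int isShipping; // 0 = idle, 1 = shipping

    public async Task OnTickAsync()
    {
        if (Interlocked.CompareExchange(ref isShipping, 1, 0) != 0)
            return; // a shipment is already in flight; skip this tick

        try
        {
            await ShipLogEventsAsync();
        }
        finally
        {
            Interlocked.Exchange(ref isShipping, 0);
        }
    }

    private static Task ShipLogEventsAsync()
    {
        // Placeholder for reading the bookmark, posting the batch over HTTP
        // and advancing the bookmark; simulated here as taking 3 seconds.
        return Task.Delay(TimeSpan.FromSeconds(3));
    }
}
```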

FantasticFiasco commented 1 year ago

OK, if the issue we're running into is that attempts to ship the log events overlap, I think we're in a good situation. I'll see if I can create a test for this.

janichirag11 commented 1 year ago

And then we should avoid throwing exceptions in this case.

prasadpaul53 commented 8 months ago

Hello,

We're encountering a persistent issue in our production environment where we receive a large volume of error messages stating "The process cannot access the LogBuffer.bookmark file because it is being used by another process" during app pool recycling.

Here's a summary of the problem:

Error: Exception while emitting periodic batch from Serilog.Sinks.Http.Private.Durable.HttpLogShipper: System.IO.IOException: The process cannot access the file 'LogBuffer.bookmark' because it is being used by another process.
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
   at System.IO.File.Open(String path, FileMode mode, FileAccess access, FileShare share)
   at Serilog.Sinks.Http.Private.Durable.BookmarkFile..ctor(String bookmarkFileName)
   at Serilog.Sinks.Http.Private.Durable.HttpLogShipper.OnTick()

Occurrence: Frequent during app pool recycling.

Serilog.Sinks.Http version: 8.0.0

Our investigation revealed that the issue is caused by overlapped app pool recycling. When the app pool is recycled, the previous process briefly holds onto the LogBuffer.bookmark file, causing access conflicts with the new process.

This happens according to the steps below:

  1. Process Hold on LogBuffer.bookmark: A specific process within the app pool retains the LogBuffer.bookmark file during app pool recycling.

  2. App Pool Shutdown Timeout: The app pool has a pre-defined timeout period for processes to complete tasks and shutdown gracefully before the recycling process commences.

  3. Process Exceeds Timeout: In this case, the process holding the LogBuffer.bookmark file exceeds the set shutdown timeout and is forcefully terminated by the app pool recycling mechanism.

  4. Concurrent New Process Access: While the old process is forcefully terminated, a new process from the recycled app pool attempts to access the same LogBuffer.bookmark file.

  5. Access Conflict and Error: Since the file is still locked by the remnants of the terminated process, the new process encounters an access violation, resulting in the "The process cannot access the LogBuffer.bookmark file" error.

Can you please implement a graceful shutdown, so that the process using the LogBuffer.bookmark file stops and releases it cleanly? Is there any other alternative?

FantasticFiasco commented 8 months ago

Hi @prasadpaul53.

I've started to analyze this on branch test/locked-bookmark-file. I'm creating two log event producers sharing the same buffer and bookmark file. I'm using the durable file-size rolled sink (extension method DurableHttpUsingFileSizeRolledBuffers). The only configuration I apply is setting bufferFileShared: true.
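For reference, the setup described above could be expressed roughly as below; apart from the extension method name and bufferFileShared, which come from this thread, the URI, paths and remaining parameter names are assumptions on my part:

```csharp
using System;
using Serilog;

// Each producer (process or logger instance) is configured like this and
// points at the same buffer base file, which is what provokes the contention.
var log = new LoggerConfiguration()
    .WriteTo.DurableHttpUsingFileSizeRolledBuffers(
        requestUri: "https://log-server.example.com/ingest", // placeholder
        bufferBaseFileName: @"C:\Logs\LogBuffer",             // shared between producers
        bufferFileShared: true)
    .CreateLogger();

log.Information("Hello from producer {ProcessId}", Environment.ProcessId);
```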

My observations are as follows:

Reflections?

prasadpaul53 commented 8 months ago

Hello @FantasticFiasco,

Great that you have started the analysis. Your observation regarding the locking of the bookmark file is absolutely correct.
My reflections are:

FantasticFiasco commented 8 months ago

The problem at heart is that we want to use the sink in two different use cases. The first one, the one originally in mind when creating the sink, is that the instance terminates and all log events ought to be sent before it terminates. That's why this line exists.

The other use case, which is relevant in many of the comments in this thread, is that the instance will restart. In this context the priority is to terminate quickly, and log events can remain on file since they will be sent after the process restarts.

@prasadpaul53 does this sound correct?

prasadpaul53 commented 7 months ago

Yes, it is correct. @FantasticFiasco

FantasticFiasco commented 7 months ago

Release 9.0.0-beta.2 has now been published on NuGet; please verify that the release fixes your issues. If it does, please report back and I'll create a new official release.

Thanks for being persistent in your feedback.

prasadpaul53 commented 7 months ago

Thank you for the update on the release. We will verify the fixes and provide our feedback soon.

FantasticFiasco commented 6 months ago

Just released this as an official version; you can find it on NuGet.