adams85 / filelogger

A lightweight yet feature-rich file logger implementation for the Microsoft.Extensions.Logging framework.
MIT License
147 stars 22 forks source link

Q: Can't understand LogLevel settings. #11

Closed bernatgy closed 3 years ago

bernatgy commented 4 years ago

Hi!

This is just a question, or maybe an enhancement for the docs if it's not just me.

First of all, thank you for this library! Good job! I've been using it for about 6 months, had no problem!

Context: Library Version: 3.0.0 (I'll probably update soon...) Target framework: .NET Core 3.0 (Upgrade coming as well...) It's a complex ASP.NET Core project, with background tasks etc.

So the one thing I've been messing around is setting up my logs to be "just perfect". I have some Global LogLevel settings:

"Logging": {
    ...
    "LogLevel": {
      "Default": "Trace",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information",
      "NHibernate": "Debug"
    },
    ...
}

Then I have a Console provider which is most probably irrelevant. And lastly I have my File provider settings:

"File": {
      "BasePath": "logs",
      "FileAccessMode": "KeepOpenAndAutoFlush",
      "MaxFileSize": 10485760,
      "CounterFormat": "000",
      "IncludeScopes": true,
      "MaxQueueSize": 1000,
      "Files": [
        {
          "Path": "app-persistence-<counter>.log",
          "MinLevel": {
            "Default": "Trace",
            "NHibernate": "Debug",
            "NHibernate.Type": "Information",
            "NHibernate.Cfg.XmlHbmBinding": "Warning",
            "NHibernate.Cfg.XmlHbmBinding.Binder": "Information",
            "Itg.Persistence": "Trace"
          }
        },
        {
          "Path": "app-<counter>.log",
          "MinLevel": {
            "Default": "Debug",
            "Microsoft": "Warning",
            "Microsoft.AspNetCore.Authentication": "Debug",
            "Microsoft.AspNetCore.Authorization": "Debug",
            "Microsoft.AspNetCore.Server": "Debug",
            "NHibernate": "None",
            "Itg.Persistence": "None",
            "Itg.Services": "Debug",
            "Itg.Services.Shop": "Information",
            "Itg.Mailing": "Information"
          }
        }
      ]
    }

The thing I can't seem to understand is the following: I'd like my app-persistence-\<counter>.log to contain only NHibernate and Itg.Persistence categories, the other one mostly everything else.

On my initial run I had "Default": "None" on my persistence file, which I thought must be right, because I didn't want anything else to land in there, only the ones I specified (and more specific ones!).

Now based on these, after reading through Microsoft's logging tutorial, and this repo's readme, the category "Itg.Persistence.Secondary" should have been written to the persistence file, but it did not! (it's "under" "Itg.Persistence" which I set to "Trace")

The logging worked, as I could find my entries on the console, but nowhere else. To be exact, this is the line I was looking for: crit: Itg.Persistence.Secondary.FileKeyValueStore[LoadFailed] And to be more exact, this line is logged by a queued background task!

Now I set "Default": "Trace" on my persistence file, and here I thought, now everything should show up. Not just the ones I specified. To my surprise the one I was looking for DID show up ("Itg.Persistence.Secondary") BUT not at all everything... This is the only new category in my persistence log. Which is just about the effect I wanted, but I can't imagine these settings are any good...

Could you help me out please? What am I missing? I'm so confused... :(

bernatgy commented 4 years ago

I must have forgot to delete the log at one point. Correction; With the whole configuration I copied in, stuff did show up, just not the Secondary category. (still can't understand it... :/ )

adams85 commented 4 years ago

Hi!

First of all, thanks for the kind words. :)

I think your understanding of log level settings are fundamentally correct. There's one important thing to keep in mind. Log level switches defined at log file level (MinLevel option) can only make the log level switches defined at higher levels (global level or log provider level) stricter. This is mentioned in the docs as well:

[...] this is a second-level filter which can only tighten the rules of the first-level LogLevel filters (as it can only filter messages that hit the logger provider).

This is a limitation of the Microsoft.Extensions.Logging framework I cannot do anything about as it's out of my control. However, in your case this cannot cause any problems at all (theoretically).

That is, your configuration should work. I put together a simplified imitation of your environment based on your description but I failed to reproduce the issue. It just works as expected.

Please take a look at my attempt and try to figure out if there's some relevant differences between my simulated and your real environment which I don't see.

bernatgy commented 4 years ago

Hi! I will take a look, I promise! Thank you! Just letting you know I saw this. I will edit this post before monday (25. May)!


Sorry for the delay! So, I've tested it more and I've noticed something strange. This is probably not the main issue, but this might be the one that confused me most initially.

If I edit the log settings during runtime, the framework will reload it. (reloadOnChange: true) I've been changing the "NHibernate": "Debug" and when it reloads it spits the Itg.Services and Itg.Persistence.Secondary messages into the persistence file! Then, if I clean the logs and restart, it continues normally.

At this point I just wanted to see "something happening". I do know that "NHibernate": "Trace" really has no effect because of my global setting.

  1. Reloading with NHibernate: Trace -(set to)-> None Itg.Services and Itg.Persistence.Secondary land in Persistence file. (as they should, if only because of "Default": "Trace") (Without request, so it's previous ones. Their stamps are also few minutes old.)

  2. Clear and restart with None Only NHibernate.Cfg.XmlHbmBinding Information entries. Normal from the point of NHibernate, but no "Itg.Persistence.*" entries.

  3. Reloading with NHibernate: Trace --> Information Itg.Services and Itg.Persistence.Secondary land in Persistence file. (Without request, so it's previous ones. Their stamps are also few minutes old.)

  4. Clear and restart with Information Only NHibernate Information logs. Normal from the point of NHibernate, but no "Itg.Persistence.*" entries.

  5. Reloading with Warning Itg.Services and Itg.Persistence.Secondary land in Persistence file. (Without request, so it's previous ones. Their stamps are also few minutes old.)

  6. Clear and restart with Warning Only NHibernate.Cfg.XmlHbmBinding Information entries. Normal from the point of NHibernate, but no "Itg.Persistence.*" entries.

So it seems they get stuck somewhere? :/

"Log level switches defined at log file level (MinLevel option) can only make the log level switches defined at higher levels (global level or log provider level) stricter." Thanks for the reminder about this one, but I knew it. That anomaly must have made it's way to my config when I started to notice something. :)

Now I thought "MaxQueueSize" has something to do with it, because NHibernate can log almost 10k lines per request. So I simplified everything. I removed my global LogLevel settings, set the Queue to 10k and am using only this:

"File": {
      "BasePath": "logs",
      "FileAccessMode": "KeepOpenAndAutoFlush",
      "MaxFileSize": 10485760,
      "CounterFormat": "000",
      "IncludeScopes": true,
      "MaxQueueSize": 10000,
      "Files": [
        {
          "Path": "integra-persistence-<counter>.log",
          "MinLevel": {
            "Default": "None",
            "NHibernate": "Information",
            "NHibernate.Type": "Information",
            "NHibernate.Cfg.XmlHbmBinding": "Warning",
            "NHibernate.Cfg.XmlHbmBinding.Binder": "Information",
            "Itg.Persistence": "Trace"
          }
        },
        {
          "Path": "integra-<counter>.log",
          "MinLevel": {
            "Default": "Debug",
            "Microsoft": "Warning",
            "Microsoft.AspNetCore.Authentication": "Debug",
            "Microsoft.AspNetCore.Authorization": "Debug",
            "Microsoft.AspNetCore.Server": "Debug",
            "NHibernate": "None",
            "Itg.Persistence": "None",
            "Itg.Mailing": "Information"
          }
        }
      ]
    }

Nothing's changed.

Itg.Persistence entires are still missing. But on console I always get them: image And when the config file reloads I get them in the Persistence file: image

I also took a look at your repro, but I'm using something a bit more complicated. Pretty much this Queuing system by johnny 5. Here's how my implementation differs (because I'll have multiple work items):

protected override async Task ExecuteAsync(CancellationToken ct)
{
  this._logger.LogInformation("Queued Hosted Service is starting.");
  // ^ this is logged properly in the integra-<counter> file from the category "Itg.Services.Tasks" ^
  await this.BackgroundProceessing(ct);
}

private async Task BackgroundProceessing(CancellationToken ct)
{
  while (!ct.IsCancellationRequested)
  {
    var workOrder = await this._taskQueue.DequeueAsync(ct);

    using var scope = this._services.CreateScope();
    var workerType = workOrder
      .GetType()
      .GetInterfaces()
      .First(t => t.IsConstructedGenericType && t.GetGenericTypeDefinition() == typeof(IBackgroundWorkOrder<,>))
      .GetGenericArguments()
      .Last();

    // Getting a specific worker based on the generic type defined for the received Order.
    var worker = scope.ServiceProvider
      .GetRequiredService(workerType);

    var task = (Task) workerType
      .GetMethod("DoWork")
      .Invoke(worker, new object[] { workOrder, ct });
    await task;
  }
}

The lifetime of the related services:

...
services.AddScoped<IKeyValueStore<ArticleCollection>, FileKeyValueStore<ArticleCollection>>();
services.AddSingleton<IBackgroundTaskQueue, BackgroundTaskQueue>();
services.AddHostedService<QueuedHostedService>();

// Task implementations
services.AddScoped<CreateCollectionBackgroundWork.Worker>();
...

CreateCollectionBackgroundWork.Worker is where the FileKeyValueStore's Itg.Persistence.Secondary logs are coming from, and this is how that's done:

public static class CreateCollectionBackgroundWork
{
  public class Order : IBackgroundWorkOrder<Order, Worker>
  { ... /* contains parameters for it's worker*/ ... }

  public class Worker : IBackgroundWorker<Order, Worker>
  {
    public Worker(
      ILoggerFactory loggerFactory,
      ShopArticleService articleService,
      /* ^ logging works inside this; into integra-<counter> ^ */
      IKeyValueStore<ArticleCollection> collectionStore
      /* ^ the logger in this (FileKeyValueStore) is the one not logging to the file, but does to console, and logs also appear when the config file is reloaded ^ */)
    {
      this._logger = loggerFactory.CreateLogger(nameof(CreateCollectionBackgroundWork));
      // ^ this is actually still logging to integra-<counter> file ^
    }
  }
}

The logger in FileKeyValueStore is injected as you would normally:

// Maybe generic TModel messing with the category?
private readonly ILogger<FileKeyValueStore<TModel>> _logger;
public FileKeyValueStore(
  ...
  ILogger<FileKeyValueStore<TModel>> logger)
  {
    this._logger = logger;
  }

My ConfigureLogging call in Program class:

.ConfigureLogging((context, logging) =>
{
  logging.AddConfiguration(context.Configuration.GetSection("Logging"));
  logging.ClearProviders();
  logging.AddConsole();
#if DEBUG
  logging.AddDebug();
#endif
  logging.AddFile(o => o.RootPath = context.HostingEnvironment.ContentRootPath);
})

Thank you very much in advance! I do realize this is a lot, and at the same time too little. I do not need a fix or assistance ASAP. I can wait, and I can answer anything you'd like to know. But I can unfortunately only share snippets. If it comes to that maybe I could prepare a repro project for you, but I'm afraid you'd have to wait for too long... :/

Also, please remember, I'm not using current versions of anything (well except for C# 8)! Might seem odd, but I would first like to identify this problem, or at least pinpoint it before upgrading anything...

adams85 commented 4 years ago

I think the key momentum here is the blockage of the Itg.Services.* messages. However, I cannot see yet how they can get stuck for minutes. I suspect though it has something to do with the relatively high logging traffic of NHibernate.

Could you do a test where you fire a single(!) request and check the logs after some minutes if the Itg.Services.* entries are still missing? I'd just like to make sure that we're dealing with just one or two issues here (that is, whether there's also something regarding configuration reloading or not).

If the test failed (no Itg.Services.* entries after say 5 minutes), then please try the following:

  1. Repeat the test described above with unbounded queues (MaxQueueSize set to 0)
  2. Repeat the test where FileAccessMode set to KeepOpen and OpenTemporarily. Let's see if this makes a difference. (There's an issue with flushing file streams asynchronously in .NET. I ran into perf problems because of this but it's been solved - theoretically.)
  3. Update the file logger lib to the latest version (3.1.1). You can do this safely, I keep to semver (no breaking changes to the public APIs in minor versions). 3.1 comes with some perf-related improvements (e.g. Dataflow was replaced by Channels). Let's see if this changes anything.

This will take some time of yours, but I'm afraid we cannot do anything else. We have to narrow down the possible causes of your issue to be able to track it down.

adams85 commented 4 years ago

I updated my repro app according to the new details you supplied. It feeds 10k log entries into the logger and does some occasional logging in the background.

All entries land in the log as expected and I experienced no blocking. (KeepOpen vs. KeepOpenAndAutoFlush makes a significant difference though!)

bernatgy commented 4 years ago

Hi, and thank you again!

Interesting 1.: The non-persistence file seems to always work correctly, though I did not edit it's settings. Interesting 2.: I cannot set MaxQueueSize to 0. An exception gets thrown from filelogger on the first log event. I could only remove the setting alltogether (which should set it to 0, but this is confusing...). I can tell you more about it, but I'm guessing the result will be to upgrade to the new version either way, which would probably invalidate it, so let's wait it out. 😊

I am of course doing the same (single!) request in every case, with same parameters, so logs should be - and are - the same, and I've also been doing the waiting thing. 🤔 I wait on average 10~15 minutes for the logs, they just never come.

A bit of a confusion-correction: I want Itg.Persistence.Secondary logs in my Persistence file (not Itg.Services.*) :) At first, when the file reloaded, both landed in there, I'm assuming because of the "Default": "Trace" setting. But later on, I set "Default": "None", and now Itg.Services does not appear in the Persistence file, not initially and not on reload, which is what I want. (those ones go into the other log file, correctly)

Right now, the correct behaviour is happening, which is that NHibernate lines go into the Persistence file, and Itg.Persistence.Secondary lines go there as well, BUT only on reload (delay).

Note: Throughout this thing, I set FileKeyValueStore to be a singleton. (actually, your repro reminded me, that's more fitting) This had no effect on the observed behaviour.

Here are the cases you requested:

This feels so weird. Why the FileKeyValueStore...? Could it be that, it's also manipulating a different file? Could that specific thread be blocked by my file manipulations? And I mean blocked this hard? It's horrible code... But works... I can share some if you'd like...

I'm going to wait for some kind of a response, and am going to update to 3.1.1!!! Thank you very much for your continued support!

adams85 commented 4 years ago

I cannot set MaxQueueSize to 0. An exception gets thrown from filelogger on the first log event.

It's my bad. Using v3.0 you have to specify a negative number (like -1) to get unbounded queues (or just remove that setting altogether). v3.1 changed this: 0 is also treated as unbounded.

This feels so weird. Why the FileKeyValueStore...? Could it be that, it's also manipulating a different file? Could that specific thread be blocked by my file manipulations?

I think we're getting somewhere. OpenTemporarily opens, writes and closes the file for each log entry. (Needless to say, how this affects performance. It should only be used when low logging traffic is expected.) In contrast to this, KeepOpen and KeepOpenAndAutoFlush opens a stream to the log file once and uses that for writing all the incoming log entries. Writes are done on thread pool threads using async I/O. Except for flush (as I mentioned earlier) which needs to be a synchronous call for the moment. I thought there was something with this but you wrote that KeepOpen doesn't work either. That mode involves no flushing. The file stream is created with these settings. Speaking of such, what OS do you run your tests on?

I don't see yet how your code could interfere with the log writer but it seems this is the case. Is there any chance that your app opens the log files being written?

BTW, please upgrade to 3.1.1, repeat the tests and let's see if the issue persists. In the meantime, I'll also do a few tests by adding some file access simulation to the repro app.

adams85 commented 4 years ago

Just a thought flashed through my mind. Do you run some anti-virus software? If so, do a test with the AV software disabled. There's not much chance that it can cause interference but let's eliminate that for sure.

bernatgy commented 4 years ago

Hi! Again with the delay... 😢 A partial response for now:

Here's my system: image

But actually, I've had a multiple build update through an in-place reinstall since opening the issue, and it still did not change!

Also - while all this is still happening by chance which is crazy - at first I didn't have any antivirus! I had Windows Defender, and it just went dead and unnoticed. I couldn't start it because of some Windows Update corruption mess... That's why I then reinstalled Windows, and I'm now proudly using Malwarebytes. So, I've actually tried this with almost all possibilities in terms of an antivirus and OS version! 😂😂

If you're thinking what if all this madness played a part... My OS drive is not my work drive (which I have this project on). Nothing really happed on the work drive...

OpenTemporarily opens, writes and closes the file for each log entry. (Needless to say, how this affects performance. It should only be used when low logging traffic is expected.)

That's not good... 😢 I'll get back to you probably tomorrow after the upgrade!

bernatgy commented 4 years ago

Upgraded to 3.1.1 and did the tests again (with Malwarebytes and Defender turned off). Nothing seems to have changed. 🤔

OpenTemporarily still works. KeepOpen and KeepOpenAndAutoFlush still doesn't, but they spit the logs I want when the config reloads. (Checked with 10k MaxQueueSize and without one)

This is my whole FileKeyValueStore implementation (reminder, excuse the mess! 😂) I've commented on the lines I'd expect to see in my log.

About 2kb of binary serialized data into store file per request. It's 686kb at the moment. Saving takes about a second at this point. (judging by entries on console)

Now deleting the store file and doing it again. No difference.

adams85 commented 4 years ago

Sorry for the delay, I've just found time for continuing the investigation.

I created a much more sophisticated repro app. I used the code snippets you provided and did my best to fill in the gaps and reconstruct your context as close as possible.

This was the good news. The bad news is that I still cannot reproduce the problem... It works as expected. So please examine my new attempt and try to identify the differences between that and your real-life app.

After starting the repro app, you can simulate a request by visiting http://localhost:5000/Test. Please debug through the application and check for relevant differences. The entry point is MainController.Test. This action will enqueue a work item which is processed by the background worker (QueuedHostedService). From that point on the code will certainly be very familiar to you.

Let's see if this brings us a result.

bernatgy commented 4 years ago

Hi! I'm so sorry for the delay on my part as well.

So I've been trying to reproduce the issue in the repro app you created. I've gone ahead and made it closer and closer to my actual flow and implementation, but I've kind of given up. The only thing I did not modify yet is the setup in the Program class. Mine is a lot different because I'm not using the default builder. (Host.CreateDefaultBuilder) I'm doing it manually. Will check this out, but I think I've found a lot better lead just now.

I've removed the filelogger NuGet package reference, went ahead and added the source project as dependency! Currently I'm debugging through filelogger, and I've already identified the symptom.

The log file is successfully identified and everything actually goes smooth through FileLoggerProcessor.Enqueue(...) but then the FileLogEntry is just stuck in the Queue forever.

If I just let the debug through and perform another request, I can actually see that all the previous entries are still in the Queue. image

Then I saw that every exception is swallowed in FileLoggerProcessor.WriteEntryAsync which put my mental state in a weird place lol. Just dropped a breakpoint on the catch blocks and as foretold by great oracles of the past; there's always an exception...

IOException: The process cannot access the file 'D:\\...\\integra-persistence-000.log' because it is being used by another process. at FileLoggerProcessor:377

I've put up a static InstancesCount field which I'm incrementing in the constructor, and decrementing at the end of Dispose(), because I'm sure I don't have the file open in anything. Ran through debugging, there's two instances of FileLoggerProcessor when the exception is thrown.

At this point I don't really have a clue how that could happen. Logging is registered as Singleton for a good reason.

Do you think the following line in my QueuedHostedService (which I've shared here) could somehow create two completely different service providers?

using var scope = this._services.CreateScope();

Seems pretty strange... Docs say;

Creates a new IServiceScope that can be used to resolve scoped services.

Do you have an idea about where I could go deeper into this? I'm a complete newbie to debugging IO and Threading to be honest.

Also, pretty please... Could we have some kind of an indication about the exceptions? 😢😢😢😢 At least highlighted messages to the Debug out (while I realize this is utterly useless in production)? Or literally anything else? I know this is a really sensitive topic. What a logging solution should or should not do about / with exceptions is hard to determine but in my opinion, straight up swallowing all of them is not the best thing to do.

Again, I deeply appreciate your time and effort in this adventure! Thanks in advance!

adams85 commented 4 years ago

I've removed the filelogger NuGet package reference, went ahead and added the source project as dependency! Currently I'm debugging through filelogger, and I've already identified the symptom.

After failing to reproduce the issue, this would have been my last resort as well. I'm glad that you've already done that. 👍

Ran through debugging, there's two instances of FileLoggerProcessor when the exception is thrown.

This explains why OpenTemporarily works and KeepOpen* doesn't. In the latter case, both processors keep the log files open but only the "lucky" one which opens it sooner can actually write into it because of FileShare.Read. The "unlucky" one cannot write its queue to the file until the other one releases it, which explains why the missing log entries appear suddenly when you edit the settings (log files are always released on reloading of the settings).

Now the question is how on earth you end up with two FileLoggerProcessor instances. This is possible only if you have two instances of FileLoggerProvider as well. So I'm pretty sure that this is a DI configuration problem on your side.

Could we have some kind of an indication about the exceptions? 😢😢😢😢 At least highlighted messages to the Debug out (while I realize this is utterly useless in production)?

I'm aware of this issue but I haven't figured out yet how to solve it. The crux of it is that we want to do logging in a logger component. Calls to Debug.WriteLine are stripped out in release builds so that won't work. Trace.WriteLine may work though. But it might be even better to add some event by means of which one can observe the swallowed exceptions if necessary.

bernatgy commented 4 years ago

Hi!

I'm not using multiple DI containers. Your guess was correct though, I have 2 instances of FileLoggerProvider as well. I'm currently tracking down how the second one could be created... A few hours in, but no luck yet. The only thing I found out, was that using your snippet in Startup.Configure, both lines report only 1 instance.

At the same time, I have a breakpoint in the constructor of FileLoggerProcessor and I clearly see a second instance created by the DI, guessing from the call stack.

The first one is created by the AddFile(...) call in .ConfigureLogging(...). I could not yet identify where the second one comes from, but the funny thing is that, the ones found by your snippet are already these second instances.

(I put the instance index as a field onto them...) Will update once I figure something out. Thanks!

Update: I found it. I have an ILoggerFactory injected into my Startup class's constructor. That is where another instance is coming from. For some reason, it's not resolved to the existing instance created by AddFile(...) in Program. The reason for it, is that I had NHibernate setup there to use Microsoft logging.

I made the DI crash when requesting another FileLoggerProvider in AddFile(this ILoggingBuilder builder) by returning null for the second time. Then I could trace everything back to WebHost.cs:202 - _hostingServiceProvider.GetService<IStartup>();

Once I removed the faulty injection (and took out NHibernate logging with it...) No other instances are created.

Will update again, once I resolved this whole situation, but that'll have to come on another day again. I'm 99% percent sure this has to do with the fact that I'm still using the old .NET Core 2.x style WebHostBuilder setup instead of the new CreateDefaultBuilder / ConfigureWebHostDefaults and the relation between these and the internal IStartup interface.

At the time I made the conscious decision to stick with the old one, as I went through the source of the new methods and thought I could just pretty much copy / paste most of what they do. At least they seemed easy enough... but hey, seems like there's some hardcore internal reflection or something which won't allow this.

adams85 commented 4 years ago

I have an ILoggerFactory injected into my Startup class's constructor.

I don't think this is a good idea, at least, it's nonstandard. The framework allows injecting these services in Startup currently. (Not sure about older versions though.)

I'm 99% percent sure this has to do with the fact that I'm still using the old .NET Core 2.x style WebHostBuilder setup instead of the new CreateDefaultBuilder / ConfigureWebHostDefaults and the relation between these and the internal IStartup interface.

At the time I made the conscious decision to stick with the old one, as I went through the source of the new methods and thought I could just pretty much copy / paste most of what they do.

I suspect that during your customized host setup an intermediate DI container is built somewhere. When you inject ILoggerFactory in your Startup ctor, it gets resolved by this intermediate container, hence an unwanted IFileLoggerProvider instance gets created.

I recommend dropping the custom solution and migrating to the ASP.NET Core 3-style (generic host) setup. AFAIK, logging should be configured through ILoggingBuilder, configuring ILoggerFactory directly has been discouraged for a while now. However, if you absolutely need it, in theory, you can do something like this:

public static void Main(string[] args)
{
    var host = CreateHostBuilder(args).Build();
    ConfigureLoggerFactory(host.Services.GetRequiredService<ILoggerFactory>());
    host.Run();
}

private static void ConfigureLoggerFactory(ILoggerFactory loggerFactory)
{
    loggerFactory.AddProvider(...);
}