dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.84k stars 4.62k forks source link

FileSystemWatcher not getting rename events #32376

Open aggieben opened 4 years ago

aggieben commented 4 years ago

I've got a worker service in a .NET Core 3.1 app that uses a FileSystemWatcher to monitor PostgreSQL logfiles. However, it never receives events for the logfiles themselves. If I set the watcher to include subdirectories, it receives events for temporary files being updated, renamed, and deleted, but never for the log files (.log and .csv). I know from running ls -l over and over that the logfiles are in fact being updated as they constantly increase in size. I also used procmon to see that the .csv file is being written to about twice per second using WriteFile (in Windows).

Here's how I'm using my watcher:

type LogFileWatcherOptions() =
    member val DataDir = "" with get,set
    member val LogFormat = "" with get,set

type LogFileWatcher(logger:ILogger<LogFileWatcher>, optionsMonitor:IOptionsMonitor<LogFileWatcherOptions>) =
    inherit BackgroundService()

    let handleFileChanged (fsea:FileSystemEventArgs) =
        sprintf "%s changed: %O" fsea.FullPath fsea.ChangeType
        |> logger.LogTrace

    let handlePathCreated (fsea:FileSystemEventArgs) =
        sprintf "%s created: %O" fsea.FullPath fsea.ChangeType
        |> logger.LogTrace

    let handlePathRenamed (rea:RenamedEventArgs) =
        sprintf "%s renamed to %s (%O)" rea.OldFullPath rea.FullPath rea.ChangeType
        |> logger.LogTrace

    let handlePathDeleted (fsea:FileSystemEventArgs) =
        sprintf "%s deleted (%O)" fsea.FullPath fsea.ChangeType
        |> logger.LogTrace

    let configureWatcher path format =
        sprintf "configuring watcher with path: '%s' and format: '%s'" path format
        |> logger.LogTrace

        if not (Directory.Exists(path)) then
            failwithf "data directory does not exist: %s" path

        let watcher = new FileSystemWatcher()
        watcher.Path <- path
        watcher.IncludeSubdirectories <- true
        watcher.Filter <- "*.csv"
        watcher.NotifyFilter <- watcher.NotifyFilter ||| NotifyFilters.LastAccess
        watcher.NotifyFilter <- watcher.NotifyFilter ||| NotifyFilters.Size

        watcher.Changed.Add(handleFileChanged)
        watcher.Created.Add(handlePathCreated)
        watcher.Renamed.Add(handlePathRenamed)
        watcher.Deleted.Add(handlePathDeleted)
        watcher

    let _options = optionsMonitor.CurrentValue
    let _watcher = configureWatcher _options.DataDir _options.LogFormat

    override __.Dispose() =
        _watcher.Dispose()
        base.Dispose()

    override __.ExecuteAsync(stopToken) =
        async {
            logger.LogTrace "enabling filewatcher events..."
            _watcher.EnableRaisingEvents <- true
            while not stopToken.IsCancellationRequested do
                do! Async.Sleep 500
            logger.LogTrace "cancelling..."
            _watcher.EnableRaisingEvents <- false
        }
        |> Async.StartAsTask
        :> Task

Is it possible that PostgreSQL is writing to these logs in a way that the underlying implementation of FileSystemWatcher is unable to detect?

Update: I tested this on Windows, too, and the same thing holds true there.

rdnlsmith commented 4 years ago

I may be experiencing the same thing. I'm attempting to watch /etc/cups/printers.conf in order to do things when printers are added or removed from CUPS. FileSystemWatcher gets the notification if I run touch /etc/cups/printers.conf, but it does not react when CUPS writes the file following an lpadmin command. I also confirmed via ls -l that the timestamp changed.

rdnlsmith commented 4 years ago

I determined that CUPS writes its changes to a temporary file and then renames it over the original printers.conf. I wasn't previously watching for renames.

info: zFileWatcher.Worker[0]
      Changed: printers.conf.N
info: zFileWatcher.Worker[0]
      Renamed: printers.conf -> printers.conf.O
info: zFileWatcher.Worker[0]
      Renamed: printers.conf.N -> printers.conf

@aggieben I see in your sample code that you have a handler for the Rename event, but it doesn't look like you're including NotifyFilters.FileName in your watcher.NotifyFilter. In my test app, the Rename event never fired unless I had this option.

danmoseley commented 4 years ago

Thanks @rdnlsmith. I was having trouble figuring out how this could be an issue for both Windows and Unix. @aggieben sounds like we can close?

aggieben commented 4 years ago

@danmosemsft no, this shouldn't be closed. The default value for FileSystemWatcher.NotifyFilters is LastWrite | FileName | DirectoryName , so in my sample the effective filter is LastWrite | FileName | DirectoryName | LastAccess | Size. Also, if I don't set the .Filter property, the watcher gets events for all kinds of data being written to temp files and renamed within the PostgreSQL data directory.

aggieben commented 4 years ago

Also, to see the full code feel free to clone this branch: https://github.com/aggieben/PoQMon/tree/feature/1-filesystemwatcher

danmoseley commented 4 years ago

@aggieben can you clarify -- you're saying that with LastWrite | FileName | DirectoryName | LastAccess | Size you're not getting notifications renames? And this is the case on both Windows and some other OS?

cc @JeremyKuhne who owns this.

aggieben commented 4 years ago

@danmosemsft yes, that's correct. On both Windows and macOS my code receives no notifications at all for the .csv and .log log files that PostgreSQL writes.

danmoseley commented 4 years ago

Ok then @jeremykuhne should comment

JeremyKuhne commented 4 years ago

cc: @carlossanlop

A few initial comments.

(1) Last access time isn't turned on by default as it is a pretty big performance hit. If your other timestamps aren't getting updated you won't see any updates.

C:\Program Files (x86)\Microsoft Visual Studio\Installer>fsutil behavior query disableLastAccess
DisableLastAccess = 3  (System Managed, Enabled)

(2) Updates aren't sent until the file system flushes it's caches, at least on Windows. I presume it is similar on Unix, although I'd need to dig into the APIs there to verify.

(3) Your filter will let only "*.csv" files through. "*.log" files will get filtered out.

aggieben commented 4 years ago

@JeremyKuhne yep, I'm aware on (3). I just raised the point about both files because it wasn't particular to the csv file.

I'm not sure about (1), and honestly I don't want to need to care about that one. I just turned on that filter to cover all my bases while testing this.

On (2): I can use other tools to read the contents of the file and see the updates; wouldn't that suggest that the filesystem is flushing, at least periodically? What can I do to help investigate?

JeremyKuhne commented 4 years ago

wouldn't that suggest that the filesystem is flushing, at least periodically

It will flush, the question is when. I'm also not sure what the exact behavior is for updates to the attributes vs. other blocks in the file so I can't say for sure that observing contents will reflect attribute updates.

The other thing that can happen is that if a lot of changes are happening you will lose change notifications. If there is more than will fit into the buffer Windows will toss all events. Given that the file name filtering is being done by FileSystemWatcher, this might be happening. You can add an Error handler to see if this is happening- it should fire as described in the documentation. The buffer size can be increased, it defaults to 8K, which can be eaten in a hurry with lots of events with long file names.

Outside of the above if you can simplify the repro any further it would help.

I could be missing something, but first guess is it seems like you might be overwhelming the watcher. If possible, from a performance perspective, it would be good to isolate the files you don't care about from the ones you do, even if you can manage to capture everything by increasing the buffer size.

aggieben commented 4 years ago

I just noticed something that might speak to the flushing issue: if I only have my watcher program running, it doesn't receive events. But if I do ls in another console in the directory under watch, it does receive the events. That suggests to me that flushing only happens on access.

2020-02-19_14-49-16

aggieben commented 4 years ago

I also tried adding an error handler as you suggested, but it doesn't receive any events.

aggieben commented 4 years ago

The more I test this the more convinced I am that this is just a flushing issue. Would it make sense to make "auto-flushing" a thing that the FileSystemWatcher class can do for me? I might be able open a PR for that if that makes sense, @JeremyKuhne

JeremyKuhne commented 4 years ago

Would it make sense to make "auto-flushing" a thing

I don't know how you would make that happen from the FileSystemWatcher side of things. FlushFileBuffers is the only pubic API I know of and you have to have a handle to the file you want to flush or the volume handle and admin access.

It would be worth checking what impact setting FileOptions.WriteThrough on the files you're tracking has (if at all possible). I presume that would make things work as that will flush the metadata to disk immediately (and not just the data).

Creating a polling mode for the FileSystemWatcher is on our bucket list. If the watcher was periodically checking attributes it would likely avoid the situation you are in. It would be another advantage to consider. Some other other reasons we want a polling watcher option:

  1. The number of open file handles on Unix are extremely limited (by default) compared to Windows
  2. There is significant risk to losing change events for active watched directories

Some disadvantages of polling:

  1. You have to walk all entries to look for changes (although we do have much faster/performant enumeration APIs than we used to)
  2. You have no context for renames
carlossanlop commented 4 years ago

@aggieben going back to the first comment by @JeremyKuhne:

(1) Last access time isn't turned on by default as it is a pretty big performance hit. If your other timestamps aren't getting updated you won't see any updates. C:\Program Files (x86)\Microsoft Visual Studio\Installer> fsutil behavior query disableLastAccess DisableLastAccess = 3 (System Managed, Enabled)

I see you're doing the correct thing by adding the notify filter in your code:

watcher.NotifyFilter <- watcher.NotifyFilter ||| NotifyFilters.LastAccess

But you also said:

I'm not sure about (1), and honestly I don't want to need to care about that one. I just turned on that filter to cover all my bases while testing this.

You do care about it. LastAccess will not work as long as you don't execute the fsutil command to enable monitoring that field on Windows.

aggieben commented 4 years ago

@carlossanlop the reason I said I don't want to care about LastAccess is that I only care when the file has been written. I only added the filter in an attempt to show that no events were firing at all when initially reporting the issue.

wfurt commented 4 years ago

If multiple files are being changed at once (or quicky) you may be running into https://github.com/dotnet/runtime/issues/30846. Actual events are cached and distributed by separate OS daemon as well as the logic to pair events was broken at .Net layer. There is at least one more outstanding bug where available OS API does not allow to disambiguate events properly in all cases.

It may be work or putting together standalone repro @aggieben where you watch as well as manipulate watched files. And perhaps give it try with 5.0 preview.

carlossanlop commented 4 years ago

@Jozkee and I tested the code reported by @RobPaulson in #36282 using 5.0 (master) both in Windows and Linux (WSL), and the issue seems to be fixed - the rename call is being reported as expected (thanks @wfurt for linking the related fix). @aggieben, can you test your code in 5.0 to verify if it's fixed? @RobPaulson, @aggieben is it possible for both of you to upgrade your projects to the latest 5.0 Preview?

trivalik commented 3 years ago

I have this rename issue on windows too, just rename manually one file, in a pretty small directory with 5 files. Any workaround? I just have a timer which checks certain conditions to work arround this.

danmoseley commented 3 years ago

@trivalik what version of .NET are you using?

trivalik commented 3 years ago

.NET 5 installed by VS version 16.9.5. I guess watching NotifyFilters.FileName should be enough, I have also included NotifyFilters.LastAccess but earlier is mentions that this need extra activation in system. I watched a folder with default Filter *. Then I renamed a exe file.