gmethvin / directory-watcher

A cross-platform Java recursive directory watcher, with a JNA macOS watcher and Scala better-files integration
Apache License 2.0
264 stars 34 forks source link

Modification events are lost depending on editor - how the file is written - neovim #88

Open davesann opened 1 year ago

davesann commented 1 year ago

I came a situation where notifications are lost (I think because here: https://github.com/gmethvin/directory-watcher/blob/main/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java#L347) the has is not seen as new.

This appears to arise from the manner in which an editor writes the files. I was using neovim.

I don't know if you will fix this - but the issue may be useful to others

neovim has a couple of ways that it writes when handling backups - the filewatcher will fail if backupcopy=no (or "auto" because then it is sometimes "no")

from neovim help

'backupcopy' 'bkc' string (default: "auto") global or local to buffer |global-local| When writing a file and a backup is made, this option tells how it's done. This is a comma-separated list of words.

The main values are: "yes" make a copy of the file and overwrite the original one "no" rename the file and write a new one "auto" one of the previous, what works best

I also saw this swallowing of events also with xed - probably for similar-ish reasons. And not with nano.

There are exceptions in the stack trace for these file changes that might help diagnose:

This is with neovim

Failing case; backupcopy=no; Final ENTRY_MODIFY is swallowed

2023-02-01 18:27:18 DEBUG DirectoryWatcher:301 - ENTRY_DELETE [/home/.../clerk-demo/notebooks/data_science.clj]
2023-02-01 18:27:18 DEBUG DirectoryWatcher:372 - DirectoryWatcher got an exception while watching!
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /home/.../clerk-demo/notebooks/data_science.clj/?
        at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:121)
        at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:68)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:278)
        at java.base/java.nio.file.Path.of(Path.java:147)
        at java.base/java.nio.file.Paths.get(Paths.java:69)
        at io.methvin.watcher.PathUtils.subMap(PathUtils.java:43)
        at io.methvin.watcher.DirectoryWatcher.runEventLoop(DirectoryWatcher.java:362)
        at io.methvin.watcher.DirectoryWatcher.lambda$watchAsync$1(DirectoryWatcher.java:232)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
2023-02-01 18:27:18 DEBUG DirectoryWatcher:301 - ENTRY_CREATE [/home/.../clerk-demo/notebooks/data_science.clj]
2023-02-01 18:27:18 DEBUG DirectoryWatcher:474 - Skipping create event for path [/home/.../data_science.clj]. Path already hashed.
2023-02-01 18:27:18 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../data_science.clj]

Successful case; backupcopy=yes; ENTRY_MODIFY is sent as MODIFY event.

2023-02-01 18:30:19 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../data_science.clj]
2023-02-01 18:30:19 DEBUG DirectoryWatcher:402 - -> MODIFY [/home/.../clerk-demo/notebooks/data_science.clj] (isDirectory: false)
gmethvin commented 1 year ago

I think the main issue is the InvalidPathException. That prevents us from being able to notify for the ENTRY_DELETE and remove the hashes from the hash map. Then when we get an ENTRY_CREATE event, we ignore it because we see it as a "duplicate" create event since the file already exists.

The exception happens when we do:

pathMap.subMap(treeRoot, Paths.get(treeRoot.toString(), "" + Character.MAX_VALUE));

The Character.MAX_VALUE is used as a hack there to get us all the paths in the subdirectory that start with that particular path. This code basically assumes your system charset is UTF-8, so that character should be representable. Is it possible you're using some other charset? I think you should be able to force UTF-8 everywhere by setting the environment variables like:

LANG=C.UTF-8
LC_ALL=C.UTF-8

In any case, I'm not sure we should be relying on the subMap hack above, so perhaps an alternative solution makes sense there anyway.

davesann commented 1 year ago

LANG=en_GB.utf8 LC_ALL=

I tried setting LANG and LC_ALL to C.UTF-8 (exported in the shell before starting the JVM) but this did not change the behaviour.

On my system Character.MAX_VALUE printed in my repl as \<00ffff>

openjdk 19.0.2 2023-01-17

(I had no idea about LANG and LC_ALL - that looks like a deep rabbit hole)

On Wed, 1 Feb 2023 at 20:55, Greg Methvin @.***> wrote:

I think the main issue is the InvalidPathException. That prevents us from being able to notify for the ENTRY_DELETE and remove the hashes from the hash map. Then when we get an ENTRY_CREATE event, we ignore it because we see it as a "duplicate" create event since the file already exists.

The exception happens when we do:

pathMap.subMap(treeRoot, Paths.get(treeRoot.toString(), "" + Character.MAX_VALUE));

The Character.MAX_VALUE is used as a hack there to get us all the paths in the subdirectory that start with that particular path. This code basically assumes your system charset is UTF-8, so that character should be representable. Is it possible you're using some other charset? I think you should be able to force UTF-8 everywhere by setting the environment variables like:

LANG=C.UTF-8LC_ALL=C.UTF-8

In any case, I'm not sure we should be relying on the subMap hack above, so perhaps an alternative solution makes sense there anyway.

— Reply to this email directly, view it on GitHub https://github.com/gmethvin/directory-watcher/issues/88#issuecomment-1411771443, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFPR4XI3Z7O6KJBKOZZZCTWVIXHZANCNFSM6AAAAAAUNL3QYM . You are receiving this because you authored the thread.Message ID: @.***>

gmethvin commented 1 year ago

ok, it's probably not worth trying to debug why that specific hack doesn't work. It's easy enough to switch to another method of getting the paths in a subtree.

gmethvin commented 1 year ago

@davesann try version 0.17.3 and see if that fixes your bug (note I accidentally released 0.17.2 without the fix).

I believe you should get a DELETE followed by a CREATE in the backupcopy=no case.

davesann commented 1 year ago

Using nvim

Yes. I think this has fixed the issue.

with backupcopy=yes; (copy and modify in place); still works as before; I get:

2023-02-03 09:14:43 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../clerk-demo/notebooks/data_science.clj]
2023-02-03 09:14:43 DEBUG DirectoryWatcher:400 - -> MODIFY [/home/.../clerk-demo/notebooks/data_science.clj] (isDirectory: false)

with backupcopy=no; (rename and create new); works, I get:


2023-02-03 09:16:24 DEBUG DirectoryWatcher:301 - ENTRY_DELETE [/home/.../clerk-demo/notebooks/data_science.clj]
2023-02-03 09:16:24 DEBUG DirectoryWatcher:400 - -> DELETE [/home/.../clerk-demo/notebooks/data_science.clj] (isDirectory: false)
2023-02-03 09:16:24 DEBUG DirectoryWatcher:301 - ENTRY_CREATE [/home/.../clerk-demo/notebooks/data_science.clj]
2023-02-03 09:16:24 DEBUG DirectoryWatcher:400 - -> CREATE [/home/.../clerk-demo/notebooks/data_science.clj] (isDirectory: false)
2023-02-03 09:16:25 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../clerk-demo/notebooks/data_science.clj]

The final entry_modify is not converted to a Modify event - which I think is expected

using xed

This was just out of curiosity (I don't use xed but tried it when first looking at the issue) - no exceptions.

Xed seems to use a tmp file - but this does not appear to generate the final create or modify event for the modified file - so this change does not get notified.


2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_CREATE [/home/.../clerk-demo/notebooks/.goutputstream-FH1TZ1]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:400 - -> CREATE [/home/.../clerk-demo/notebooks/.goutputstream-FH1TZ1] (isDirectory: false)
2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../notebooks/.goutputstream-FH1TZ1]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../notebooks/.goutputstream-FH1TZ1]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:400 - -> MODIFY [/home/.../clerk-demo/notebooks/.goutputstream-FH1TZ1] (isDirectory: false)
2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_MODIFY [/home/.../clerk-demo/notebooks/.goutputstream-FH1TZ1]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_DELETE [/home/.../clerk-demo/notebooks/.goutputstream-FH1TZ1]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:400 - -> DELETE [/home/dave/.../clerk-demo/notebooks/.goutputstream-FH1TZ1] (isDirectory: false)
2023-02-03 09:23:24 DEBUG DirectoryWatcher:301 - ENTRY_CREATE [/home/..../clerk-demo/notebooks/data_science.clj]
2023-02-03 09:23:24 DEBUG DirectoryWatcher:472 - Skipping create event for path [/home/.../clerk-demo/notebooks/data_science.clj]. Path already hashed.
gmethvin commented 1 year ago

Interesting... so with xed it appears from the logs that we get an ENTRY_CREATE for the data_science.clj file, but there never was a prior ENTRY_DELETE event for that same file. That suggests that either the file was actually deleted but somehow the ENTRY_DELETE was never received, or that the watch service is reporting an ENTRY_CREATE when there is already an entry for that file.

If it's the latter, then the DirectoryWatcher logic probably needs to be updated to account for that case. It seems odd that we would get an ENTRY_CREATE if the entry already exists, but perhaps some WatchService implementations do that when a file is moved and replaces another entry.

I suppose it might not hurt to just report a CREATE event regardless. One of the goals of directory-watcher's hashing is to prevent duplicate events, but typically those come in the form of a series of ENTRY_MODIFY events rather than duplicate ENTRY_CREATE events.

gmethvin commented 1 year ago

So after looking at this a bit more, I realize there's a good reason directory-watcher suppresses the duplicate CREATE events.

Basically, when a new directory is created, there several situations in which we don't get ENTRY_CREATE events from subdirectories that already existed, so we may need to traverse the directory structure to generate our own CREATE events. But it's possible new files are created before that traversal is complete, and we do get ENTRY_CREATE events for those later, thus leading to duplicates.

That said, while it's a bad idea to send a duplicate CREATE in that case, it might be reasonable to send a MODIFY event if the hash of the file is actually different when we get the duplicate ENTRY_CREATE event. I believe that would also solve the case with xed where the ENTRY_CREATE is being sent when the file is actually being modified.