gmethvin / directory-watcher

A cross-platform Java recursive directory watcher, with a JNA macOS watcher and Scala better-files integration
Apache License 2.0
265 stars 34 forks source link

Create/modify events not emitted if file is locked by another process #13

Closed chromy96 closed 6 years ago

chromy96 commented 6 years ago

DirectoryWatcher is trying to calculate the hash of every created/modified file when the event is emitted. Because of it, if the file is locked by the process writting to it, it can happen that some create/modify events are not emitted.

The error happens in PathUtils.hash(Path file) method. If you enable logging in catch clause, you will see that sometimes hash calculation fails with the following exception:

[ForkJoinPool.commonPool-worker-9] ERROR io.methvin.watcher.PathUtils - Exception when calculating hash java.io.IOException: The process cannot access the file because it is being used by another process at java.base/java.io.FileInputStream.readBytes(Native Method) at java.base/java.io.FileInputStream.read(FileInputStream.java:258) at com.google.common.io.ByteStreams.copy(ByteStreams.java:106) at com.google.common.io.ByteSource.copyTo(ByteSource.java:246) at com.google.common.io.ByteSource.hash(ByteSource.java:326) at io.methvin.watcher.PathUtils.hash(PathUtils.java:49) at io.methvin.watcher.DirectoryWatcher.notifyCreateEvent(DirectoryWatcher.java:257) at io.methvin.watcher.DirectoryWatcher.watch(DirectoryWatcher.java:162) at io.methvin.watcher.DirectoryWatcher.lambda$watchAsync$0(DirectoryWatcher.java:100) at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1603) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

My question is: Is it necessary to calculate the hash of the files at all? Would it be enough to keep only directories in hash map and emit all the events for the files as they happen?

Here's the gist with the test case that reproduces the issue for create event: https://gist.github.com/chromy96/def29f4e9a0430e90f9ae28434fcc0f7

gmethvin commented 6 years ago

Calculating the hash of files is not strictly required, but it is useful to prevent unnecessary duplicate events from being emitted. Perhaps we can provide an option to disable this feature for regular files? Hashing also effects performance somewhat so some people may want to disable hashing for other reasons.

chromy96 commented 6 years ago

Option for disabling hashing files woud be a great addition. I can provide the first implementation if you're busy right now.

gmethvin commented 6 years ago

A pull request would be great, or else I'll probably look at implementing it sometime the next month or so.

gmethvin commented 6 years ago

Fixed by #14