Replaced synchronized with ReentrantReadWriteLock

Wavesonics commented 10 years ago

This removes all synchronized statements in favor of a ReentrantReadWriteLock. This allows read operations to not block each other, but write operations to block everything (the way synchronized was working previously).

With the addition of a simple contains() method, it is now super cheap to check if an image exists in the cache.

Added unit tests for the new contains() method.

Wavesonics commented 10 years ago

This PR replaces #56 as it now lives in it's own branch and contains unit tests.

And still of course addresses issue #54

swankjesse commented 10 years ago

I'm not convinced that we want to go this direction. Right now readers are not blocking other readers in any meaningful way. They acquire the synchronized lock, do a quick read, and return immediately.

What you really want is a readonly snapshot of the data that readers can read while the writer flushes its changes to disk.

Wavesonics commented 10 years ago

@swankjesse currently readers do block other readers. Not the actual reading of the InputStream, but just accessing the cache in any way.

In the case where you just want to check if something is in the cache, and if so spin off a background thread to read it, it can be very expensive to wait for a lock, just to check for the existence of an item. This is of course if you have a high level of concurrency where many actors are checking if their own item exists in the cache. I was seeing as high as 70ms for a single get() to resolve because it was waiting on others to do the same. And we're not talking an insane number of concurrent reads, probably around 10 or 15, but fired off very quickly.

Your idea of immutable snapshots would not solve this problem.

If the urge is to remove locks all together I can understand that, but as it stands, this is functionally the same as what exists now, but is much more efficient in some cases, while sacrificing nothing. So I don't see the drawback to accepting this change.

swankjesse commented 10 years ago

I am highly skeptical that your thread waited 70ms to enter a synchronized block that was only being accessed by readers. Are you sure it wasn't a writer?

swankjesse commented 10 years ago

... Actually, that's plausible because get() does I/O. What you need is a contains() method that is synchronized.

Wavesonics commented 10 years ago

There were definitely no writers. A synchronized contains() method would only help the first actors: As the first of the readers confirmed their item was indeed in the cache, they would proceed to call get() (on another thread), that get call would have the slow I/O in it, and other calls to contains() would then hit the same wait penalties.

On Tue, Feb 11, 2014 at 4:58 PM, Jesse Wilson notifications@github.comwrote:

... Actually, that's plausible because get() does I/O. What you need is a contains() method that is synchronized.

Reply to this email directly or view it on GitHubhttps://github.com/JakeWharton/DiskLruCache/pull/59#issuecomment-34827652 .

swankjesse commented 10 years ago

@Wavesonics got it. I'm still reluctant to introduce read/write locks. I think the right fix for this project is to avoid locks, not to improve their granularity. Most applications using a disk cache are going to suffer concurrent multiple writers.

JakeWharton / DiskLruCache

Replaced synchronized with ReentrantReadWriteLock #59