JakeWharton / DiskLruCache

Java implementation of a Disk-based LRU cache which specifically targets Android compatibility.
http://jakewharton.github.io/DiskLruCache
Apache License 2.0
5.79k stars 1.18k forks source link

Journal READ Corruption Issue #52

Open doapp-nick opened 10 years ago

doapp-nick commented 10 years ago

I'm seeing unpredictable but consistent corruption of journal files.  The point of corruption is always a READ line in the journal.  I'm trying to wrap my head around how this could happen and wondering if anyone has suggestions.

I've created an object cache that wraps the DiskLruCache.  Here are my put and get methods:

public Object get(String key) throws IOException, ClassNotFoundException {
        Object result = null;
        String sanitizedKey = getDiskHashKey(key);
        if(cache != null){
            DiskLruCache.Snapshot snapshot = cache.get(sanitizedKey);
            if (snapshot != null) {
                InputStream inputStream = snapshot.getInputStream(0);
                InputStream buffer = new BufferedInputStream(inputStream);
                ObjectInput input = new ObjectInputStream(buffer);
                result = input.readObject();
                snapshot.close();
            }
        }
        return result;
    }

public void put(String key, Object value) throws IOException {
        String sanitizedKey = getDiskHashKey(key);
        DiskLruCache.Editor creator = cache.edit(sanitizedKey);
        if(creator != null){
            try{
                ObjectOutputStream objOutStream = new ObjectOutputStream(new BufferedOutputStream(creator.newOutputStream(0)));
                objOutStream.writeObject(value);
                objOutStream.close();
                creator.commit();
            }catch(Exception e){
                e.printStackTrace();
                creator.abort();
            }
        }
    }

My caches are on the order of 3,000 objects when i start seeing the behavior.  This object cache is accessed by multiple threads but it looks to me like the DiskLruCache itself is doing the necessary synchronization.  I've seen the cache get wiped out several times, and I've managed to capture a cache in the state where it has an invalid entry:

READ 2013-reg-c8d10fe6-8c07-4b33-8d03-b7addf69038d READ 2013-reg-191ff930-47e3-48b0-92f2-1da067e982d4 READ 2013-reg-b6ff7fd9-7da5-4f71-946f-e5d560060794 RREAD 2013-pst-72fe30af-b568-48cd-8582-bfbaa0e2ae6b READ 2013-pst-5440007e-4f48-46c0-bec2-3ab4779a4c3e READ 2013-pst-0117fbeb-a140-43ba-8dd2-4b7fc0f55211

In every case where the cache has become corrupted, the behavior is similar.  Part of a READ entry is written over with another READ. This isn't at the very end of the cache state, its about 85% of the way through the cache.

swankjesse commented 10 years ago

Which OS and filesystem? I'd like to try to reproduce in the same environment!

doapp-nick commented 10 years ago

OSX 10.9.1 - Mac OS Extended (Journaled)

java version "1.7.0_40" Java(TM) SE Runtime Environment (build 1.7.0_40-b43) Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

doapp-nick commented 10 years ago

Just hit it again after I had to kill the java process.

READ 2013-reg-313-fa2d6b37-b7d5-4d5c-b3READ 2013-reg-338-50668ff9-c152-4ddb-888d-236f4321f147

I am having to 'restart' (basically kill the jvm) the server pretty regularly during development. Could that be contributing to this? Maybe something with the BufferedWriters and the termination sequence?

swankjesse commented 10 years ago

In theory the cache should be able to recover from an incomplete journal.

  @Test public void openWithTruncatedLineDiscardsThatLine() throws Exception {
    cache.close();
    writeFile(getCleanFile("k1", 0), "A");
    writeFile(getCleanFile("k1", 1), "B");
    Writer writer = new FileWriter(journalFile);
    writer.write(MAGIC + "\n" + VERSION_1 + "\n100\n2\n\nCLEAN k1 1 1"); // no trailing newline
    writer.close();
    cache = DiskLruCache.open(cacheDir, appVersion, 2, Integer.MAX_VALUE);
    assertThat(cache.get("k1")).isNull();
  }

I suspect the problem is that we recover from the incomplete journal, but we don't delete the truncated line before starting to write new lines into the file?

doapp-nick commented 10 years ago

Yeah, as it stands, it looks like any time an incomplete or junk line exists, the entire cache gets destroyed. I suppose this is because a cache of unknown state is not super useful, but in my use case, im not sure how to avoid this scenario.

rupertbates commented 9 years ago

I'm seeing a variation on this issue which is happening on Android. The behaviour is the same in that there is a corrupted line in the journal and that is causing the cache to be deleted on opening. In my case the line looks like this:

READ 96644ef5b41c75bd081e1737bREAD e9211ead93e7e0c88e4f79b0d5af4a45 Ie. it has 2 keys on the line.

This causes the method readJournalLine() to throw when it encounters it at which point the whole cache gets deleted. It would be much more use to me to just drop the invalid line rather than deleting the whole cache (particularly if it is just a read that is corrupted), can anyone see any reason why this wouldn't work?

mikeklein commented 4 years ago

Late to the game here but am seeing same issue.

My android application will at some point delete my entire cache and start from scratch! This particular cache has no methods exposed to remove entries so they should always be there. I appear to be hitting this "corrupt journal" issue and my cache is getting nuked.

I'm unclear on what is causing this state or how to fix it.

mikeklein commented 4 years ago

I finally caught issue when occuring in debugger. What's very odd is if I pause code before cache.delete and use debugger to issue call to rebuild the journal and then restart...my cache then comes up without issues yet there's a "bad" line left in journal. It's unclear why the journal is set to rebuild itself based on directory contents...or why nuking whole cache is preferable to merely missing an entry.

I added a line of code to flush cache after my writes and this isn't solving the issue. I the corruption occurred why my app was idle and I killed the task (do this regularly during debugging of course).

On first seeing corrupt journal and opening file I see: READ popegregoryiii READ popezachary READ popeelectstephen READ popestepheREAD curricula READ authors READ blackpower READ conquerors READ foundingfathers READ greatwomen

After rebuild of journal (and success bringing up cache) I see following corruption still in journal (bad line moved to end of journal):

CLEAN popegregoryiii 135 CLEAN popezachary 127 CLEAN popeelectstephen 139 CLEAN popestepheREAD 0

Would appreciate any ideas. Like others I have multiple directories (for each cache) and there may be a few thousand files in each directory.

Jeanno commented 1 year ago

Looks like this is a rather old problem and it still persists today. After taking a look at the code I might have some idea on the root cause of this problem.

Assuming you don't have multiple instances of the DiskLRU on the same dir (across processes / threads), there's still a chance where the journal file can get corrupted.

This is because the operations on the primary journal file are not atomic. For example this line operates directly on the primary journal file.

journalWriter.append(READ + ' ' + key + '\n');

If the process crashes or gets killed when journalWriter is writing to the file, which is not uncommon, the journal file will get corrupted immediately. And therefore, when DiskLRU detects there's a corruption in the journal, it removes the entire cache.