cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License
4.76k stars 441 forks source link

pebble: race between MemFS.Close and MemFS.Remove #1236

Open bananabrick opened 3 years ago

bananabrick commented 3 years ago

The tableCache on master has a release loop which closes files, https://github.com/cockroachdb/pebble/blob/master/table_cache.go#L173. But it doesn't seem like we wait for a file to close before removing it.

Jira issue: PEBBLE-220

petermattis commented 3 years ago

Where is MemFS.Remove being called on a still open sstable? That is definitely a bug if it is occurring, but I'm not seeing the path that causes it from your description. doDeleteObsoleteFiles() calls tableCache.evict() which calls tableCacheShard.evict() which synchronously calls tableCacheValue.release() which calls sstable.Reader.Close() which calls File.Close(). There is a comment in tableCacheShard.evict() that the close needs to happen synchronously:

        // NB: This is equivalent to tableCacheShard.releaseNode(), but we perform
        // the tableCacheNode.release() call synchronously below to ensure the
        // sstable file descriptor is closed before returning. Note that
        // tableCacheShard.releasing needs to be incremented while holding
        // tableCacheShard.mu in order to avoid a race with Close()

Moreover, it seems as if we don't really do anything with the error returned from d.opts.Cleaner.Clean in d.deleteObsoleteFile. - The error gets logged here: https://github.com/cockroachdb/pebble/blob/master/compaction.go#L2872, but we don't do anything to rectify the error.

Yeah, there is a TODO about this:

    // TODO(peter): need to handle this error, probably by re-adding the
    // file that couldn't be deleted to one of the obsolete slices map.
    err := d.opts.Cleaner.Clean(d.opts.FS, fileType, path)
    if oserror.IsNotExist(err) {
        return
    }

The suggestion in the TODO seems reasonable. The thinking on why this wasn't urgently addressed is that on startup we'll capture any obsolete file that wasn't previously deleted. Still, it would be worthwhile to address the TODO.

bananabrick commented 3 years ago

@petermattis

In tableCacheShard.evict, we only call v.release if v != nil. Unfortunately, in the code path where we call tableCacheShard.addNode -> tableCacheShard.evictNodes -> tableCacheShard.runHandCold to tableCacheShard.clearNode -> tableCacheShard.unrefValue, we end up adding the value to the releasingCh, and setting the value to nil.

When the file is removed later through a call to doDeleteObsoleteFiles, the releaseLoop hasn't necessarily closed the file yet.

The reason the evict function call doesn't end up calling release on the value is because it has already been set to nil.

petermattis commented 3 years ago

@bananabrick Ah, that's a great find. I'm not sure what to do here. We may need a secondary map of evicted sstables that we're waiting to close. Even worse, if you evict and re-open and the evict, there may be multiple file handles opened for the same sstable simultaneously. Not really a problem except that we need to wait for all of them to close. Note that the MemFS behavior is meant to emulate the Windows behavior. On Unix it isn't a problem to delete a file that is still open.

bananabrick commented 3 years ago

It seems like windows support isn't a priority, but we do have some tests which seem to rely on us closing files. I've run into similar issues while trying to write more tests for the shared table cache.

I can look into this more carefully next week.

petermattis commented 3 years ago

Windows support isn't a huge priority, but Pebble does make an attempt at it. This bug is currently the only one I'm aware of with regards to Windows.

github-actions[bot] commented 7 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to Pebble!