Open bananabrick opened 3 years ago
Where is MemFS.Remove
being called on a still open sstable? That is definitely a bug if it is occurring, but I'm not seeing the path that causes it from your description. doDeleteObsoleteFiles()
calls tableCache.evict()
which calls tableCacheShard.evict()
which synchronously calls tableCacheValue.release()
which calls sstable.Reader.Close()
which calls File.Close()
. There is a comment in tableCacheShard.evict()
that the close needs to happen synchronously:
// NB: This is equivalent to tableCacheShard.releaseNode(), but we perform
// the tableCacheNode.release() call synchronously below to ensure the
// sstable file descriptor is closed before returning. Note that
// tableCacheShard.releasing needs to be incremented while holding
// tableCacheShard.mu in order to avoid a race with Close()
Moreover, it seems as if we don't really do anything with the error returned from d.opts.Cleaner.Clean in d.deleteObsoleteFile. - The error gets logged here: https://github.com/cockroachdb/pebble/blob/master/compaction.go#L2872, but we don't do anything to rectify the error.
Yeah, there is a TODO about this:
// TODO(peter): need to handle this error, probably by re-adding the
// file that couldn't be deleted to one of the obsolete slices map.
err := d.opts.Cleaner.Clean(d.opts.FS, fileType, path)
if oserror.IsNotExist(err) {
return
}
The suggestion in the TODO seems reasonable. The thinking on why this wasn't urgently addressed is that on startup we'll capture any obsolete file that wasn't previously deleted. Still, it would be worthwhile to address the TODO.
@petermattis
In tableCacheShard.evict
, we only call v.release
if v != nil
. Unfortunately, in the code path where we call tableCacheShard.addNode
-> tableCacheShard.evictNodes
-> tableCacheShard.runHandCold
to tableCacheShard.clearNode
-> tableCacheShard.unrefValue
, we end up adding the value
to the releasingCh
, and setting the value to nil
.
When the file is removed later through a call to doDeleteObsoleteFiles
, the releaseLoop
hasn't necessarily closed the file yet.
The reason the evict
function call doesn't end up calling release
on the value
is because it has already been set to nil
.
@bananabrick Ah, that's a great find. I'm not sure what to do here. We may need a secondary map of evicted sstables that we're waiting to close. Even worse, if you evict and re-open and the evict, there may be multiple file handles opened for the same sstable simultaneously. Not really a problem except that we need to wait for all of them to close. Note that the MemFS
behavior is meant to emulate the Windows behavior. On Unix it isn't a problem to delete a file that is still open.
It seems like windows support isn't a priority, but we do have some tests which seem to rely on us closing files. I've run into similar issues while trying to write more tests for the shared table cache.
I can look into this more carefully next week.
Windows support isn't a huge priority, but Pebble does make an attempt at it. This bug is currently the only one I'm aware of with regards to Windows.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to Pebble!
The
tableCache
on master has a release loop which closes files, https://github.com/cockroachdb/pebble/blob/master/table_cache.go#L173. But it doesn't seem like we wait for a file to close before removing it.MemFS
keeps track of reference counts for the files which is decremented onMemFS.Close
.MemFS.Remove
errors out if we try to remove a file with aref count > 0
.releaseLoop
function in the table cache only attempts to close a file, afterMemFS.Remove
has been called. This prevents the file from being deleted.d.opts.Cleaner.Clean
ind.deleteObsoleteFile
. - The error gets logged here: https://github.com/cockroachdb/pebble/blob/master/compaction.go#L2872, but we don't do anything to rectify the error.Jira issue: PEBBLE-220