elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.23k stars 24.85k forks source link

SourceOnlySnapshotRepository is Leaking Files #50231

Open original-brownbear opened 4 years ago

original-brownbear commented 4 years ago

When using SourceOnlySnapshotRepository a _snapshot directory is created in each snapshotted shard's data path. The contents of this path directory are never cleaned up it seems.

I tried fixing this via a trivial fix:

diff --git a/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java b/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
index af9d95a519a..5b46d786f7f 100644
--- a/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
+++ b/x-pack/plugin/core/src/main/java/org/elasticsearch/snapshots/SourceOnlySnapshotRepository.java
@@ -134,10 +134,11 @@ public final class SourceOnlySnapshotRepository extends FilterRepository {
         Path dataPath = ((FSDirectory) unwrap).getDirectory().getParent();
         // TODO should we have a snapshot tmp directory per shard that is maintained by the system?
         Path snapPath = dataPath.resolve(SNAPSHOT_DIR_NAME);
-        final List<Closeable> toClose = new ArrayList<>(3);
+        final List<Closeable> toClose = new ArrayList<>(4);
         try {
             FSDirectory directory = new SimpleFSDirectory(snapPath);
             toClose.add(directory);
+            toClose.add(() -> IOUtils.rm(snapPath));
             Store tempStore = new Store(store.shardId(), store.indexSettings(), directory, new ShardLock(store.shardId()) {
                 @Override
                 protected void closeInternal() {

but this seems to severely alter the incrementality properties of the source only snapshotting (tests fails because the expected file counts change). This is kind of obvious from the way it works, but shouldn't we at least delete these _snapshot folders when removing a source only repository? Also, shouldn't we document the fact that using a source only repository may require significant disk space to track the source_only on disk index used to make things incremental?

Originally reported in https://discuss.elastic.co/t/large-snapshot-directories-on-disk/211988

elasticmachine commented 4 years ago

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)