apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

Failed compaction file cleanup error when tablet does not exist in volume #5087

Closed keith-turner closed 4 days ago

keith-turner commented 1 week ago

Describe the bug

When the manager detects a failed compaction, it attempts to clean up any files the compaction may have been writing. To do this it looks for a file with the compaction uuid in the tablet directory in all volumes. If a tablet has never written to a volume, it will not have a directory there. When a tablet does not have a directory in a volume, this code fails with a FileNotFoundException.

When the FileNotFoundException happens, the search through the volumes stops, so any remaining volumes are never checked. It also causes a noisy error log message that is not useful.

To Reproduce

Kill a compaction on a tablet in an Accumulo system where the tablet does not have directories on all volumes in the system.

Expected behavior

When a tablet does not have a directory in a volume, the search for dead compaction files continues and logs no errors. It could log a trace message.
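
The expected behavior can be illustrated with a minimal, self-contained sketch. It is not Accumulo code: it uses java.nio.file as a stand-in for the Hadoop FileSystem API so it runs without extra dependencies, and the class name DeadCompactionFileSearch, the method findFiles, and its parameters are all hypothetical. With java.nio a missing directory typically surfaces as NoSuchFileException rather than FileNotFoundException, but the idea is the same: a missing tablet directory is treated as "no files here" and the loop moves on to the next volume.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not taken from Accumulo.
class DeadCompactionFileSearch {

  /**
   * Collects every file whose name ends with fileSuffix from the given tablet
   * directories (one candidate directory per volume). A directory that does not
   * exist is skipped so the remaining volumes are still searched.
   */
  static List<Path> findFiles(List<Path> tabletDirs, String fileSuffix) throws IOException {
    List<Path> found = new ArrayList<>();
    for (Path dir : tabletDirs) {
      try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir,
          p -> p.getFileName().toString().endsWith(fileSuffix))) {
        for (Path p : stream) {
          found.add(p);
        }
      } catch (NoSuchFileException e) {
        // The tablet never wrote to this volume, so there is nothing to clean up
        // here. A real implementation might log this at trace level. Other
        // IOExceptions still propagate to the caller.
      }
    }
    return found;
  }
}
```

Only the "directory is missing" case is swallowed in this sketch; any other I/O failure still reaches the caller, so real problems are not hidden.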

keith-turner commented 1 week ago

May be able to fix this with a change like the following. However, need to do more research on the fs.listStatus failure modes to be sure.

    FileStatus[] files;
    try {
      files = fs.listStatus(new Path(volPath), (path) -> {
        return path.getName().endsWith(fileSuffix);
      });
    } catch (FileNotFoundException e) {
      // The tablet has no directory in this volume; treat it as empty and keep searching.
      log.trace("failed to list tablet dir {}", volPath, e);
      files = new FileStatus[0];
    }

cshannon commented 4 days ago

I took a look at the API and docs and it looks like catching that exception should be all we need to do to handle this.

Also, I found a couple of spots in the code where variations of this method are called and we are already catching FileNotFoundException, such as here.