
Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

TableOperationsIT diskUsage failure #2428

milleruntime closed this issue 2 years ago

milleruntime commented 2 years ago

Changes in 23fa7d48e7908b9af0a761fda431ffa0fc472a12 have caused a failure in the getDiskUsage test of TableOperationsIT. Here is where the error is happening in the test:

[ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 116.32 s <<< FAILURE! - in org.apache.accumulo.test.TableOperationsIT
[ERROR] org.apache.accumulo.test.TableOperationsIT.getDiskUsage  Time elapsed: 19.208 s  <<< ERROR!
org.apache.accumulo.core.client.AccumuloException: org.apache.thrift.TApplicationException: Internal error processing getDiskUsage
    at org.apache.accumulo.core.clientImpl.TableOperationsImpl.getDiskUsage(TableOperationsImpl.java:1492)
    at org.apache.accumulo.test.TableOperationsIT.getDiskUsage(TableOperationsIT.java:158)

Here is the error that is being thrown in the tablet server:

org.apache.thrift.TException: java.io.FileNotFoundException: File file:/accumulo/test/target/mini-tests/org.apache.accumulo.test.TableOperationsIT_getDiskUsage/accumulo/tables/2 does not exist
        at org.apache.accumulo.server.client.ClientServiceHandler.getDiskUsage(ClientServiceHandler.java:448) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$1(TraceUtil.java:197) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at com.sun.proxy.$Proxy28.getDiskUsage(Unknown Source) ~[?:?]
        at org.apache.accumulo.core.clientImpl.thrift.ClientService$Processor$getDiskUsage.getResult(ClientService.java:2431) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.accumulo.core.clientImpl.thrift.ClientService$Processor$getDiskUsage.getResult(ClientService.java:2410) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.15.0.jar:0.15.0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.15.0.jar:0.15.0]
        at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) ~[libthrift-0.15.0.jar:0.15.0]
        at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:116) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.15.0.jar:0.15.0]
        at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) ~[opentelemetry-context-1.7.1.jar:1.7.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at io.opentelemetry.context.Context.lambda$wrap$1(Context.java:207) ~[opentelemetry-context-1.7.1.jar:1.7.1]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.io.FileNotFoundException: File file:/accumulo/test/target/mini-tests/org.apache.accumulo.test.TableOperationsIT_getDiskUsage/accumulo/tables/2 does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:491) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1941) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1983) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2149) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2148) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:741) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2255) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2252) ~[hadoop-client-api-3.3.0.jar:?]
        at org.apache.accumulo.server.fs.VolumeManagerImpl.listFiles(VolumeManagerImpl.java:269) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.accumulo.server.util.TableDiskUsage.getDiskUsage(TableDiskUsage.java:218) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
        at org.apache.accumulo.server.client.ClientServiceHandler.getDiskUsage(ClientServiceHandler.java:440) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
ctubbsii commented 2 years ago

One issue seems to be that it's not prefixing the filesystem root directory for RawLocalFileSystem correctly. So, it's checking for a file relative to the root of the local filesystem, which is incorrect. This might have gone unnoticed with the globbing, because the glob never matched anything, whereas the current code now has a starting point in a directory it expects to exist.
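The prefixing problem can be sketched with a minimal, self-contained example (this is not Accumulo's actual code; the volume base path is hypothetical). Resolving the relative table path against the filesystem root produces a path like the one in the FileNotFoundException above, whereas it should be resolved against the volume's base directory:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathPrefixSketch {
    // The bug pattern: resolving the relative tables path against "/",
    // i.e. the root of the local filesystem.
    static Path wrong(String tablesDir) {
        return Paths.get("/").resolve(tablesDir); // yields /tables/2
    }

    // The intended behavior: resolve against the volume's base directory.
    static Path right(Path volumeBase, String tablesDir) {
        return volumeBase.resolve(tablesDir);     // yields <volumeBase>/tables/2
    }

    public static void main(String[] args) {
        Path base = Paths.get("/volumes/v1/accumulo"); // hypothetical volume base
        System.out.println(wrong("tables/2"));
        System.out.println(right(base, "tables/2"));
    }
}
```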

A second issue is that the directory might not exist on every volume, so we may need to ignore the case where a table has no files on a specific volume. My concern with doing that here, though, is that it might mask the first issue above, which should be fixed first.
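One way to ignore a missing directory on a volume is to treat it as zero bytes rather than propagating a FileNotFoundException. A minimal sketch using java.nio.file (the helper name is hypothetical, and Accumulo's real code goes through Hadoop's FileSystem API instead):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class VolumeUsageSketch {
    // Hypothetical helper: sum file sizes under a table directory on one volume,
    // returning 0 when the directory does not exist on this volume.
    static long usageOrZero(Path tableDir) {
        if (!Files.isDirectory(tableDir)) {
            return 0L; // table has no files on this volume (yet)
        }
        try (Stream<Path> files = Files.walk(tableDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A nonexistent directory contributes 0 instead of throwing.
        System.out.println(usageOrZero(Path.of("/no/such/volume/tables/2")));
    }
}
```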

milleruntime commented 2 years ago

I am not sure the prefixing is causing the failure: the command works several times in the IT, right up until it clones the table. After the clone, it fails to find the second table ID. So there may be something else going on.

milleruntime commented 2 years ago

It looks like TableDiskUsage now behaves differently for a newly created table that has not yet been compacted. Before the recent change, the test would run the command (after cloning the table but before compacting) and get results for only one table:

diskUsages = accumuloClient.tableOperations().getDiskUsage(tables);
assertEquals(1, diskUsages.size());
assertEquals(2, diskUsages.get(0).getTables().size());
assertTrue(diskUsages.get(0).getUsage() > 0);

So the files for the new table aren't there until it gets compacted. Once it has been, the command returns results for both tables.
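The one-entry-shared-by-two-tables result asserted above can be sketched as a grouping of files by the set of table IDs that reference them (a simplified, hypothetical model of the bookkeeping, with made-up file names and sizes): right after a clone, the clone references the source table's files and has none of its own, so both IDs land in a single usage entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SharedUsageSketch {
    // Hypothetical model: sum file sizes per distinct set of referencing tables.
    static Map<Set<String>, Long> groupUsage(Map<String, Long> fileSizes,
                                             Map<String, Set<String>> fileTables) {
        Map<Set<String>, Long> usage = new HashMap<>();
        fileTables.forEach((file, tables) ->
            usage.merge(tables, fileSizes.get(file), Long::sum));
        return usage;
    }

    public static void main(String[] args) {
        // Table "3" is a fresh clone of table "2": it shares the source's only
        // file and has no files of its own until it is compacted.
        Map<String, Long> sizes = Map.of("F0001.rf", 1024L);
        Map<String, Set<String>> refs = Map.of("F0001.rf", Set.of("2", "3"));
        // One usage entry, attributed to both table IDs.
        System.out.println(groupUsage(sizes, refs));
    }
}
```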

Since the recent changes to TableDiskUsage, an exception is now thrown because the file isn't found, instead of just quietly failing as it did before. The file isn't there yet because compaction hasn't run.

I am not sure whether this is a problem, since it only seems to happen in the mini test with RawLocalFileSystem. I ran similar commands in the shell in Uno and it works fine. Do we want to change the behavior to make the test pass, or change the test to pass with the new behavior?

EdColeman commented 2 years ago

If the file (or, as it appears in this case, directory) does not exist, would it be appropriate to return a size of 0? That, I think, is your "change the behavior" option?

milleruntime commented 2 years ago

> I am not sure whether this is a problem, since it only seems to happen in the mini test with RawLocalFileSystem. I ran similar commands in the shell in Uno and it works fine.

I was mistaken: the command fails in Uno as well. The bottom line is that it doesn't handle clones; I think it might be a bug.

milleruntime commented 2 years ago

> If the file (or, as it appears in this case, directory) does not exist, would it be appropriate to return a size of 0? That, I think, is your "change the behavior" option?

I think that is one possible solution. The problem is that I don't fully understand the code in TableDiskUsage that tracks the different sets of table IDs, so I am not sure where the adjustment needs to be made.