Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.86k stars 2.94k forks source link

Alluxio crashed and failed to start: Fatal error: Unexpected journal entry. #12667

Closed Trevor-zhang closed 3 years ago

Trevor-zhang commented 3 years ago

Alluxio Version: 2.2.0 Describe the bug Due to the expansion of the hadoop cluster and then performing hdfs data balancing, alluxio was down and could not be started To Reproduce Steps to reproduce the behavior (as minimally and precisely as possible)

Expected behavior A clear and concise description of what you expected to happen.

Urgency Describe the impact and urgency of the bug.

Additional context Add any other context about the problem here.

Trevor-zhang commented 3 years ago

The UFS of our alluxio is HDFS. On Friday, the cluster was expanded horizontally, and data balance was performed after the expansion. At 0:20 on Saturday, we found alluxio hung up.

java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:20:04,843 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:20:04,843 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
2020-12-19 00:20:24,759 WARN  SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:24,759 WARN  SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,631 INFO  BackupLeaderRole - Closing backup-leader role.
2020-12-19 00:27:43,640 INFO  AlluxioMasterProcess - Alluxio master ended (lost leadership)
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
    at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
    at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
    at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
    at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
    at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:28:53,212 INFO  BlockWorkerClientPool - Block worker client for GrpcServerAddress{HostName=xx-gray-alhzgz-ecs-dtpbu-data-lake-12, SocketAddress=xx-gray-alhzgz-ecs-dtpbu-data-lake-12/xx.0.74.57:29999} closed.
2020-12-19 00:28:53,222 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 3 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:274)
    at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
    at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
    at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
    at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=99d6f5cf-15cc-4d25-b8a8-7de0a0a1ceb2}
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
    at alluxio.AbstractClient.connect(AbstractClient.java:227)
    ... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
    at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
    at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
    at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
    at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
    ... 11 more
Trevor-zhang commented 3 years ago

Later, we hope to pull up the alluxio service. We executed alluxio-start.sh master. But it was wrong:

--- [ FAILED ] The master @ prod-xx-data-lake-002 is not serving requests.
--- Printing the log tail for /opt/alluxio/logs/master.log
>>> BEGIN
2020-12-19 15:17:52,430 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,437 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,440 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:17:53,107 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 15:17:53,109 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:18:04,986 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:18:23,541 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO  CopycatServer - Server started successfully!
2020-12-19 15:18:23,542 INFO  RaftJournalSystem - Started Raft Journal System in 48781ms
2020-12-19 15:18:23,543 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 15:18:23,544 INFO  AlluxioMasterProcess - All masters started
2020-12-19 15:18:23,544 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:18:23,544 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:18:23,548 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:18:23,552 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
<<< EOF
Trevor-zhang commented 3 years ago

This is the master.log information from the occurrence of the exception to the present.

2020-12-19 00:18:09,881 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:18:09,881 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:18:09,881 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:18:09,881 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:18:15,073 WARN  SleepingTimer - Master Active UFS Sync last execution took 39523 ms. Longer than the interval 30000
2020-12-19 00:18:15,073 WARN  SleepingTimer - Master Active UFS Sync last execution took 39523 ms. Longer than the interval 30000
2020-12-19 00:18:46,082 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:19:31,497 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837335359BE
[HostId]: null
2020-12-19 00:19:31,574 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733F959BE
[HostId]: null
2020-12-19 00:19:31,597 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837332A5ABE
[HostId]: null
2020-12-19 00:19:31,601 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733305ABE
[HostId]: null
2020-12-19 00:19:31,605 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764862-8223680_20201219001650.parquet does not exist
2020-12-19 00:19:31,6xx INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837333B5ABE
[HostId]: null
2020-12-19 00:19:31,614 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733465ABE
[HostId]: null
2020-12-19 00:19:40,189 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764580-8222860_20201219001551.parquet does not exist
2020-12-19 00:19:40,197 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733BFAFBE
[HostId]: null
2020-12-19 00:19:40,200 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733D8AFBE
[HostId]: null
2020-12-19 00:19:40,205 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764780-8223440_20201219001634.parquet does not exist
2020-12-19 00:19:40,209 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733F3AFBE
[HostId]: null
2020-12-19 00:19:40,216 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733FBAFBE
[HostId]: null
2020-12-19 00:19:40,221 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764620-8222976_20201219001600.parquet does not exist
2020-12-19 00:19:40,225 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837331EB0BE
[HostId]: null
2020-12-19 00:19:40,232 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373329B0BE
[HostId]: null
2020-12-19 00:19:40,236 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764540-8222744_20201219001544.parquet does not exist
2020-12-19 00:19:40,242 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373345B0BE
[HostId]: null
2020-12-19 00:19:40,247 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373351B0BE
[HostId]: null
2020-12-19 00:19:40,254 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764740-8223324_20201219001627.parquet does not exist
2020-12-19 00:19:40,258 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373377B0BE
[HostId]: null
2020-12-19 00:19:40,262 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373382B0BE
[HostId]: null
2020-12-19 00:19:40,266 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764700-8223208_20201219001620.parquet does not exist
2020-12-19 00:19:40,270 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837338EB0BE
[HostId]: null
2020-12-19 00:19:40,273 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373393B0BE
[HostId]: null
2020-12-19 00:19:40,278 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764902-8223796_20201219001657.parquet does not exist
2020-12-19 00:19:40,282 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733A5B0BE
[HostId]: null
2020-12-19 00:19:40,288 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733ABB0BE
[HostId]: null
2020-12-19 00:19:40,292 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764660-8223092_20201219001611.parquet does not exist
2020-12-19 00:19:40,298 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733C9B0BE
[HostId]: null
2020-12-19 00:19:40,302 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733D8B0BE
[HostId]: null
2020-12-19 00:19:40,306 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764822-8223564_20201219001642.parquet does not exist
2020-12-19 00:19:40,311 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733EBB0BE
[HostId]: null
2020-12-19 00:19:40,315 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733F6B0BE
[HostId]: null
2020-12-19 00:19:40,321 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764942-8223912_20201219001717.parquet does not exist
2020-12-19 00:19:40,326 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373318B1BE
[HostId]: null
2020-12-19 00:19:40,333 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837332DB1BE
[HostId]: null
2020-12-19 00:20:04,843 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:20:04,843 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:20:04,843 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:20:24,759 WARN  SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:24,759 WARN  SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:39,769 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733BF7AC0
[HostId]: null
2020-12-19 00:20:39,796 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733FE7AC0
[HostId]: null
2020-12-19 00:20:39,800 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837330B7BC0
[HostId]: null
2020-12-19 00:20:39,805 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765062-8224260_20201219001752.parquet does not exist
2020-12-19 00:20:39,809 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837331C7BC0
[HostId]: null
2020-12-19 00:20:39,813 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733237BC0
[HostId]: null
2020-12-19 00:20:39,817 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765226-8224746_20201219001822.parquet does not exist
2020-12-19 00:20:39,822 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837332F7BC0
[HostId]: null
2020-12-19 00:20:39,825 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733367BC0
[HostId]: null
2020-12-19 00:20:39,830 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765186-8224630_20201219001814.parquet does not exist
2020-12-19 00:20:39,834 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837334C7BC0
[HostId]: null
2020-12-19 00:20:39,838 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733597BC0
[HostId]: null
2020-12-19 00:20:39,842 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765xx2-8224376_20201219001800.parquet does not exist
2020-12-19 00:20:39,846 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837336B7BC0
[HostId]: null
2020-12-19 00:20:39,850 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733777BC0
[HostId]: null
2020-12-19 00:20:39,854 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765146-8224514_20201219001807.parquet does not exist
2020-12-19 00:20:39,858 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837338A7BC0
[HostId]: null
2020-12-19 00:20:39,866 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731BF5861
[HostId]: null
2020-12-19 00:20:39,867 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837339B7BC0
[HostId]: null
2020-12-19 00:20:39,870 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731C85861
[HostId]: null
2020-12-19 00:20:39,874 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731D05861
[HostId]: null
2020-12-19 00:21:13,478 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:21:13,479 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:21:13,478 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:21:24,173 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7041B3198373329CAC1
[HostId]: null
2020-12-19 00:21:38,388 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733D63CC2
[HostId]: null
2020-12-19 00:21:38,393 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733DF3CC2
[HostId]: null
2020-12-19 00:21:38,398 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765348-8225xx2_20201219001844.parquet does not exist
2020-12-19 00:21:38,402 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733F93CC2
[HostId]: null
2020-12-19 00:21:38,408 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733FD3CC2
[HostId]: null
2020-12-19 00:21:38,412 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765308-8224986_20201219001837.parquet does not exist
2020-12-19 00:21:38,416 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837331C3DC2
[HostId]: null
2020-12-19 00:21:38,420 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733253DC2
[HostId]: null
2020-12-19 00:21:38,425 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765388-8225218_20201219001905.parquet does not exist
2020-12-19 00:21:38,429 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837333C3DC2
[HostId]: null
2020-12-19 00:21:38,434 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733433DC2
[HostId]: null
2020-12-19 00:21:38,438 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765428-8225334_20201219001925.parquet does not exist
2020-12-19 00:21:38,442 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733593DC2
[HostId]: null
2020-12-19 00:21:38,446 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733623DC2
[HostId]: null
2020-12-19 00:21:38,450 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765266-8224862_20201219001830.parquet does not exist
2020-12-19 00:21:38,453 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837337B3DC2
[HostId]: null
2020-12-19 00:21:38,455 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731C81463
[HostId]: null
2020-12-19 00:21:38,458 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD71284C2A9313542AD5E
[HostId]: null
2020-12-19 00:21:38,462 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD71284C2A931354AAD5E
[HostId]: null
2020-12-19 00:21:38,463 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733913DC2
[HostId]: null
2020-12-19 00:21:38,463 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731CE1463
[HostId]: null
2020-12-19 00:21:38,473 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731F31463
[HostId]: null
2020-12-19 00:21:38,477 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE2373xx01563
[HostId]: null
2020-12-19 00:21:38,483 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733AB3DC2
[HostId]: null
2020-12-19 00:21:38,493 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733CC3DC2
[HostId]: null
2020-12-19 00:22:37,520 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B319837333AE6C3
[HostId]: null
2020-12-19 00:22:37,527 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B3198373343E6C3
[HostId]: null
2020-12-19 00:22:37,531 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B3198373354E6C3
[HostId]: null
2020-12-19 00:22:37,535 WARN  ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/full/xx_biz_operation_t_city/20201xx5/2021/03/18 does not exist
2020-12-19 00:22:37,538 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D84C2A93135AF6560
[HostId]: null
2020-12-19 00:22:37,623 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D84C2A93135736660
[HostId]: null
2020-12-19 00:22:42,127 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD75284C2A93135D88760
[HostId]: null
2020-12-19 00:22:42,201 INFO  oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD75284C2A93135808860
[HostId]: null
2020-12-19 00:22:56,741 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:22:56,741 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:22:56,741 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:23:31,317 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:23:31,317 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:23:31,317 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:05,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:05,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:05,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:35,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:35,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:24:35,448 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:15,268 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:15,268 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:15,268 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:45,621 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:45,622 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:25:45,622 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:19,131 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:19,131 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:19,131 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:51,783 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:51,783 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:26:51,783 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:27:35,630 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:27:35,630 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:27:35,630 WARN  MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
    at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
    at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
    ... 2 more
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,631 INFO  BackupLeaderRole - Closing backup-leader role.
2020-12-19 00:27:43,640 INFO  AlluxioMasterProcess - Alluxio master ended (lost leadership)
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
    at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
    at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
    at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
    at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
    at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:28:53,212 INFO  BlockWorkerClientPool - Block worker client for GrpcServerAddress{HostName=xx-gray-alhzgz-ecs-dtpbu-data-lake-12, SocketAddress=xx-gray-alhzgz-ecs-dtpbu-data-lake-12/xx.0.74.57:29999} closed.
2020-12-19 00:28:53,222 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 3 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:274)
    at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
    at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
    at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
    at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=99d6f5cf-15cc-4d25-b8a8-7de0a0a1ceb2}
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
    at alluxio.AbstractClient.connect(AbstractClient.java:227)
    ... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
    at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
    at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
    at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
    at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
    ... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at io.grpc.Status.asRuntimeException(Status.java:533)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
    at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    ... 3 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: prod-xx-data-lake-002/xx.0.74.34:19998
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:xx44)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:30:03,609 INFO  ServerConnector - Stopped ServerConnector@49c992f4{HTTP/1.1}{xx.0.74.34:19999}
2020-12-19 00:30:03,616 INFO  ContextHandler - Stopped o.e.j.s.ServletContextHandler@61b4889b{/,file:/opt/alluxio/webui/master/build/,UNAVAILABLE}
2020-12-19 00:30:03,617 INFO  ContextHandler - Stopped o.e.j.s.ServletContextHandler@27998699{/metrics/json,null,UNAVAILABLE}
2020-12-19 00:30:03,618 INFO  ContextHandler - Stopped o.e.j.s.ServletContextHandler@7bd638d4{/metrics/prometheus,null,UNAVAILABLE}
2020-12-19 00:30:03,620 INFO  AbstractMaster - TableMaster: Stopping primary master.
2020-12-19 00:30:03,620 INFO  AbstractMaster - TableMaster: Stopped primary master.
2020-12-19 00:30:03,620 INFO  AbstractMaster - MetaMaster: Stopping primary master.
2020-12-19 00:30:03,620 INFO  AbstractMaster - MetaMaster: Stopped primary master.
2020-12-19 00:30:14,706 INFO  AbstractMaster - FileSystemMaster: Stopping primary master.
2020-12-19 00:30:14,708 INFO  AbstractMaster - FileSystemMaster: Stopped primary master.
2020-12-19 00:30:14,708 INFO  AbstractMaster - BlockMaster: Stopping primary master.
2020-12-19 00:30:14,708 INFO  AbstractMaster - BlockMaster: Stopped primary master.
2020-12-19 00:30:14,708 INFO  AbstractMaster - MetricsMaster: Stopping primary master.
2020-12-19 00:30:14,7xx INFO  AbstractMaster - MetricsMaster: Stopped primary master.
2020-12-19 00:30:29,661 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 9 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:274)
    at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
    at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
    at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
    at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=1aa4c8ac-a651-41b7-8262-bdbe53d30xxb}
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
    at alluxio.AbstractClient.connect(AbstractClient.java:227)
    ... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
    at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
    at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
    at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
    at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
    at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
    at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
    ... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
    at io.grpc.Status.asRuntimeException(Status.java:533)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
    at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    ... 3 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: prod-xx-data-lake-002/xx.0.74.34:19998
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:xx44)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:30:29,662 WARN  SleepingTimer - Meta Master Config Hash Sync-app-8362670081503149240 last execution took 128666 ms. Longer than the interval 60000
2020-12-19 00:30:30,652 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 00:30:30,652 INFO  RaftJournalWriter - Closing journal writer. Last sequence numbers written/submitted/committed: 9796730/9796730/9796730
2020-12-19 00:30:38,742 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 00:30:38,745 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 00:30:38,745 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 00:30:38,892 INFO  RaftJournalSystem - Journal shutdown complete
2020-12-19 13:12:24,799 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 13:12:24,799 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 13:12:24,799 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 13:12:24,799 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 13:12:24,799 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 13:12:24,852 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:12:24,856 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-1383991486092514144. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:12:24,920 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 13:12:24,923 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:12:24,954 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:12:24,985 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:12:24,986 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:12:25,000 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:12:25,001 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:12:25,111 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:12:25,709 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:12:25,719 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:12:25,902 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:12:25,902 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:12:25,911 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 13:12:34,646 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:12:34,647 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:12:34,794 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:12:34,796 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:12:34,804 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:12:35,059 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:12:51,876 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:12:51,878 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:12:51,886 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:12:51,889 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:12:51,952 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:12:51,960 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,977 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,977 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,982 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO  JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:12:52,720 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 13:12:52,723 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:13:05,078 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:13:22,596 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:13:22,597 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:13:22,597 INFO  CopycatServer - Server started successfully!
2020-12-19 13:13:22,597 INFO  RaftJournalSystem - Started Raft Journal System in 47950ms
2020-12-19 13:13:22,598 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 13:13:22,600 INFO  AlluxioMasterProcess - All masters started
2020-12-19 13:13:22,600 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:13:22,600 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:13:22,604 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:13:22,608 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:35,025 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 13:33:35,025 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 13:33:35,025 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 13:33:35,025 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 13:33:35,025 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 13:33:35,080 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:33:35,084 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-546653125117181xx91. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:33:35,134 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 13:33:35,151 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:33:35,182 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:33:35,195 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:33:35,196 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:33:35,209 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:33:35,211 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:33:35,346 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:33:35,894 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:33:35,903 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:33:36,077 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:33:36,077 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:33:36,084 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 13:33:44,994 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:44,995 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:33:45,136 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:33:45,137 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:33:45,145 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:45,290 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:33:55,200 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:33:55,202 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:33:55,215 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:33:55,218 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:33:55,282 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:33:55,290 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,308 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,308 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,313 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,314 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO  JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:33:55,954 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 13:33:55,956 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:33:55,956 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:33:55,957 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:33:55,958 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:33:55,958 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:34:22,262 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:34:24,544 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:34:24,545 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:34:24,545 INFO  CopycatServer - Server started successfully!
2020-12-19 13:34:24,545 INFO  RaftJournalSystem - Started Raft Journal System in 39550ms
2020-12-19 13:34:24,546 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 13:34:24,547 INFO  AlluxioMasterProcess - All masters started
2020-12-19 13:34:24,548 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:34:24,548 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:34:24,551 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:34:24,555 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:36,297 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 13:43:36,297 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 13:43:36,297 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 13:43:36,297 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 13:43:36,297 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 13:43:36,347 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:43:36,351 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-835096659429914040. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:43:36,405 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 13:43:36,422 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:43:36,451 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:43:36,465 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:43:36,466 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:43:36,483 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:43:36,485 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:43:36,606 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:43:37,173 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:43:37,182 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:43:37,350 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:43:37,350 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:43:37,357 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 13:43:46,475 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:46,476 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:43:46,572 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:43:46,573 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:43:46,586 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:46,624 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:44:06,224 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:44:06,226 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:44:06,236 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:44:06,239 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:44:06,300 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:44:06,308 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,325 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,325 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,330 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO  JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:44:07,078 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 13:44:07,084 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:44:18,996 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:44:35,xx0 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:44:35,xx0 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:44:35,xx0 INFO  CopycatServer - Server started successfully!
2020-12-19 13:44:35,xx0 INFO  RaftJournalSystem - Started Raft Journal System in 48624ms
2020-12-19 13:44:35,xx1 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 13:44:35,xx3 INFO  AlluxioMasterProcess - All masters started
2020-12-19 13:44:35,xx3 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:44:35,xx3 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:44:35,xx7 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:44:35,1xx INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:15,055 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 14:57:15,055 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 14:57:15,055 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 14:57:15,055 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 14:57:15,055 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 14:57:15,xx4 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 14:57:15,xx7 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-4583728940563134304. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 14:57:15,159 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 14:57:15,184 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 14:57:15,216 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 14:57:15,217 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 14:57:15,217 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 14:57:15,231 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 14:57:15,232 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 14:57:15,384 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 14:57:15,914 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 14:57:15,923 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 14:57:16,057 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 14:57:16,058 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 14:57:16,064 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 14:57:25,038 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:25,039 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 14:57:25,180 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 14:57:25,181 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 14:57:25,189 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:25,336 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 14:57:39,475 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 14:57:39,478 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 14:57:39,485 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 14:57:39,489 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 14:57:39,554 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 14:57:39,561 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,578 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,578 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,584 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,585 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO  JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 14:57:40,284 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 14:57:40,290 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 14:57:55,358 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 14:58:12,431 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 14:58:12,431 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 14:58:12,432 INFO  CopycatServer - Server started successfully!
2020-12-19 14:58:12,432 INFO  RaftJournalSystem - Started Raft Journal System in 47393ms
2020-12-19 14:58:12,433 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 14:58:12,434 INFO  AlluxioMasterProcess - All masters started
2020-12-19 14:58:12,434 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 14:58:12,434 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 14:58:12,438 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 14:58:12,442 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:21,482 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 15:06:21,482 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 15:06:21,482 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 15:06:21,482 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 15:06:21,482 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 15:06:21,535 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 15:06:21,539 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-3184643968898944169. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 15:06:21,592 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 15:06:21,607 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 15:06:21,637 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 15:06:21,653 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 15:06:21,654 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 15:06:21,667 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 15:06:21,669 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 15:06:21,806 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 15:06:22,363 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 15:06:22,372 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 15:06:22,535 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:06:22,535 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 15:06:22,542 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 15:06:31,617 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:31,618 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 15:06:31,727 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 15:06:31,728 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 15:06:31,736 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:31,799 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:06:45,799 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 15:06:45,802 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 15:06:45,8xx INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 15:06:45,813 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 15:06:45,875 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 15:06:45,884 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,900 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,900 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,905 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,907 INFO  JournalUtils - Reading journal entries
2020-12-19 15:06:45,907 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:06:46,598 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 15:06:46,600 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:06:46,601 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:06:46,602 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:06:46,603 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:06:46,604 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:07:01,819 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:07:17,275 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:07:17,276 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:07:17,276 INFO  CopycatServer - Server started successfully!
2020-12-19 15:07:17,276 INFO  RaftJournalSystem - Started Raft Journal System in 45658ms
2020-12-19 15:07:17,277 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 15:07:17,279 INFO  AlluxioMasterProcess - All masters started
2020-12-19 15:07:17,279 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:07:17,279 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:07:17,282 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:07:17,288 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:24,574 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-12-19 15:17:24,574 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-12-19 15:17:24,574 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-12-19 15:17:24,574 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-12-19 15:17:24,574 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-12-19 15:17:24,629 INFO  MetricsSystem - Starting sinks with config: {}.
2020-12-19 15:17:24,633 INFO  MetricsHeartbeatContext - Created metrics heartbeat with ID app-7973828500878036450. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 15:17:24,682 INFO  NettyUtils - EPOLL_MODE is available
2020-12-19 15:17:24,700 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 15:17:24,730 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 15:17:24,740 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 15:17:24,740 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 15:17:24,753 INFO  UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 15:17:24,755 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-12-19 15:17:24,892 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 15:17:25,430 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 15:17:25,439 INFO  HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 15:17:25,576 INFO  ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:17:25,576 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-12-19 15:17:25,584 INFO  JournalStateMachine - Initialized new journal state machine
2020-12-19 15:17:34,760 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:34,761 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 15:17:34,859 INFO  GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 15:17:34,863 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 15:17:34,875 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:34,909 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:17:52,318 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 15:17:52,321 INFO  CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 15:17:52,332 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 15:17:52,335 INFO  ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 15:17:52,399 INFO  ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 15:17:52,409 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,437 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO  JournalUtils - Reading journal entries
2020-12-19 15:17:52,440 INFO  JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:17:53,xx7 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
  id: 0
  direct_children_loaded: true
}

2020-12-19 15:17:53,xx9 INFO  AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO  AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO  AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO  AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO  AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:18:04,986 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:18:23,541 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO  AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO  CopycatServer - Server started successfully!
2020-12-19 15:18:23,542 INFO  RaftJournalSystem - Started Raft Journal System in 48781ms
2020-12-19 15:18:23,543 INFO  RaftJournalSystem - Shutting down raft journal
2020-12-19 15:18:23,544 INFO  AlluxioMasterProcess - All masters started
2020-12-19 15:18:23,544 INFO  FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:18:23,544 INFO  ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:18:23,548 INFO  ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:18:23,552 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
Trevor-zhang commented 3 years ago

@JySongWithZhangCe @jiacheliu3 Can you help me find out what is the cause of the problem? Is it possible that Alluxio is abnormal due to HDFS balancing data?

jiacheliu3 commented 3 years ago

@Trevor-zhang You have attached much logs, I only made a quick pass so might be missing something.

Seems Alluxio is running with Active Sync, where updates in the HDFS will be streamed into Alluxio. You have a lot of operations going on in HDFS, and I expect a lot of messages to be streaming into Alluxio. Then somehow this affected journal flushing. Then you see an OOM.

2020-12-19 00:20:24,759 WARN  SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
...
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
    at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
    at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
    at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
    at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
    at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
jiacheliu3 commented 3 years ago

@Trevor-zhang I would start with looking at the existing journal, ls -alR ${ALLUXIO_HOME}/journal and see if the journal files still make sense.

Then translate the journal entries into human readable format bin/alluxio readJournal -start <start index> -end <end index>. This parses the journal entries in the range into a txt file containing what happened. If you identify what was the journal entry that failed to flush, now you can see what that was doing.

Let's see what you find

jiacheliu3 commented 3 years ago

@Trevor-zhang There are multiple things we can do from here:

  1. If you have XX:+HeapDumpOnOutOfMemoryError, then you can analyze what consumed most memory from the dump
  2. If you have the GC log, analyze that and understand the memory pattern
  3. If this is production, and you want the service back ASAP, BACKUP THE JOURNAL ON ALL MASTERS and then I believe you can format the journal. This should be the easiest way to "fix it". The backup files can be used for your investigation.
  4. If a journal file is indeed corrupted, maybe your alluxio readJournal fails. I guess you can just throw away that file and rename the last journal file. For example if you have
    ./0x001-0x100
    ./0x100-0xfff <- you find this is corrupted

    Then I believe you can

    ./0x001-0xfff <- rename this file
    ./0x100-0xfff <- delete this

    The master should be able to start with what it has, then the raft quorum will update it with what it is missing. @LuQQiu could you confirm if my guess is right?

Trevor-zhang commented 3 years ago

@jiacheliu3 The metadata from the journal directory up to before the crash exists. In edit.txt I found a lot of pipeline information synchronized with OSS. But I still haven’t found the cause of Alluxio crashed.

I am curious why there is no way to restart the service except format journal after the crash, even if the whole environment is stable when it is started.

jiacheliu3 commented 3 years ago

@Trevor-zhang Could you share your investigation on the journal files with us? What are the last edits look like and the journal files too?

I'm not sure what you mean by "even if the whole environment is stable when it started"? If there's OOM, it's hard to guarantee anything, a lot of behaviors will be undefined. Unfortunately, based on the information you provided I'm not able to better answer this question. If you share more information maybe I can help more.

If the journal looks corrupted, formatting journal will be the easiest way and fastest way to bring your service back up. Alternatively, you can try the manual way I suggested as well. When there is journal, Alluxio recovers from the journal. It does not matter whether your environment (UFS) is stable or not. And this does not work when your journal is corrupted, of course.

Trevor-zhang commented 3 years ago

@jiacheliu3 Thank you for your help. Because this is the production environment, the journal log is on the computer of the operation and maintenance personnel , and I don't expect to get the file until Monday.

I read journal yesterday, the file are all in good condition, and I have recovered alluxio. I want to locate the cause of this failure and sum up the lessons, so as not to have such a problem again next time. :) :)

jiacheliu3 commented 3 years ago

@Trevor-zhang I have the same opinion. We should investigate on the journal files and figure out the issue, so we understand better what to do next time.

Please share the file and how the journal files look like (what are the file names? how big are they?) with us. It's going to be very helpful. Thanks!

Trevor-zhang commented 3 years ago

@jiacheliu3 i run /alluxio readJournal -start 2000000 -end 8000000 . The following files are generated, and there are about 1.7G files in edits.txt.

5c025ca282d0347e6e85532d514bbf8

In edits.txt with the following file:

sequence_number: 7999996
delete_file {
  id: 23631480487935
  recursive: false
  op_time_ms: 1607450451368
  path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209014700.clean"
}

sequence_number: 7999997
delete_file {
  id: 23631564374015
  recursive: false
  op_time_ms: 1607450451380
  path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209014700.inflight"
}

sequence_number: 7999998
delete_file {
  id: 23633292427263
  recursive: false
  op_time_ms: 1607450451391
  path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209015024.clean"
}

sequence_number: 7999999
delete_file {
  id: 23633577639935
  recursive: false
  op_time_ms: 1607450451403
  path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209015043.inflight"
}

Because the file is too large, how can I filter the content based on certain conditions ?

jiacheliu3 commented 3 years ago

@Trevor-zhang Can you attach what you see doing ls -alR ${ALLUXIO_HOME}/journal? That should help us understand what journal entries were in the last journal file, which probably failed flushing. I think you only need to check the entries involved in the last file, and most probably only the last few entries were affected.

Also if you have other masters, I recommend you to check the same entries on the other masters too. You will be able to see what entries were missing on the standby masters.

Trevor-zhang commented 3 years ago

@jiacheliu3 Sorry, I just learned from O&M that the last journal file is no longer found. The O&M executed format journal. Now there is only the edit.txt file of that period. Can we start with edit.txt ?

jiacheliu3 commented 3 years ago

@Trevor-zhang I assume you checked the index on the journal files and found the last entries to be 8000000. From the attached piece of your edit.txt, I can see the last entries are delete_file operations. This operation should be small and fast, I don't expect them to cause you journal issues. From that perspective, I don't think the edit.txt have any more information to us.

Because there's no journal anymore, we can no longer check the following:

  1. If the last journal entries are corrupted, compared to what we see from edit.txt
  2. What journal entries were flushing, as the OOM happen
  3. If you have other masters, what journal entries they are missing, and guess what entries were flushing
  4. Try if you can manually remove the last journal file (corrupted) and see if you can bring the master back

Here are the investigations I guess I can still do, based on your description and the log you pasted:

  1. Guess what corrupted state the journal was, based on the exception
  2. Guess what state the journal could be, when the OOM happens
  3. Maybe we could have a tool that you recover corrupted journal?

A few things based on my previous recommendation below, so the next time we can have dig deeper:

  1. Have HeapDumpOnOutOfMemoryError in your jvm option
  2. Output and keep gc logs
  3. Before formatting, backup the journals and pls share more information with us, that'd be very helpful
Trevor-zhang commented 3 years ago

It seems that useful information cannot be found in edix.txt. I will make some adjustments in the jvm option and output the gc log .

Thank you very much for your help in these two days. @jiacheliu3