Closed Trevor-zhang closed 3 years ago
The UFS of our alluxio is HDFS. On Friday, the cluster was expanded horizontally, and data balance was performed after the expansion. At 0:20 on Saturday, we found alluxio hung up.
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:20:04,843 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:20:04,843 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
2020-12-19 00:20:24,759 WARN SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:24,759 WARN SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,631 INFO BackupLeaderRole - Closing backup-leader role.
2020-12-19 00:27:43,640 INFO AlluxioMasterProcess - Alluxio master ended (lost leadership)
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:28:53,212 INFO BlockWorkerClientPool - Block worker client for GrpcServerAddress{HostName=xx-gray-alhzgz-ecs-dtpbu-data-lake-12, SocketAddress=xx-gray-alhzgz-ecs-dtpbu-data-lake-12/xx.0.74.57:29999} closed.
2020-12-19 00:28:53,222 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 3 attempts
at alluxio.AbstractClient.connect(AbstractClient.java:274)
at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=99d6f5cf-15cc-4d25-b8a8-7de0a0a1ceb2}
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
at alluxio.AbstractClient.connect(AbstractClient.java:227)
... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
... 11 more
Later, we hope to pull up the alluxio service. We executed alluxio-start.sh master
.
But it was wrong:
--- [ FAILED ] The master @ prod-xx-data-lake-002 is not serving requests.
--- Printing the log tail for /opt/alluxio/logs/master.log
>>> BEGIN
2020-12-19 15:17:52,430 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,437 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,440 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:17:53,107 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 15:17:53,109 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:17:53,109 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:18:04,986 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:18:23,541 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO CopycatServer - Server started successfully!
2020-12-19 15:18:23,542 INFO RaftJournalSystem - Started Raft Journal System in 48781ms
2020-12-19 15:18:23,543 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 15:18:23,544 INFO AlluxioMasterProcess - All masters started
2020-12-19 15:18:23,544 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:18:23,544 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:18:23,548 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:18:23,552 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
<<< EOF
This is the master.log information from the occurrence of the exception to the present.
2020-12-19 00:18:09,881 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:18:09,881 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:18:09,881 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:18:09,881 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:18:15,073 WARN SleepingTimer - Master Active UFS Sync last execution took 39523 ms. Longer than the interval 30000
2020-12-19 00:18:15,073 WARN SleepingTimer - Master Active UFS Sync last execution took 39523 ms. Longer than the interval 30000
2020-12-19 00:18:46,082 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:19:31,497 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837335359BE
[HostId]: null
2020-12-19 00:19:31,574 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733F959BE
[HostId]: null
2020-12-19 00:19:31,597 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837332A5ABE
[HostId]: null
2020-12-19 00:19:31,601 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733305ABE
[HostId]: null
2020-12-19 00:19:31,605 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764862-8223680_20201219001650.parquet does not exist
2020-12-19 00:19:31,6xx INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B319837333B5ABE
[HostId]: null
2020-12-19 00:19:31,614 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6931B31983733465ABE
[HostId]: null
2020-12-19 00:19:40,189 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764580-8222860_20201219001551.parquet does not exist
2020-12-19 00:19:40,197 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733BFAFBE
[HostId]: null
2020-12-19 00:19:40,200 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733D8AFBE
[HostId]: null
2020-12-19 00:19:40,205 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764780-8223440_20201219001634.parquet does not exist
2020-12-19 00:19:40,209 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733F3AFBE
[HostId]: null
2020-12-19 00:19:40,216 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733FBAFBE
[HostId]: null
2020-12-19 00:19:40,221 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764620-8222976_20201219001600.parquet does not exist
2020-12-19 00:19:40,225 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837331EB0BE
[HostId]: null
2020-12-19 00:19:40,232 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373329B0BE
[HostId]: null
2020-12-19 00:19:40,236 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764540-8222744_20201219001544.parquet does not exist
2020-12-19 00:19:40,242 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373345B0BE
[HostId]: null
2020-12-19 00:19:40,247 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373351B0BE
[HostId]: null
2020-12-19 00:19:40,254 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764740-8223324_20201219001627.parquet does not exist
2020-12-19 00:19:40,258 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373377B0BE
[HostId]: null
2020-12-19 00:19:40,262 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373382B0BE
[HostId]: null
2020-12-19 00:19:40,266 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764700-8223208_20201219001620.parquet does not exist
2020-12-19 00:19:40,270 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837338EB0BE
[HostId]: null
2020-12-19 00:19:40,273 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373393B0BE
[HostId]: null
2020-12-19 00:19:40,278 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764902-8223796_20201219001657.parquet does not exist
2020-12-19 00:19:40,282 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733A5B0BE
[HostId]: null
2020-12-19 00:19:40,288 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733ABB0BE
[HostId]: null
2020-12-19 00:19:40,292 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764660-8223092_20201219001611.parquet does not exist
2020-12-19 00:19:40,298 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733C9B0BE
[HostId]: null
2020-12-19 00:19:40,302 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733D8B0BE
[HostId]: null
2020-12-19 00:19:40,306 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764822-8223564_20201219001642.parquet does not exist
2020-12-19 00:19:40,311 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733EBB0BE
[HostId]: null
2020-12-19 00:19:40,315 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B31983733F6B0BE
[HostId]: null
2020-12-19 00:19:40,321 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2764942-8223912_20201219001717.parquet does not exist
2020-12-19 00:19:40,326 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B3198373318B1BE
[HostId]: null
2020-12-19 00:19:40,333 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD69C1B319837332DB1BE
[HostId]: null
2020-12-19 00:20:04,843 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:20:04,843 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:20:04,843 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:20:24,759 WARN SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:24,759 WARN SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:20:39,769 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733BF7AC0
[HostId]: null
2020-12-19 00:20:39,796 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733FE7AC0
[HostId]: null
2020-12-19 00:20:39,800 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837330B7BC0
[HostId]: null
2020-12-19 00:20:39,805 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765062-8224260_20201219001752.parquet does not exist
2020-12-19 00:20:39,809 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837331C7BC0
[HostId]: null
2020-12-19 00:20:39,813 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733237BC0
[HostId]: null
2020-12-19 00:20:39,817 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765226-8224746_20201219001822.parquet does not exist
2020-12-19 00:20:39,822 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837332F7BC0
[HostId]: null
2020-12-19 00:20:39,825 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733367BC0
[HostId]: null
2020-12-19 00:20:39,830 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765186-8224630_20201219001814.parquet does not exist
2020-12-19 00:20:39,834 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837334C7BC0
[HostId]: null
2020-12-19 00:20:39,838 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733597BC0
[HostId]: null
2020-12-19 00:20:39,842 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765xx2-8224376_20201219001800.parquet does not exist
2020-12-19 00:20:39,846 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837336B7BC0
[HostId]: null
2020-12-19 00:20:39,850 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B31983733777BC0
[HostId]: null
2020-12-19 00:20:39,854 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765146-8224514_20201219001807.parquet does not exist
2020-12-19 00:20:39,858 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837338A7BC0
[HostId]: null
2020-12-19 00:20:39,866 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731BF5861
[HostId]: null
2020-12-19 00:20:39,867 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D71B319837339B7BC0
[HostId]: null
2020-12-19 00:20:39,870 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731C85861
[HostId]: null
2020-12-19 00:20:39,874 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD6D72C2AE23731D05861
[HostId]: null
2020-12-19 00:21:13,478 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:21:13,479 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:21:13,478 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:21:24,173 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7041B3198373329CAC1
[HostId]: null
2020-12-19 00:21:38,388 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733D63CC2
[HostId]: null
2020-12-19 00:21:38,393 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733DF3CC2
[HostId]: null
2020-12-19 00:21:38,398 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765348-8225xx2_20201219001844.parquet does not exist
2020-12-19 00:21:38,402 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733F93CC2
[HostId]: null
2020-12-19 00:21:38,408 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733FD3CC2
[HostId]: null
2020-12-19 00:21:38,412 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765308-8224986_20201219001837.parquet does not exist
2020-12-19 00:21:38,416 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837331C3DC2
[HostId]: null
2020-12-19 00:21:38,420 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733253DC2
[HostId]: null
2020-12-19 00:21:38,425 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765388-8225218_20201219001905.parquet does not exist
2020-12-19 00:21:38,429 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837333C3DC2
[HostId]: null
2020-12-19 00:21:38,434 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733433DC2
[HostId]: null
2020-12-19 00:21:38,438 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765428-8225334_20201219001925.parquet does not exist
2020-12-19 00:21:38,442 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733593DC2
[HostId]: null
2020-12-19 00:21:38,446 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733623DC2
[HostId]: null
2020-12-19 00:21:38,450 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/business/gb/ads_gb_ord_arrive_ds/2020/12/19/2537902b-f38e-4c9f-90cc-d7c13d39f949-0_0-2765266-8224862_20201219001830.parquet does not exist
2020-12-19 00:21:38,453 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B319837337B3DC2
[HostId]: null
2020-12-19 00:21:38,455 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731C81463
[HostId]: null
2020-12-19 00:21:38,458 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD71284C2A9313542AD5E
[HostId]: null
2020-12-19 00:21:38,462 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD71284C2A931354AAD5E
[HostId]: null
2020-12-19 00:21:38,463 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733913DC2
[HostId]: null
2020-12-19 00:21:38,463 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731CE1463
[HostId]: null
2020-12-19 00:21:38,473 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE23731F31463
[HostId]: null
2020-12-19 00:21:38,477 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7122C2AE2373xx01563
[HostId]: null
2020-12-19 00:21:38,483 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733AB3DC2
[HostId]: null
2020-12-19 00:21:38,493 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD7121B31983733CC3DC2
[HostId]: null
2020-12-19 00:22:37,520 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B319837333AE6C3
[HostId]: null
2020-12-19 00:22:37,527 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B3198373343E6C3
[HostId]: null
2020-12-19 00:22:37,531 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D1B3198373354E6C3
[HostId]: null
2020-12-19 00:22:37,535 WARN ObjectUnderFileSystem - Error fetching directory status, assuming directory oss://xx-data-lake-std-prod/data_lake/full/xx_biz_operation_t_city/20201xx5/2021/03/18 does not exist
2020-12-19 00:22:37,538 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D84C2A93135AF6560
[HostId]: null
2020-12-19 00:22:37,623 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD74D84C2A93135736660
[HostId]: null
2020-12-19 00:22:42,127 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD75284C2A93135D88760
[HostId]: null
2020-12-19 00:22:42,201 INFO oss - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5FDCD75284C2A93135808860
[HostId]: null
2020-12-19 00:22:56,741 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:22:56,741 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:22:56,741 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:23:31,317 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:23:31,317 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:23:31,317 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:05,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:05,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:05,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:35,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:35,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:24:35,448 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:15,268 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:15,268 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:15,268 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:45,621 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:45,622 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:25:45,622 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:19,131 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:19,131 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:19,131 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:51,783 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:51,783 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:26:51,783 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:27:35,630 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:27:35,630 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:27:35,630 WARN MasterJournalContext - Journal flush failed. retrying...
java.io.IOException: Timed out after waiting 30000 milliseconds for journal entries to be processed
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:xx2)
at alluxio.master.journal.AsyncJournalWriter.doFlush(AsyncJournalWriter.java:295)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.atomix.catalyst.concurrent.BlockingFuture.get(BlockingFuture.java:54)
at alluxio.master.journal.raft.RaftJournalWriter.flush(RaftJournalWriter.java:93)
... 2 more
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
2020-12-19 00:27:43,631 INFO BackupLeaderRole - Closing backup-leader role.
2020-12-19 00:27:43,640 INFO AlluxioMasterProcess - Alluxio master ended (lost leadership)
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:28:53,212 INFO BlockWorkerClientPool - Block worker client for GrpcServerAddress{HostName=xx-gray-alhzgz-ecs-dtpbu-data-lake-12, SocketAddress=xx-gray-alhzgz-ecs-dtpbu-data-lake-12/xx.0.74.57:29999} closed.
2020-12-19 00:28:53,222 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 3 attempts
at alluxio.AbstractClient.connect(AbstractClient.java:274)
at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=99d6f5cf-15cc-4d25-b8a8-7de0a0a1ceb2}
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
at alluxio.AbstractClient.connect(AbstractClient.java:227)
... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 3 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: prod-xx-data-lake-002/xx.0.74.34:19998
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:xx44)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:30:03,609 INFO ServerConnector - Stopped ServerConnector@49c992f4{HTTP/1.1}{xx.0.74.34:19999}
2020-12-19 00:30:03,616 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@61b4889b{/,file:/opt/alluxio/webui/master/build/,UNAVAILABLE}
2020-12-19 00:30:03,617 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@27998699{/metrics/json,null,UNAVAILABLE}
2020-12-19 00:30:03,618 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@7bd638d4{/metrics/prometheus,null,UNAVAILABLE}
2020-12-19 00:30:03,620 INFO AbstractMaster - TableMaster: Stopping primary master.
2020-12-19 00:30:03,620 INFO AbstractMaster - TableMaster: Stopped primary master.
2020-12-19 00:30:03,620 INFO AbstractMaster - MetaMaster: Stopping primary master.
2020-12-19 00:30:03,620 INFO AbstractMaster - MetaMaster: Stopped primary master.
2020-12-19 00:30:14,706 INFO AbstractMaster - FileSystemMaster: Stopping primary master.
2020-12-19 00:30:14,708 INFO AbstractMaster - FileSystemMaster: Stopped primary master.
2020-12-19 00:30:14,708 INFO AbstractMaster - BlockMaster: Stopping primary master.
2020-12-19 00:30:14,708 INFO AbstractMaster - BlockMaster: Stopped primary master.
2020-12-19 00:30:14,708 INFO AbstractMaster - MetricsMaster: Stopping primary master.
2020-12-19 00:30:14,7xx INFO AbstractMaster - MetricsMaster: Stopped primary master.
2020-12-19 00:30:29,661 ERROR ConfigHashSync - Failed to heartbeat to meta master to get configuration hash:
alluxio.exception.status.UnavailableException: Failed to connect to MetaMaster @ prod-xx-data-lake-002:19998 after 9 attempts
at alluxio.AbstractClient.connect(AbstractClient.java:274)
at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:385)
at alluxio.AbstractClient.retryRPC(AbstractClient.java:349)
at alluxio.client.meta.RetryHandlingMetaMasterConfigClient.getConfigHash(RetryHandlingMetaMasterConfigClient.java:82)
at alluxio.client.file.ConfigHashSync.heartbeat(ConfigHashSync.java:82)
at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Target Unavailable. GrpcChannelKey{ClientType=MetaMaster, ClientHostname=prod-xx-data-lake-002, ServerAddress=GrpcServerAddress{HostName=prod-xx-data-lake-002, SocketAddress=prod-xx-data-lake-002:19998}, ChannelId=1aa4c8ac-a651-41b7-8262-bdbe53d30xxb}
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:226)
at alluxio.AbstractClient.connect(AbstractClient.java:227)
... xx more
Caused by: alluxio.exception.status.UnavailableException: io exception
at alluxio.exception.status.AlluxioStatusException.from(AlluxioStatusException.java:155)
at alluxio.exception.status.AlluxioStatusException.fromStatusRuntimeException(AlluxioStatusException.java:214)
at alluxio.exception.status.AlluxioStatusException.fromThrowable(AlluxioStatusException.java:199)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.waitUntilChannelAuthenticated(AuthenticatedChannelClientDriver.java:187)
at alluxio.security.authentication.AuthenticatedChannelClientDriver.startAuthenticatedChannel(AuthenticatedChannelClientDriver.java:164)
at alluxio.security.authentication.ChannelAuthenticator.authenticate(ChannelAuthenticator.java:xx5)
at alluxio.grpc.GrpcChannelBuilder.build(GrpcChannelBuilder.java:212)
... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:449)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 3 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: prod-xx-data-lake-002/xx.0.74.34:19998
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:660)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:637)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:473)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:xx44)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2020-12-19 00:30:29,662 WARN SleepingTimer - Meta Master Config Hash Sync-app-8362670081503149240 last execution took 128666 ms. Longer than the interval 60000
2020-12-19 00:30:30,652 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 00:30:30,652 INFO RaftJournalWriter - Closing journal writer. Last sequence numbers written/submitted/committed: 9796730/9796730/9796730
2020-12-19 00:30:38,742 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 00:30:38,745 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 00:30:38,745 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 00:30:38,892 INFO RaftJournalSystem - Journal shutdown complete
2020-12-19 13:12:24,799 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 13:12:24,799 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 13:12:24,799 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 13:12:24,799 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 13:12:24,799 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 13:12:24,852 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:12:24,856 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-1383991486092514144. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:12:24,920 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 13:12:24,923 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:12:24,954 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:12:24,985 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:12:24,986 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:12:25,000 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:12:25,001 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:12:25,111 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:12:25,709 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:12:25,719 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:12:25,902 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:12:25,902 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:12:25,911 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 13:12:34,646 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:12:34,647 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:12:34,794 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:12:34,796 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:12:34,804 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:12:35,059 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:12:51,876 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:12:51,878 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:12:51,886 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:12:51,889 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:12:51,952 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:12:51,960 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,977 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,977 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,982 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO JournalUtils - Reading journal entries
2020-12-19 13:12:51,983 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:12:52,720 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 13:12:52,723 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:12:52,723 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:13:05,078 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:13:22,596 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:13:22,597 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:13:22,597 INFO CopycatServer - Server started successfully!
2020-12-19 13:13:22,597 INFO RaftJournalSystem - Started Raft Journal System in 47950ms
2020-12-19 13:13:22,598 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 13:13:22,600 INFO AlluxioMasterProcess - All masters started
2020-12-19 13:13:22,600 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:13:22,600 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:13:22,604 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:13:22,608 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:35,025 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 13:33:35,025 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 13:33:35,025 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 13:33:35,025 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 13:33:35,025 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 13:33:35,080 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:33:35,084 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-546653125117181xx91. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:33:35,134 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 13:33:35,151 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:33:35,182 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:33:35,195 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:33:35,196 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:33:35,209 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:33:35,211 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:33:35,346 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:33:35,894 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:33:35,903 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:33:36,077 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:33:36,077 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:33:36,084 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 13:33:44,994 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:44,995 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:33:45,136 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:33:45,137 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:33:45,145 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:33:45,290 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:33:55,200 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:33:55,202 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:33:55,215 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:33:55,218 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:33:55,282 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:33:55,290 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,308 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,308 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,313 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,314 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO JournalUtils - Reading journal entries
2020-12-19 13:33:55,315 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:33:55,954 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 13:33:55,956 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:33:55,956 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:33:55,957 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:33:55,958 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:33:55,958 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:34:22,262 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:34:24,544 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:34:24,545 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:34:24,545 INFO CopycatServer - Server started successfully!
2020-12-19 13:34:24,545 INFO RaftJournalSystem - Started Raft Journal System in 39550ms
2020-12-19 13:34:24,546 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 13:34:24,547 INFO AlluxioMasterProcess - All masters started
2020-12-19 13:34:24,548 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:34:24,548 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:34:24,551 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:34:24,555 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:36,297 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 13:43:36,297 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 13:43:36,297 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 13:43:36,297 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 13:43:36,297 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 13:43:36,347 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 13:43:36,351 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-835096659429914040. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 13:43:36,405 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 13:43:36,422 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 13:43:36,451 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 13:43:36,465 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 13:43:36,466 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 13:43:36,483 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 13:43:36,485 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 13:43:36,606 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 13:43:37,173 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 13:43:37,182 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 13:43:37,350 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:43:37,350 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 13:43:37,357 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 13:43:46,475 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:46,476 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 13:43:46,572 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 13:43:46,573 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 13:43:46,586 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 13:43:46,624 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:44:06,224 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 13:44:06,226 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 13:44:06,236 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 13:44:06,239 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 13:44:06,300 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 13:44:06,308 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,325 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,325 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,330 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO JournalUtils - Reading journal entries
2020-12-19 13:44:06,332 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 13:44:07,078 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 13:44:07,084 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 13:44:07,084 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 13:44:18,996 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 13:44:35,xx0 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:44:35,xx0 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 13:44:35,xx0 INFO CopycatServer - Server started successfully!
2020-12-19 13:44:35,xx0 INFO RaftJournalSystem - Started Raft Journal System in 48624ms
2020-12-19 13:44:35,xx1 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 13:44:35,xx3 INFO AlluxioMasterProcess - All masters started
2020-12-19 13:44:35,xx3 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 13:44:35,xx3 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 13:44:35,xx7 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 13:44:35,1xx INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:15,055 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 14:57:15,055 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 14:57:15,055 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 14:57:15,055 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 14:57:15,055 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 14:57:15,xx4 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 14:57:15,xx7 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-4583728940563134304. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 14:57:15,159 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 14:57:15,184 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 14:57:15,216 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 14:57:15,217 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 14:57:15,217 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 14:57:15,231 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 14:57:15,232 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 14:57:15,384 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 14:57:15,914 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 14:57:15,923 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 14:57:16,057 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 14:57:16,058 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 14:57:16,064 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 14:57:25,038 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:25,039 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 14:57:25,180 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 14:57:25,181 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 14:57:25,189 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 14:57:25,336 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 14:57:39,475 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 14:57:39,478 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 14:57:39,485 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 14:57:39,489 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 14:57:39,554 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 14:57:39,561 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,578 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,578 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,584 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,585 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO JournalUtils - Reading journal entries
2020-12-19 14:57:39,586 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 14:57:40,284 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 14:57:40,290 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 14:57:40,291 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 14:57:55,358 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 14:58:12,431 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 14:58:12,431 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 14:58:12,432 INFO CopycatServer - Server started successfully!
2020-12-19 14:58:12,432 INFO RaftJournalSystem - Started Raft Journal System in 47393ms
2020-12-19 14:58:12,433 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 14:58:12,434 INFO AlluxioMasterProcess - All masters started
2020-12-19 14:58:12,434 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 14:58:12,434 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 14:58:12,438 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 14:58:12,442 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:21,482 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 15:06:21,482 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 15:06:21,482 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 15:06:21,482 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 15:06:21,482 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 15:06:21,535 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 15:06:21,539 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-3184643968898944169. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 15:06:21,592 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 15:06:21,607 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 15:06:21,637 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 15:06:21,653 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 15:06:21,654 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 15:06:21,667 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 15:06:21,669 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 15:06:21,806 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 15:06:22,363 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 15:06:22,372 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 15:06:22,535 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:06:22,535 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 15:06:22,542 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 15:06:31,617 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:31,618 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 15:06:31,727 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 15:06:31,728 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 15:06:31,736 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:06:31,799 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:06:45,799 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 15:06:45,802 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 15:06:45,8xx INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 15:06:45,813 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 15:06:45,875 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 15:06:45,884 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,900 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,900 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,905 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,906 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,907 INFO JournalUtils - Reading journal entries
2020-12-19 15:06:45,907 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:06:46,598 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 15:06:46,600 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:06:46,601 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:06:46,602 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:06:46,603 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:06:46,604 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:07:01,819 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:07:17,275 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:07:17,276 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:07:17,276 INFO CopycatServer - Server started successfully!
2020-12-19 15:07:17,276 INFO RaftJournalSystem - Started Raft Journal System in 45658ms
2020-12-19 15:07:17,277 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 15:07:17,279 INFO AlluxioMasterProcess - All masters started
2020-12-19 15:07:17,279 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:07:17,279 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:07:17,282 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:07:17,288 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:24,574 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-12-19 15:17:24,574 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-12-19 15:17:24,574 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-12-19 15:17:24,574 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-12-19 15:17:24,574 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-12-19 15:17:24,629 INFO MetricsSystem - Starting sinks with config: {}.
2020-12-19 15:17:24,633 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-7973828500878036450. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-12-19 15:17:24,682 INFO NettyUtils - EPOLL_MODE is available
2020-12-19 15:17:24,700 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio/lib
2020-12-19 15:17:24,730 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio//extensions
2020-12-19 15:17:24,740 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=prod-xx-data-lake-002, rack=null)
2020-12-19 15:17:24,740 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio/lib
2020-12-19 15:17:24,753 INFO UnderDatabaseRegistry - Registered UDBs: hive
2020-12-19 15:17:24,755 INFO LayoutRegistry - Registered Table Layouts: hive
2020-12-19 15:17:24,892 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2020-12-19 15:17:25,430 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-12-19 15:17:25,439 INFO HdfsUnderFileSystem - Successfully instantiated SupportedHdfsActiveSyncProvider
2020-12-19 15:17:25,576 INFO ProcessUtils - Starting Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:17:25,576 INFO RaftJournalSystem - Initializing Raft Journal System
2020-12-19 15:17:25,584 INFO JournalStateMachine - Initialized new journal state machine
2020-12-19 15:17:34,760 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:34,761 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [prod-xx-data-lake-002:19200]. Local address: prod-xx-data-lake-002:19200
2020-12-19 15:17:34,859 INFO GrpcMessagingServer - Successfully started messaging server at: prod-xx-data-lake-002:19200
2020-12-19 15:17:34,863 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to FOLLOWER
2020-12-19 15:17:34,875 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-12-19 15:17:34,909 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:17:52,318 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to CANDIDATE
2020-12-19 15:17:52,321 INFO CandidateState - prod-xx-data-lake-002:19200 - Starting election
2020-12-19 15:17:52,332 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to LEADER
2020-12-19 15:17:52,335 INFO ServerContext - prod-xx-data-lake-002:19200 - Found leader prod-xx-data-lake-002:19200
2020-12-19 15:17:52,399 INFO ServerStateMachine - prod-xx-data-lake-002:19200 - Installing snapshot 1
2020-12-19 15:17:52,409 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,430 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,437 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,439 INFO JournalUtils - Reading journal entries
2020-12-19 15:17:52,440 INFO JournalStateMachine - Successfully installed snapshot up to SN -1
2020-12-19 15:17:53,xx7 ERROR JournalStateMachine - Fatal error: Unexpected journal entry. The next expected SN is 0, but encountered an entry with SN 36870. Full journal entry: sequence_number: 36870
update_inode_directory {
id: 0
direct_children_loaded: true
}
2020-12-19 15:17:53,xx9 INFO AbstractMaster - TableMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO AbstractMaster - MetaMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO AbstractMaster - FileSystemMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO AbstractMaster - BlockMaster: Stopped secondary master.
2020-12-19 15:17:53,xx9 INFO AbstractMaster - MetricsMaster: Stopped secondary master.
2020-12-19 15:18:04,986 ERROR ClientMasterSync - Failed to get master address during initialization: alluxio.exception.status.UnavailableException: Failed to handshake with master prod-xx-data-lake-002:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2020-12-19 15:18:23,541 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2020-12-19 15:18:23,542 INFO CopycatServer - Server started successfully!
2020-12-19 15:18:23,542 INFO RaftJournalSystem - Started Raft Journal System in 48781ms
2020-12-19 15:18:23,543 INFO RaftJournalSystem - Shutting down raft journal
2020-12-19 15:18:23,544 INFO AlluxioMasterProcess - All masters started
2020-12-19 15:18:23,544 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2020-12-19 15:18:23,544 INFO ProcessUtils - Stopping Alluxio master @prod-xx-data-lake-002:19998.
2020-12-19 15:18:23,548 INFO ServerContext - prod-xx-data-lake-002:19200 - Transitioning to INACTIVE
2020-12-19 15:18:23,552 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
@JySongWithZhangCe @jiacheliu3 Can you help me find out what is the cause of the problem? Is it possible that Alluxio is abnormal due to HDFS balancing data?
@Trevor-zhang You have attached much logs, I only made a quick pass so might be missing something.
Seems Alluxio is running with Active Sync, where updates in the HDFS will be streamed into Alluxio. You have a lot of operations going on in HDFS, and I expect a lot of messages to be streaming into Alluxio. Then somehow this affected journal flushing. Then you see an OOM.
2020-12-19 00:20:24,759 WARN SleepingTimer - Master Active UFS Sync last execution took 34411 ms. Longer than the interval 30000
2020-12-19 00:27:43,626 ERROR MasterJournalContext - Fatal error: Journal flush failed after 9 attempts
...
2020-12-19 00:28:20,996 ERROR SingleThreadContext - An uncaught exception occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.ConcurrentSkipListMap.entryIterator(ConcurrentSkipListMap.java:2376)
at java.util.concurrent.ConcurrentSkipListMap$EntrySet.iterator(ConcurrentSkipListMap.java:2522)
at alluxio.master.transport.GrpcMessagingConnection.timeoutPendingRequests(GrpcMessagingConnection.java:489)
at alluxio.master.transport.GrpcMessagingConnection$$Lambda$196/1878270761.run(Unknown Source)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
at io.atomix.catalyst.concurrent.Runnables$$Lambda$96/1541857308.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
@Trevor-zhang I would start with looking at the existing journal, ls -alR ${ALLUXIO_HOME}/journal
and see if the journal files still make sense.
Then translate the journal entries into human readable format bin/alluxio readJournal -start <start index> -end <end index>
. This parses the journal entries in the range into a txt file containing what happened. If you identify what was the journal entry that failed to flush, now you can see what that was doing.
Let's see what you find
@Trevor-zhang There are multiple things we can do from here:
XX:+HeapDumpOnOutOfMemoryError
, then you can analyze what consumed most memory from the dumpalluxio readJournal
fails. I guess you can just throw away that file and rename the last journal file. For example if you have
./0x001-0x100
./0x100-0xfff <- you find this is corrupted
Then I believe you can
./0x001-0xfff <- rename this file
./0x100-0xfff <- delete this
The master should be able to start with what it has, then the raft quorum will update it with what it is missing. @LuQQiu could you confirm if my guess is right?
@jiacheliu3 The metadata from the journal directory up to before the crash exists. In edit.txt
I found a lot of pipeline information synchronized with OSS. But I still haven’t found the cause of Alluxio crashed.
I am curious why there is no way to restart the service except format journal after the crash, even if the whole environment is stable when it is started.
@Trevor-zhang Could you share your investigation on the journal files with us? What are the last edits look like and the journal files too?
I'm not sure what you mean by "even if the whole environment is stable when it started"? If there's OOM, it's hard to guarantee anything, a lot of behaviors will be undefined. Unfortunately, based on the information you provided I'm not able to better answer this question. If you share more information maybe I can help more.
If the journal looks corrupted, formatting journal will be the easiest way and fastest way to bring your service back up. Alternatively, you can try the manual way I suggested as well. When there is journal, Alluxio recovers from the journal. It does not matter whether your environment (UFS) is stable or not. And this does not work when your journal is corrupted, of course.
@jiacheliu3 Thank you for your help. Because this is the production environment, the journal log is on the computer of the operation and maintenance personnel , and I don't expect to get the file until Monday.
I read journal yesterday, the file are all in good condition, and I have recovered alluxio. I want to locate the cause of this failure and sum up the lessons, so as not to have such a problem again next time. :) :)
@Trevor-zhang I have the same opinion. We should investigate on the journal files and figure out the issue, so we understand better what to do next time.
Please share the file and how the journal files look like (what are the file names? how big are they?) with us. It's going to be very helpful. Thanks!
@jiacheliu3 i run /alluxio readJournal -start 2000000 -end 8000000
. The following files are generated, and there are about 1.7G files in edits.txt.
In edits.txt
with the following file:
sequence_number: 7999996
delete_file {
id: 23631480487935
recursive: false
op_time_ms: 1607450451368
path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209014700.clean"
}
sequence_number: 7999997
delete_file {
id: 23631564374015
recursive: false
op_time_ms: 1607450451380
path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209014700.inflight"
}
sequence_number: 7999998
delete_file {
id: 23633292427263
recursive: false
op_time_ms: 1607450451391
path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209015024.clean"
}
sequence_number: 7999999
delete_file {
id: 23633577639935
recursive: false
op_time_ms: 1607450451403
path: "/oss/gb2/ads_gb_ord_arrive_ds/.hoodie/20201209015043.inflight"
}
Because the file is too large, how can I filter the content based on certain conditions ?
@Trevor-zhang Can you attach what you see doing ls -alR ${ALLUXIO_HOME}/journal
? That should help us understand what journal entries were in the last journal file, which probably failed flushing. I think you only need to check the entries involved in the last file, and most probably only the last few entries were affected.
Also if you have other masters, I recommend you to check the same entries on the other masters too. You will be able to see what entries were missing on the standby masters.
@jiacheliu3 Sorry, I just learned from O&M that the last journal file is no longer found. The O&M executed format journal
. Now there is only the edit.txt file of that period. Can we start with edit.txt
?
@Trevor-zhang I assume you checked the index on the journal files and found the last entries to be 8000000. From the attached piece of your edit.txt
, I can see the last entries are delete_file operations. This operation should be small and fast, I don't expect them to cause you journal issues. From that perspective, I don't think the edit.txt
have any more information to us.
Because there's no journal anymore, we can no longer check the following:
edit.txt
Here are the investigations I guess I can still do, based on your description and the log you pasted:
A few things based on my previous recommendation below, so the next time we can have dig deeper:
HeapDumpOnOutOfMemoryError
in your jvm optionIt seems that useful information cannot be found in edix.txt.
I will make some adjustments in the jvm option and output the gc log .
Thank you very much for your help in these two days. @jiacheliu3
Alluxio Version: 2.2.0 Describe the bug Due to the expansion of the hadoop cluster and then performing hdfs data balancing, alluxio was down and could not be started To Reproduce Steps to reproduce the behavior (as minimally and precisely as possible)
Expected behavior A clear and concise description of what you expected to happen.
Urgency Describe the impact and urgency of the bug.
Additional context Add any other context about the problem here.