Open mjzfrank opened 2 years ago
It doesn't seem to be a bug on Alluxio's side. Some possible fixes:
hadoop001
is a custom hostname, make sure worker's machine recognizes this hostname(by configuring proper mappings in /etc/hosts on CentOS, for example), or try to configure Alluxio with explicit IP addresses.@YangchenYe323 @mjzfrank I had the same problem running alluxio cluster 2.8.0 (the most basic configuration). Excuse me, how should this be solved? I have changed the master hostname many times, but the problem still exists.
Some ports are not open, which leads to my problem. After opening the required ports, the problem is resolved. thank you
Oh well - which ports did you open? @LiuJiahao0001 ? Also, ports on master / worker / pod / node / container ... Sharing some useful info will help others
I have the same issue when i deploy on local mode. the master is OK,but the worker start faild. the error is the line 1 in the worker.log. And I can access the webui on port 30000(worker) and 19999(master) And I can use cmd like ‘./bin/alluxio fs ls /’ But I can not use cmd ‘./bin/alluxio fs cat /xxx.txt’
2022-10-20 10:58:58,395 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 8): alluxio.exception.status.UnavailableException: Failed to handshake with master localhost:19998 to load cluster default configuration values: UNAVAILABLE: io exception
2022-10-20 10:59:34,835 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=30.46.113.147, rack=null)
2022-10-20 10:59:34,846 INFO BlockWorkerFactory - Creating DefaultBlockWorker
2022-10-20 10:59:34,870 INFO DefaultStorageDir - Folder /Volumes/ramdisk/alluxioworker was created!
2022-10-20 10:59:34,870 INFO DefaultStorageDir - StorageDir initialized: path=/Volumes/ramdisk/alluxioworker, tier=alluxio.worker.block.meta.DefaultStorageTier@5154011c, dirIndex=0, medium=MEM, capacityBytes=1073741824, reservedBytes=0, availableBytes=1073741824
2022-10-20 10:59:34,933 INFO MetricsSystem - Starting sinks with config: {}.
2022-10-20 10:59:34,935 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-6951769813988764773. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2022-10-20 10:59:35,119 INFO log - Logging initialized @55080ms to org.eclipse.jetty.util.log.Slf4jLog
2022-10-20 10:59:35,138 INFO MetricsHeartbeatContext - Created metrics heartbeat with ID app-704029929771442854. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2022-10-20 10:59:43,811 INFO NettyUtils - EPOLL is not available, will use NIO
2022-10-20 11:00:05,323 INFO GrpcDataServer - Alluxio worker gRPC server started, listening on /0.0.0.0:29999
2022-10-20 11:00:05,325 INFO ProcessUtils - Starting Alluxio worker @30.46.113.147:29999.
2022-10-20 11:00:05,325 INFO ProcessUtils - Alluxio version: 2.8.1-d8b71d66f26086d8918b0ae40f0da458c8af8b4c
2022-10-20 11:00:05,325 INFO ProcessUtils - Java version: 1.8.0_275
2022-10-20 11:00:05,382 INFO BlockMapIterator - Worker register stream batchSize=1000000
2022-10-20 11:00:05,407 INFO RegisterStreamer - Worker 5506142347260564401 - All requests have been sent. Completing the client side.
2022-10-20 11:00:05,425 INFO RegisterStreamer - Worker 5506142347260564401 - Waiting on the master side to complete
2022-10-20 11:00:05,431 INFO RegisterStreamer - 5506142347260564401 - Complete message received from the server. Closing stream
2022-10-20 11:00:05,432 INFO RegisterStreamer - Worker 5506142347260564401 - Finished registration with a stream
2022-10-20 11:00:05,434 INFO WebServer - Alluxio worker web service starting @ /0.0.0.0:30000
2022-10-20 11:00:05,435 INFO Server - jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 1.8.0_275-b01
2022-10-20 11:00:05,464 INFO ContextHandler - Started o.e.j.s.ServletContextHandler@42f8285e{/metrics/json,null,AVAILABLE}
2022-10-20 11:00:05,464 INFO ContextHandler - Started o.e.j.s.ServletContextHandler@26bab2f1{/metrics/prometheus,null,AVAILABLE}
2022-10-20 11:00:05,466 WARN SecurityHandler - ServletContext@o.e.j.s.ServletContextHandler@7a560583{/,file:///Users/jingyuan/Documents/project/bkmpp-alluxio/webui/worker/build/,STARTING} has uncovered http methods for path: /
2022-10-20 11:00:12,994 INFO ContextHandler - Started o.e.j.s.ServletContextHandler@7a560583{/,file:///Users/jingyuan/Documents/project/bkmpp-alluxio/webui/worker/build/,AVAILABLE}
2022-10-20 11:00:13,002 INFO AbstractConnector - Started ServerConnector@3fce8fd9{HTTP/1.1, (http/1.1)}{0.0.0.0:30000}
2022-10-20 11:00:13,002 INFO Server - Started @92964ms
2022-10-20 11:00:13,002 INFO WebServer - Alluxio worker web service started @ /0.0.0.0:30000
2022-10-20 11:00:13,010 INFO AlluxioWorkerProcess - Alluxio worker started. id=5506142347260564401, bindHost=0.0.0.0, connectHost=30.46.113.147, rpcPort=29999, webPort=30000
2022-10-20 11:13:50,378 INFO GrpcDataServer - Shutting down Alluxio worker gRPC server at 0.0.0.0/0.0.0.0:29999.
2022-10-20 11:13:50,381 INFO AlluxioWorkerProcess - Alluxio worker ended
2022-10-20 11:13:50,381 INFO ProcessUtils - Stopping Alluxio worker @30.46.113.147:29999.
2022-10-20 11:13:54,474 INFO AbstractConnector - Stopped ServerConnector@3fce8fd9{HTTP/1.1, (http/1.1)}{0.0.0.0:30000}
2022-10-20 11:13:54,484 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@7a560583{/,file:///Users/jingyuan/Documents/project/bkmpp-alluxio/webui/worker/build/,STOPPED}
2022-10-20 11:13:54,484 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@26bab2f1{/metrics/prometheus,null,STOPPED}
2022-10-20 11:13:54,484 INFO ContextHandler - Stopped o.e.j.s.ServletContextHandler@42f8285e{/metrics/json,null,STOPPED}
2022-10-20 11:13:54,486 WARN JvmPauseMonitor - JVM pause monitor interrupted during sleep.
@kelvin17 do you fixed the issue? i have the same issue 2.8
how to solve this problem?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
I am also facing the same issue with Alluxio version 2.9.0 Has someone got any solution for the same?
Please note that I have correct entries in "/etc/hosts" as well.
2023-02-11 09:33:56,187 ERROR ClientMasterSync - Failed to get master address during initialization alluxio.exception.status.UnavailableException: Failed to handshake with master zmlssrelbapp02:19998 to load cluster default configuration values: UNIMPLEMENTED: Method not found: alluxio.grpc.meta.MetaMasterConfigurationService/GetConfiguration at alluxio.conf.Configuration.loadConfiguration(Configuration.java:458) at alluxio.ClientContext.loadConf(ClientContext.java:120) at alluxio.client.metrics.ClientMasterSync.loadConf(ClientMasterSync.java:122) at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:74) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Method not found: alluxio.grpc.meta.MetaMasterConfigurationService/GetConfiguration at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) at alluxio.grpc.MetaMasterConfigurationServiceGrpc$MetaMasterConfigurationServiceBlockingStub.getConfiguration(MetaMasterConfigurationServiceGrpc.java:424) at alluxio.conf.Configuration.loadConfiguration(Configuration.java:452) ... 10 more
@YangchenYe323 @mjzfrank : if possible, can you please guide?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
I was having the same problem in my docker env My Env
Alluxio version: 2.9.3
Docker version 20.10.21, build baeda1f
Arch: Apple M1 Max
OS: Ventura 13.3.1
log error
2023-05-17 21:30:40,218 INFO [metrics-master-heartbeat-0](NettyUtils.java:146) - EPOLL_MODE is available
2023-05-17 21:30:41,079 ERROR [metrics-master-heartbeat-0](SamplingLogger.java:360) - Failed to get master address during initialization
alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master:19998 to load cluster default configuration values: UNAVAILABLE: io exception
at alluxio.conf.Configuration.loadConfiguration(Configuration.java:458)
at alluxio.ClientContext.loadConf(ClientContext.java:123)
at alluxio.client.metrics.ClientMasterSync.loadConf(ClientMasterSync.java:122)
at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at alluxio.grpc.MetaMasterConfigurationServiceGrpc$MetaMasterConfigurationServiceBlockingStub.getConfiguration(MetaMasterConfigurationServiceGrpc.java:424)
at alluxio.conf.Configuration.loadConfiguration(Configuration.java:452)
... 10 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection reset by peer: alluxio-master/172.21.0.2:19998
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection reset by peer
at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:750)
It looks like because JVM process is taking too long to come up in docker env, causing heartbeat service to fail because master reside in provisioning state and RPC port is closed. I double this property alluxio.user.metrics.heartbeat.interval.ms
to 20sec
and this change did help me.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
Alluxio Version: 2.8.0
Describe the bug Master report this error :
To Reproduce ./alluxio-start.sh all
Expected behavior how to fix this
Urgency
Are you planning to fix it No
Additional context here is my properties
./alluxio fs works well and the port 19998 is on , why master.log report this error ?