zc87328000 closed this issue 4 years ago.
Hi @zc87328000, can you share more of master.log? I think the root cause may be visible higher up in the stack trace.
This message seems a bit puzzling: alluxio.exception.status.UnavailableException: Failed to connect to MetaMasterMaster @ 10-43-81-25/10.43.43.25:19998 after 7 attempts
Is 10-43-81-25 a hostname that resolves to IP 10.43.43.25? Can you also confirm whether the other masters in the quorum are up and running or are they also getting similar errors?
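One quick way to answer the question above is to resolve the hostname on each node and compare the result with the address the name itself seems to encode. This is a minimal Python sketch, assuming a hypothetical naming convention where a host's name is its IP with dots replaced by dashes (inferred only from the name 10-43-81-25). The helpers implied_ip, check_host and resolve_and_check are illustrative, not part of Alluxio.

```python
import re
import socket

# Assumption (hypothetical convention): hosts are named after their IP with
# dots replaced by dashes, so "10-43-81-25" is expected to live at 10.43.81.25.
_DASHED_IP = re.compile(r"^(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})$")


def implied_ip(hostname):
    """Return the dotted IP a dash-style hostname suggests, or None."""
    match = _DASHED_IP.match(hostname)
    return ".".join(match.groups()) if match else None


def check_host(hostname, resolved_ip):
    """Compare a resolved IP with the one the hostname itself implies."""
    expected = implied_ip(hostname)
    if expected is None:
        return f"SKIP: {hostname} does not follow the dashed-IP pattern"
    if expected == resolved_ip:
        return f"OK: {hostname} resolves to {resolved_ip}"
    return f"MISMATCH: {hostname} resolves to {resolved_ip}, name implies {expected}"


def resolve_and_check(hostname):
    """Resolve hostname with this node's resolver (/etc/hosts, DNS) and compare."""
    return check_host(hostname, socket.gethostbyname(hostname))
```

Running resolve_and_check("10-43-81-25") on every master and worker would show whether all nodes agree. Per the reporter, this name resolves to 10.43.43.25, while the ZooKeeper quorum in alluxio-site.properties is addressed as 10.43.81.25.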
Hi @ns1123, the IP of hostname 10-43-81-25 is 10.43.43.25. Here is master.log:
2019-10-31 14:36:42,692 INFO Compatibility - Running in ZooKeeper 3.4.x compatibility mode
2019-10-31 14:36:42,707 INFO CuratorFrameworkImpl - Starting
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:host.name=10-43-81-25
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.version=1.8.0_211
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.vendor=Oracle Corporation
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.home=/data1/jdk/jre
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.class.path=/app/alluxio/conf/::/app/alluxio/assembly/alluxio-server-2.0.0.jar
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.io.tmpdir=/tmp
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.compiler=
Hi @ns1123, I can now start all masters and workers, but I can't start job_master. When I run alluxio-start.sh job_master, I get the following error:
2019-10-31 20:05:20,514 INFO utils.Compatibility (Compatibility.java:
@zc87328000 Sorry for the delayed response. Have you managed to figure out the issue? I noticed that your latest logs show Alluxio running in ZooKeeper 3.4.x compatibility mode, even though you mentioned your ZK quorum is on version 3.5.x.
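The ZooKeeper client version is printed in the master.log excerpt above (Client environment:zookeeper.version=3.4.6-1569965), which would be consistent with the compatibility-mode message, assuming Curator picks that mode from the 3.4.x client bundled on the classpath. Below is a minimal Python sketch for pulling that version out of a log and comparing it with the quorum's release line. The helper names are hypothetical, and a version mismatch alone is not shown here to be the cause of the job_master failure.

```python
import re

# Matches the ZooKeeper client banner, e.g.
# "Client environment:zookeeper.version=3.4.6-1569965, built on ..."
_ZK_VERSION = re.compile(r"zookeeper\.version=(\d+)\.(\d+)\.(\d+)")


def client_zk_version(log_text):
    """Return the first ZooKeeper client version found in a log, or None."""
    match = _ZK_VERSION.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def parse_version(version):
    """Turn a version string such as '3.5.5' into (3, 5, 5)."""
    return tuple(int(part) for part in version.split("."))


def same_minor_line(client, server):
    """True when client and server share the same major.minor release line."""
    return client[:2] == server[:2]
```

On the excerpt above, client_zk_version returns (3, 4, 6), and same_minor_line((3, 4, 6), parse_version("3.5.5")) is False.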
@zc87328000 Is this still an issue? If so, please reopen it.
Closing multiple complaint threads in a short time with NO resolution, while nobody at Alluxio gets alarmed that dozens of issues have been filed for the same reason, is an unhelpful and problematic pattern.
It's closing in on 3 years, and v2.8.1 right OUT OF THE BOX still shows the exact same issue. I am a strong customer, but mostly because of the poor documentation and a complete lack of help and support, I will move on to some other data orchestration solution. I am a CTO, and although I do commend you for having a live Slack channel, Alluxio is still immature when it comes to (1) software quality, (2) documentation, and (3) tech support. Slack is not how even open-source software should handle troubleshooting, let alone a paid product like yours.
No response here will be read or entertained. Only support will.
@rastogiasr @alluxio-bot ...
Yes, I have the same problem, but I haven't gotten any support. Have you found another data orchestration solution, @np-ftrwei?
Alluxio Version: Alluxio 2.0.0, ZooKeeper 3.5.5
Describe the bug
job_master.log:
Opening socket connection to server 10.43.81.25/10.43.81.25:2181. Will not attempt to authenticate using SASL (unknown error)
Session establishment complete on server 10.43.81.25/10.43.81.25:2181, sessionid = 0x10014a98bdd000c, negotiated timeout = 40000
worker.log:
2019-10-30 09:50:34,564 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 1): alluxio.exception.status.UnavailableException: Failed to handshake with master 10-43-81-25/10.43.43.25:19998 to load cluster default configuration values: UNAVAILABLE: Network closed for unknown reason
master.log:
2019-10-30 11:21:56,910 ERROR MetaMasterSync - Failed to receive leader master heartbeat command.
alluxio.exception.status.UnavailableException: Failed to connect to MetaMasterMaster @ 10-43-81-25/10.43.43.25:19998 after 7 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:264)
	at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:367)
	at alluxio.AbstractClient.retryRPC(AbstractClient.java:331)
	at alluxio.master.meta.RetryHandlingMetaMasterMasterClient.getId(RetryHandlingMetaMasterMasterClient.java:76)
	at alluxio.master.meta.MetaMasterSync.setIdAndRegister(MetaMasterSync.java:115)
	at alluxio.master.meta.MetaMasterSync.heartbeat(MetaMasterSync.java:71)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Failed to handshake with master 10-43-81-25/10.43.43.25:19998 to load cluster default configuration values: UNAVAILABLE: Network closed for unknown reason
	at alluxio.util.ConfigurationUtils.loadConfiguration(ConfigurationUtils.java:490)
	at alluxio.ClientContext.loadConf(ClientContext.java:129)
	at alluxio.ClientContext.loadConfIfNotLoaded(ClientContext.java:150)
	at alluxio.AbstractClient.beforeConnect(AbstractClient.java:166)
	at alluxio.AbstractClient.connect(AbstractClient.java:224)
	... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
	at alluxio.grpc.MetaMasterConfigurationServiceGrpc$MetaMasterConfigurationServiceBlockingStub.getConfiguration(MetaMasterConfigurationServiceGrpc.java:375)
	at alluxio.util.ConfigurationUtils.loadConfiguration(ConfigurationUtils.java:484)
	... 15 more
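Both the worker and the master fail with gRPC UNAVAILABLE during the configuration handshake, so a first sanity check is whether the master RPC port (19998 in these logs) accepts plain TCP connections from the other hosts. This is a minimal Python sketch with a hypothetical helper, port_open. A successful connect only shows that something is listening; it does not prove the gRPC handshake itself (for example, matching authentication settings) will succeed.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, running port_open("10.43.43.25", 19998) from a worker host and from the other masters would separate a network or firewall problem from a protocol-level one.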
To Reproduce
alluxio-site.properties:
alluxio.master.hostname=10.43.43.25
Worker properties
alluxio.worker.memory.size=1024GB
alluxio.worker.tieredstore.levels=1
alluxio.worker.tieredstore.level0.alias=SSD
alluxio.worker.tieredstore.level0.dirs.path=/data1/data/alluxio_data,/data2/data/alluxio_data,/data3/data/alluxio_data,/data4/data/alluxio_data
alluxio.worker.tieredstore.level0.dirs.quota=2.7TB,2.7TB,2.7TB,2.7TB
alluxio.master.mount.table.root.ufs=/data1/data/alluxio_ufs
User properties
alluxio.user.file.readtype.default=NO_CACHE
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.ufs.path.cache.capacity=1000000000
alluxio.master.ufs.block.location.cache.capacity=1000000000
alluxio.master.ufs.path.cache.threads=128
alluxio.master.ufs.active.sync.thread.pool.size=128
alluxio.user.file.metadata.load.type=ALWAYS
alluxio.master.metastore=ROCKS
alluxio.master.journal.type=UFS
alluxio.master.journal.checkpoint.period.entries=20000000
alluxio.master.journal.folder=hdfs://ns1/alluxio/journal
alluxio.master.journal.log.size.bytes.max=500MB
alluxio.user.file.delete.unchecked=true
alluxio.underfs.hdfs.configuration=/data1/hadoop-2.2.0/etc/hadoop/core-site.xml:/data1/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
alluxio.master.mount.table.root.ufs=hdfs://ns1/alluxio/data
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=10.43.81.25:2181,10.43.81.26:2181,10.43.81.27:2181
alluxio.zookeeper.auth.enabled=false
alluxio.zookeeper.session.timeout=120s
alluxio.security.authentication.type=NOSASL
alluxio.security.authorization.permission.enabled=false
alluxio.master.audit.logging.enabled=false
alluxio.master.ttl.checker.interval=100day
alluxio.user.block.write.location.policy.class=alluxio.client.block.policy.RoundRobinPolicy
alluxio.worker.network.block.reader.threads.max=4096
alluxio.worker.network.async.cache.manager.threads.max=64
alluxio.user.ufs.block.read.location.policy=alluxio.client.block.policy.DeterministicHashPolicy
alluxio.user.ufs.block.read.location.policy.deterministic.hash.shards=3
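The alluxio-site.properties above sets alluxio.master.mount.table.root.ufs twice (a local path and hdfs://ns1/alluxio/data). If the file is read with java.util.Properties, as is usual for Java .properties files, the later value silently replaces the earlier one. The Python sketch below lints such a file for duplicate keys and pairs each tiered-store directory with its quota. It assumes each path is meant to match the quota at the same position, as in the four paths and four 2.7TB quotas above. The parser is simplified (no line continuations or escapes), and all helper names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines; return (key, value) pairs in file order.

    Simplified: skips blank lines and lines starting with '#' or '!', and does
    not handle continuations or escapes the way java.util.Properties does.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def duplicate_keys(pairs):
    """Map each key defined more than once to all of its values, in order."""
    seen = {}
    for key, value in pairs:
        seen.setdefault(key, []).append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


def tier_dirs(pairs, level=0):
    """Pair each tiered-store directory with the quota at the same position."""
    props = dict(pairs)  # later duplicates win, mirroring a last-wins loader
    prefix = f"alluxio.worker.tieredstore.level{level}.dirs."
    if prefix + "path" not in props:
        return []
    paths = props[prefix + "path"].split(",")
    quotas = props.get(prefix + "quota", "").split(",")
    if len(paths) != len(quotas):
        # This sketch treats a count mismatch as an error.
        raise ValueError(f"{len(paths)} paths but {len(quotas)} quotas")
    return list(zip(paths, quotas))
```

Feeding the file's text to parse_properties and then duplicate_keys would flag the root UFS key with both of its values, in file order.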
Expected behavior
Normal startup