Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Failed to Access an Alluxio Cluster with HA #10359

Closed zc87328000 closed 4 years ago

zc87328000 commented 5 years ago

Alluxio Version: Alluxio 2.0.0, ZooKeeper 3.5.5

Describe the bug

job_master.log

Opening socket connection to server 10.43.81.25/10.43.81.25:2181. Will not attempt to authenticate using SASL (unknown error)
Session establishment complete on server 10.43.81.25/10.43.81.25:2181, sessionid = 0x10014a98bdd000c, negotiated timeout = 40000

worker.log

2019-10-30 09:50:34,564 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 1): alluxio.exception.status.UnavailableException: Failed to handshake with master 10-43-81-25/10.43.43.25:19998 to load cluster default configuration values: UNAVAILABLE: Network closed for unknown reason

master.log

2019-10-30 11:21:56,910 ERROR MetaMasterSync - Failed to receive leader master heartbeat command.
alluxio.exception.status.UnavailableException: Failed to connect to MetaMasterMaster @ 10-43-81-25/10.43.43.25:19998 after 7 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:264)
    at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:367)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:331)
    at alluxio.master.meta.RetryHandlingMetaMasterMasterClient.getId(RetryHandlingMetaMasterMasterClient.java:76)
    at alluxio.master.meta.MetaMasterSync.setIdAndRegister(MetaMasterSync.java:115)
    at alluxio.master.meta.MetaMasterSync.heartbeat(MetaMasterSync.java:71)
    at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.exception.status.UnavailableException: Failed to handshake with master 10-43-81-25/10.43.43.25:19998 to load cluster default configuration values: UNAVAILABLE: Network closed for unknown reason
    at alluxio.util.ConfigurationUtils.loadConfiguration(ConfigurationUtils.java:490)
    at alluxio.ClientContext.loadConf(ClientContext.java:129)
    at alluxio.ClientContext.loadConfIfNotLoaded(ClientContext.java:150)
    at alluxio.AbstractClient.beforeConnect(AbstractClient.java:166)
    at alluxio.AbstractClient.connect(AbstractClient.java:224)
    ... 11 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
    at alluxio.grpc.MetaMasterConfigurationServiceGrpc$MetaMasterConfigurationServiceBlockingStub.getConfiguration(MetaMasterConfigurationServiceGrpc.java:375)
    at alluxio.util.ConfigurationUtils.loadConfiguration(ConfigurationUtils.java:484)
    ... 15 more

To Reproduce

alluxio-site.properties

alluxio.master.hostname=10.43.43.25

Worker properties

alluxio.worker.memory.size=1024GB
alluxio.worker.tieredstore.levels=1
alluxio.worker.tieredstore.level0.alias=SSD
alluxio.worker.tieredstore.level0.dirs.path=/data1/data/alluxio_data,/data2/data/alluxio_data,/data3/data/alluxio_data,/data4/data/alluxio_data
alluxio.worker.tieredstore.level0.dirs.quota=2.7TB,2.7TB,2.7TB,2.7TB

alluxio.master.mount.table.root.ufs=/data1/data/alluxio_ufs

User properties

alluxio.user.file.readtype.default=NO_CACHE
alluxio.user.file.writetype.default=CACHE_THROUGH
alluxio.master.ufs.path.cache.capacity=1000000000
alluxio.master.ufs.block.location.cache.capacity=1000000000
alluxio.master.ufs.path.cache.threads=128
alluxio.master.ufs.active.sync.thread.pool.size=128
alluxio.user.file.metadata.load.type=ALWAYS
alluxio.master.metastore=ROCKS
alluxio.master.journal.type=UFS
alluxio.master.journal.checkpoint.period.entries=20000000
alluxio.master.journal.folder=hdfs://ns1/alluxio/journal
alluxio.master.journal.log.size.bytes.max=500MB
alluxio.user.file.delete.unchecked=true
alluxio.underfs.hdfs.configuration=/data1/hadoop-2.2.0/etc/hadoop/core-site.xml:/data1/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
alluxio.master.mount.table.root.ufs=hdfs://ns1/alluxio/data
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=10.43.81.25:2181,10.43.81.26:2181,10.43.81.27:2181
alluxio.zookeeper.auth.enabled=false
alluxio.zookeeper.session.timeout=120s
alluxio.security.authentication.type=NOSASL
alluxio.security.authorization.permission.enabled=false
alluxio.master.audit.logging.enabled=false
alluxio.master.ttl.checker.interval=100day
alluxio.user.block.write.location.policy.class=alluxio.client.block.policy.RoundRobinPolicy
alluxio.worker.network.block.reader.threads.max=4096
alluxio.worker.network.async.cache.manager.threads.max=64
alluxio.user.ufs.block.read.location.policy=alluxio.client.block.policy.DeterministicHashPolicy
alluxio.user.ufs.block.read.location.policy.deterministic.hash.shards=3
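Editorial note: in the properties posted above, alluxio.master.mount.table.root.ufs is assigned twice with different values. The sketch below shows one way to spot repeated keys in a properties file. The helper names (parse_properties, duplicate_keys) are hypothetical and the sample is a shortened excerpt of the posted file; with Java's standard properties loading the later assignment normally replaces the earlier one, but whether that applies to this deployment is an assumption, not something the thread confirms.

```python
from collections import defaultdict


def parse_properties(text):
    """Return {key: [values...]} for Java-style properties text, keeping duplicates."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and free-text lines such as "Worker properties".
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return dict(seen)


def duplicate_keys(text):
    """Return only the keys that are assigned more than once."""
    return {k: v for k, v in parse_properties(text).items() if len(v) > 1}


# Shortened excerpt of the configuration posted in this issue.
sample = """\
alluxio.master.hostname=10.43.43.25
Worker properties
alluxio.master.mount.table.root.ufs=/data1/data/alluxio_ufs
alluxio.master.journal.type=UFS
alluxio.master.mount.table.root.ufs=hdfs://ns1/alluxio/data
"""

for key, values in duplicate_keys(sample).items():
    print(f"{key} is set {len(values)} times: {values}")
```

Running the same check against the full alluxio-site.properties on every master and worker is a cheap way to rule out conflicting settings before digging into the logs.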

Expected behavior

Normal startup

ns1123 commented 5 years ago

Hi @zc87328000, can you provide more master.log sections? I think the root cause may be seen higher up in the stack trace.

This message seems a bit puzzling: alluxio.exception.status.UnavailableException: Failed to connect to MetaMasterMaster @ 10-43-81-25/10.43.43.25:19998 after 7 attempts

Is 10-43-81-25 a hostname that resolves to IP 10.43.43.25? Can you also confirm whether the other masters in the quorum are up and running or are they also getting similar errors?
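Editorial note: both questions above (does the hostname resolve to the expected IP, and is the master port reachable) can be checked from each node with a few lines of Python. This is a minimal sketch using only the standard socket module; the function names resolves_to and port_open are hypothetical, and the 3-second timeout is an arbitrary choice.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the IPv4 addresses the hostname resolves to."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Covers socket.gaierror (name not found) and similar lookup failures.
        return False
    return expected_ip in addresses


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

For the addresses in this thread, one would call resolves_to("10-43-81-25", "10.43.43.25") and port_open("10.43.43.25", 19998) on every master and worker host. A successful TCP connect only shows that something is listening on the port; it does not prove the gRPC handshake will succeed.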

zc87328000 commented 5 years ago

Hi @ns1123, the hostname of the machine with IP 10.43.43.25 is 10-43-81-25.

master.log

2019-10-31 14:36:42,692 INFO Compatibility - Running in ZooKeeper 3.4.x compatibility mode
2019-10-31 14:36:42,707 INFO CuratorFrameworkImpl - Starting
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:host.name=10-43-81-25
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.version=1.8.0_211
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.vendor=Oracle Corporation
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.home=/data1/jdk/jre
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.class.path=/app/alluxio/conf/::/app/alluxio/assembly/alluxio-server-2.0.0.jar
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.io.tmpdir=/tmp
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:java.compiler=
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:os.name=Linux
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:os.arch=amd64
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:os.version=3.10.0-693.el7.x86_64
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:user.name=root
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:user.home=/root
2019-10-31 14:36:42,713 INFO ZooKeeper - Client environment:user.dir=/root
2019-10-31 14:36:42,713 INFO ZooKeeper - Initiating client connection, connectString=10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@18eed359
2019-10-31 14:36:42,726 INFO CuratorFrameworkImpl - Default schema
2019-10-31 14:36:42,726 INFO CuratorFrameworkImpl - backgroundOperationsLoop exiting
2019-10-31 14:36:42,726 INFO ClientCnxn - Opening socket connection to server 10.43.43.26/10.43.43.26:2181. Will not attempt to authenticate using SASL (unknown error)
2019-10-31 14:36:42,733 INFO ClientCnxn - Socket connection established to 10.43.43.26/10.43.43.26:2181, initiating session
2019-10-31 14:36:42,740 INFO ClientCnxn - Session establishment complete on server 10.43.43.26/10.43.43.26:2181, sessionid = 0x2001da613180004, negotiated timeout = 40000
2019-10-31 14:36:42,745 INFO ZooKeeper - Session: 0x2001da613180004 closed
2019-10-31 14:36:42,745 INFO ClientCnxn - EventThread shut down
2019-10-31 14:36:42,745 INFO CuratorFrameworkImpl - Starting
2019-10-31 14:36:42,746 INFO ZooKeeper - Initiating client connection, connectString=10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@2a17b7b6
2019-10-31 14:36:42,746 INFO CuratorFrameworkImpl - Default schema
2019-10-31 14:36:42,746 INFO ClientCnxn - Opening socket connection to server 10.43.43.26/10.43.43.26:2181. Will not attempt to authenticate using SASL (unknown error)
2019-10-31 14:36:42,747 INFO ClientCnxn - Socket connection established to 10.43.43.26/10.43.43.26:2181, initiating session
2019-10-31 14:36:42,750 INFO ClientCnxn - Session establishment complete on server 10.43.43.26/10.43.43.26:2181, sessionid = 0x2001da613180005, negotiated timeout = 40000
2019-10-31 14:36:42,758 INFO ConnectionStateManager - State change: CONNECTED
2019-10-31 14:36:42,890 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2019-10-31 14:36:42,890 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2019-10-31 14:36:42,890 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2019-10-31 14:36:42,890 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2019-10-31 14:36:42,914 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:42,939 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,024 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,067 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,078 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,101 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:43,114 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,154 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,187 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,188 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,379 INFO RocksStore - Opened rocks database under path /app/alluxio/metastore/blocks
2019-10-31 14:36:43,401 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:43,415 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,440 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:43,450 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,524 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,555 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,558 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,559 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,564 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:43,573 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,591 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,591 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,601 INFO ExtensionFactoryRegistry - Loading core jars from /app/alluxio/lib
2019-10-31 14:36:43,610 INFO ExtensionFactoryRegistry - Loading extension jars from /app/alluxio/extensions
2019-10-31 14:36:43,610 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,642 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,643 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,647 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 14:36:43,677 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 14:36:43,678 WARN HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 14:36:43,851 INFO RocksStore - Opened rocks database under path /app/alluxio/metastore/inodes
2019-10-31 14:36:43,899 INFO ZkMasterInquireClient - Creating new zookeeper client for zk@10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181/job_leader
2019-10-31 14:36:44,018 INFO RocksStore - Opened rocks database under path /app/alluxio/metastore/inodes
2019-10-31 14:36:44,069 INFO ProcessUtils - Starting Alluxio master @/10.43.43.25:19998.
2019-10-31 14:36:44,182 INFO RocksStore - Opened rocks database under path /app/alluxio/metastore/blocks
2019-10-31 14:36:44,184 INFO UfsJournalCheckpointThread - BlockMaster: Journal checkpoint thread started.
2019-10-31 14:36:44,285 INFO RocksStore - Opened rocks database under path /app/alluxio/metastore/inodes
2019-10-31 14:36:44,285 INFO UfsJournalCheckpointThread - FileSystemMaster: Journal checkpoint thread started.
2019-10-31 14:36:44,285 INFO UfsJournalCheckpointThread - MetaMaster: Journal checkpoint thread started.
2019-10-31 14:36:44,286 INFO UfsJournalCheckpointThread - MetricsMaster: Journal checkpoint thread started.
2019-10-31 14:36:44,291 INFO ZkMasterInquireClient - Creating new zookeeper client for zk@10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181/alluxio/leader
2019-10-31 14:36:44,293 INFO DefaultMetaMaster - Standby master with address 10.43.43.25:19998 starts sending heartbeat to leader master.
2019-10-31 14:36:44,293 INFO AlluxioMasterProcess - All masters started
2019-10-31 14:36:44,293 INFO FaultTolerantAlluxioMasterProcess - Secondary started
2019-10-31 14:36:44,301 INFO CuratorFrameworkImpl - Starting
2019-10-31 14:36:44,301 INFO ZooKeeper - Initiating client connection, connectString=10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@494faa03
2019-10-31 14:36:44,302 INFO ClientCnxn - Opening socket connection to server 10.43.43.25/10.43.43.25:2181. Will not attempt to authenticate using SASL (unknown error)
2019-10-31 14:36:44,302 INFO CuratorFrameworkImpl - Default schema
2019-10-31 14:36:44,303 INFO ClientCnxn - Socket connection established to 10.43.43.25/10.43.43.25:2181, initiating session
2019-10-31 14:36:44,306 INFO ClientCnxn - Session establishment complete on server 10.43.43.25/10.43.43.25:2181, sessionid = 0x1001da60937000a, negotiated timeout = 40000
2019-10-31 14:36:44,306 INFO ConnectionStateManager - State change: CONNECTED
2019-10-31 14:36:44,312 INFO AbstractPrimarySelector - Primary selector transitioning to PRIMARY
2019-10-31 14:36:44,316 INFO PrimarySelectorClient - Creating zk path: /alluxio/leader/10-43-81-25:19998
2019-10-31 14:36:44,317 INFO PrimarySelectorClient - 10-43-81-25:19998 is now the leader.
2019-10-31 14:36:44,327 ERROR MetaMasterSync - Failed to receive leader master heartbeat command.
alluxio.exception.status.UnavailableException: Failed to determine address for MetaMasterMaster after 1 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:253)
    at alluxio.AbstractClient.retryRPCInternal(AbstractClient.java:367)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:331)
    at alluxio.master.meta.RetryHandlingMetaMasterMasterClient.getId(RetryHandlingMetaMasterMasterClient.java:76)
    at alluxio.master.meta.MetaMasterSync.setIdAndRegister(MetaMasterSync.java:115)
    at alluxio.master.meta.MetaMasterSync.heartbeat(MetaMasterSync.java:71)
    at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
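Editorial note: the master log above connects to ZooKeeper at 10.43.43.25-27 with sessionTimeout=60000, while the properties posted in the issue description set alluxio.zookeeper.address to 10.43.81.25-27 and the session timeout to 120s. That may simply mean the configuration was edited between posts, but it is worth checking that the running process picked up the intended file. A minimal sketch of that comparison follows; the function names connect_strings and mismatches are hypothetical.

```python
import re

# Matches the connectString=... token that the ZooKeeper client logs on startup.
CONNECT_RE = re.compile(r"connectString=(\S+)")


def connect_strings(log_text):
    """Return the distinct ZooKeeper connect strings found in a log, in first-seen order."""
    found = []
    for match in CONNECT_RE.finditer(log_text):
        if match.group(1) not in found:
            found.append(match.group(1))
    return found


def mismatches(log_text, configured_address):
    """Return logged connect strings whose host:port set differs from the configured one."""
    configured = set(configured_address.split(","))
    return [cs for cs in connect_strings(log_text) if set(cs.split(",")) != configured]


# One line copied from the master.log in this thread.
log_line = (
    "2019-10-31 14:36:42,713 INFO ZooKeeper - Initiating client connection, "
    "connectString=10.43.43.25:2181,10.43.43.26:2181,10.43.43.27:2181 "
    "sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@18eed359"
)
# Value of alluxio.zookeeper.address from the posted alluxio-site.properties.
configured = "10.43.81.25:2181,10.43.81.26:2181,10.43.81.27:2181"

print(mismatches(log_line, configured))
```

The comparison ignores the order of the addresses, so a reordered but otherwise identical quorum list is not reported as a mismatch.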

zc87328000 commented 5 years ago

Hi @ns1123, I can now start all masters and workers, but I can't start job_master. When I run alluxio-start.sh job_master, the following is logged:

2019-10-31 20:05:20,514 INFO utils.Compatibility (Compatibility.java:) - Running in ZooKeeper 3.4.x compatibility mode
2019-10-31 20:05:20,530 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting
2019-10-31 20:05:20,537 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2019-10-31 20:05:20,537 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:host.name=10-43-81-25
2019-10-31 20:05:20,537 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.version=1.8.0_211
2019-10-31 20:05:20,537 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.vendor=Oracle Corporation
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.home=/app/jdk1.8.0_211/jre
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.class.path=/app/alluxio/conf/::/app/alluxio/assembly/alluxio-server-2.0.0.jar
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.library.path=/usr/local/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.io.tmpdir=/tmp
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:java.compiler=
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.name=Linux
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.arch=amd64
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:os.version=3.10.0-693.el7.x86_64
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.name=root
2019-10-31 20:05:20,538 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.home=/root
2019-10-31 20:05:20,539 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - Client environment:user.dir=/app/alluxio/logs
2019-10-31 20:05:20,540 INFO zookeeper.ZooKeeper (ZooKeeper.java:) - Initiating client connection, connectString=10-43-81-25:2181,10-43-81-26:2181,10-43-81-27:2181 sessionTimeout=120000 watcher=org.apache.curator.ConnectionState@57536d79
2019-10-31 20:05:20,552 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Default schema
2019-10-31 20:05:20,552 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:backgroundOperationsLoop) - backgroundOperationsLoop exiting
2019-10-31 20:05:20,553 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 10-43-81-26/10.43.81.26:2181. Will not attempt to authenticate using SASL (unknown error)
2019-10-31 20:05:20,560 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 10-43-81-26/10.43.81.26:2181, initiating session
2019-10-31 20:05:20,567 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 10-43-81-26/10.43.81.26:2181, sessionid = 0x26e20b4e697001b, negotiated timeout = 40000
2019-10-31 20:05:20,570 INFO zookeeper.ZooKeeper (ZooKeeper.java:close) - Session: 0x26e20b4e697001b closed
2019-10-31 20:05:20,570 INFO zookeeper.ClientCnxn (ClientCnxn.java:run) - EventThread shut down
2019-10-31 20:05:20,570 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Starting
2019-10-31 20:05:20,571 INFO zookeeper.ZooKeeper (ZooKeeper.java:) - Initiating client connection, connectString=10-43-81-25:2181,10-43-81-26:2181,10-43-81-27:2181 sessionTimeout=120000 watcher=org.apache.curator.ConnectionState@711f39f9
2019-10-31 20:05:20,572 INFO imps.CuratorFrameworkImpl (CuratorFrameworkImpl.java:start) - Default schema
2019-10-31 20:05:20,572 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect) - Opening socket connection to server 10-43-81-27/10.43.81.27:2181. Will not attempt to authenticate using SASL (unknown error)
2019-10-31 20:05:20,573 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection) - Socket connection established to 10-43-81-27/10.43.81.27:2181, initiating session
2019-10-31 20:05:20,582 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected) - Session establishment complete on server 10-43-81-27/10.43.81.27:2181, sessionid = 0x36e20b524ec0030, negotiated timeout = 40000
2019-10-31 20:05:20,591 INFO state.ConnectionStateManager (ConnectionStateManager.java:postState) - State change: CONNECTED
2019-10-31 20:05:20,733 INFO master.ZkMasterInquireClient (ZkMasterInquireClient.java:) - Creating new zookeeper client for zk@10-43-81-25:2181,10-43-81-26:2181,10-43-81-27:2181/alluxio/leader
2019-10-31 20:05:20,768 INFO network.NettyUtils (NettyUtils.java:checkNettyEpollAvailable) - EPOLL_MODE is available
2019-10-31 20:05:20,820 INFO network.TieredIdentityFactory (TieredIdentityFactory.java:localIdentity) - Initialized tiered identity TieredIdentity(node=10-43-81-25, rack=null)
2019-10-31 20:05:20,835 INFO extensions.ExtensionFactoryRegistry (ExtensionFactoryRegistry.java:scanLibs) - Loading core jars from /app/alluxio/lib
2019-10-31 20:05:20,858 INFO extensions.ExtensionFactoryRegistry (ExtensionFactoryRegistry.java:scanExtensions) - Loading extension jars from /app/alluxio/extensions
2019-10-31 20:05:20,939 WARN hdfs.HdfsUnderFileSystem (HdfsUnderFileSystem.java:) - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2019-10-31 20:05:20,980 WARN util.NativeCodeLoader (NativeCodeLoader.java:) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-31 20:05:20,991 WARN hdfs.HdfsUnderFileSystem (HdfsUnderFileSystem.java:) - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2019-10-31 20:05:20,997 INFO alluxio.ProcessUtils (ProcessUtils.java:run) - Starting Alluxio job master @ 10-43-81-25/10.43.81.25:20001.
2019-10-31 20:05:20,999 INFO ufs.UfsJournalCheckpointThread (UfsJournalCheckpointThread.java:runInternal) - JobMaster: Journal checkpoint thread started.

ns1123 commented 4 years ago

@zc87328000 Sorry for the delayed response. Have you managed to figure out the issue? I noticed your latest logs have Alluxio using ZK 3.4.x compatibility mode even though you mentioned your ZK quorum is at version 3.5.x.
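Editorial note: the client-side ZooKeeper version mentioned here can be read straight from the "Client environment:zookeeper.version=" line that the client logs at startup. The sketch below extracts it and compares major.minor with the server version reported in the issue (3.5.5). The function names are hypothetical, and whether a 3.4.x client against a 3.5.x quorum is actually the cause of the failure is not established anywhere in this thread.

```python
import re

# Captures major, minor, patch from e.g. "zookeeper.version=3.4.6-1569965".
VERSION_RE = re.compile(r"zookeeper\.version=(\d+)\.(\d+)\.(\d+)")


def client_zk_version(log_text):
    """Return the ZooKeeper client version as a (major, minor, patch) tuple, or None."""
    match = VERSION_RE.search(log_text)
    return tuple(int(part) for part in match.groups()) if match else None


def same_minor(client, server):
    """Return True if both versions share the same major.minor line."""
    return client[:2] == server[:2]


# Line copied from the job master log in this thread.
line = (
    "2019-10-31 20:05:20,537 INFO zookeeper.ZooKeeper (Environment.java:logEnv) - "
    "Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT"
)
client = client_zk_version(line)
server = (3, 5, 5)  # Server version as stated in the issue description.

print(client, same_minor(client, server))
```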

rastogiasr commented 4 years ago

@zc87328000 - is this still an issue? If so, please reopen it.

np-ftrwei commented 2 years ago

Closing multiple threads of complaints in a short time with NO resolution is an unhelpful and problematic pattern, and nobody at Alluxio seems alarmed that dozens of issues have been filed for the same reason.

It's closing in on 3 years, and v2.8.1 right OUT OF THE BOX still shows the exact same issue. I am a strong customer, but mostly because of the poor documentation and a complete lack of help and support, I will move on to some other data orchestration solution. I am a CTO, and although I do commend you guys on having a live Slack channel and all, Alluxio is still immature when it comes to (1) software quality, (2) documentation, (3) tech support. Slacking is not how even open-source software should engage for troubleshooting, let alone a paid product like yours.

No response is going to be read/entertained. Only support is.

@rastogiasr @alluxio-bot ...

andyzheung commented 1 year ago

a complete lack of help and support, I will move on to some other data orchestration solution. I am a CTO and although I do commend you guys on the fact that you have a live slack channel and all, but Alluxio is still immature when it comes to (1) software quality, (2) documentation, (3) tech support. Slacking is not how even open-source software should engage for troubleshooting, let alone a paid product like y

Yes, I am hitting the same problem and have not received any support either. Have you found another data orchestration solution? @np-ftrwei