Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Queries return an exception after a leader change #17453

Open shiwl opened 1 year ago

shiwl commented 1 year ago

Alluxio Version: 2.8.1

Describe the bug
When the leader changes from the alluxio01 master to another master, I get an exception when I query Alluxio.

To Reproduce
1. Install Alluxio with HA.
2. Start the Alluxio cluster, with the second master as the leader.


Additional context
Spark configs:

spark.driver.memory=5g
spark.worker.memory=10g
spark.eventLog.compress true
spark.driver.extraClassPath /opt/alluxio/alluxio-2.8.1-client.jar
spark.executor.extraClassPath /opt/alluxio/alluxio-2.8.1-client.jar
spark.sql.catalog.hadoop_catalog = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop_catalog.type = hadoop
spark.sql.catalog.hadoop_catalog.default-namespace = iceberg_db
spark.sql.catalog.hadoop_catalog.warehouse = alluxio://xxxx.78:19998;xxxx.79:19998;xxxx.80:19998/warehouse
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.iceberg.handle-timestamp-without-timezone = true

Spark SQL reports the following error:

23/05/18 15:06:08 ERROR ClientMasterSync: Failed to get master address during initialization
alluxio.exception.status.UnavailableException: Failed to handshake with master xxxx78:19998 to load cluster default configuration values: UNAVAILABLE: Network closed for unknown reason
    at alluxio.conf.Configuration.loadConfiguration(Configuration.java:448)
    at alluxio.ClientContext.loadConf(ClientContext.java:120)
    at alluxio.client.metrics.ClientMasterSync.loadConf(ClientMasterSync.java:122)
    at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:74)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: alluxio.shaded.client.io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
    at alluxio.shaded.client.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
    at alluxio.shaded.client.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
    at alluxio.shaded.client.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
    at alluxio.grpc.MetaMasterConfigurationServiceGrpc$MetaMasterConfigurationServiceBlockingStub.getConfiguration(MetaMasterConfigurationServiceGrpc.java:424)
    at alluxio.conf.Configuration.loadConfiguration(Configuration.java:442)
    ... 10 more
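
For anyone trying to reproduce this, a quick way to see which of the masters listed in the warehouse URI is accepting connections on the RPC port is a small probe like the one below. This is only a sketch using the Python standard library, not an Alluxio API: `parse_masters` and `reachable` are hypothetical helper names, the host names in the example are placeholders, and it assumes every entry in the URI carries an explicit port. An open port only shows that something is listening there; it does not prove that master is the current leader.

```python
import socket
from typing import List, Tuple


def parse_masters(uri: str) -> List[Tuple[str, int]]:
    """Split an alluxio:// URI with a ';'-separated authority into (host, port) pairs.

    Hypothetical helper for illustration; assumes each entry is 'host:port'.
    """
    prefix = "alluxio://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an alluxio:// URI: {uri}")
    # The authority is everything between the scheme and the first '/'.
    authority = uri[len(prefix):].split("/", 1)[0]
    masters = []
    for entry in authority.split(";"):
        host, _, port = entry.rpartition(":")
        masters.append((host, int(port)))
    return masters


def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder host names; substitute the real master addresses.
    uri = "alluxio://master1.example:19998;master2.example:19998;master3.example:19998/warehouse"
    for host, port in parse_masters(uri):
        print(host, port, "open" if reachable(host, port) else "closed")
```

Running it before and after the leader change should show whether the master named in the handshake error is still listening when the exception appears.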

ChunxuTang commented 1 year ago

Thanks for reporting the issue!

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.