apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] In Dbeaver KyuubiTBinaryFrontendService: Error getting tables #6777

Closed. SGITLOGIN closed this issue 3 weeks ago

SGITLOGIN commented 4 weeks ago

Describe the bug

Problem

The Kyuubi server is configured with both Kerberos and LDAP authentication, and HDFS is highly available. When I use DBeaver to access Kyuubi and query Hive, requests always go to NameNode nn1. When nn1 is active, DBeaver access works fine, but when nn1 is in standby, an error is reported. How can this problem be handled?

namenode high availability configuration

[screenshot: HDFS NameNode HA configuration]
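
The configuration screenshot is not preserved here. For reference only, a typical client-side NameNode HA setup in hdfs-site.xml looks roughly like the sketch below; the nameservice name ha-nn is taken from the dfs.client.failover.proxy.provider.ha-nn key quoted later in this thread, while the NameNode hostnames are placeholders, not values from this cluster.

<!-- hdfs-site.xml (sketch): two NameNodes behind the ha-nn nameservice -->
<property><name>dfs.nameservices</name><value>ha-nn</value></property>
<property><name>dfs.ha.namenodes.ha-nn</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.ha-nn.nn1</name><value>namenode1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.ha-nn.nn2</name><value>namenode2.example.com:8020</value></property>
<property><name>dfs.client.failover.proxy.provider.ha-nn</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
<!-- with this in place, clients address HDFS through the nameservice URI, e.g. hdfs://ha-nn/path -->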

When nn1 is standby, the following error is reported

[screenshot: StandbyException error reported in DBeaver]

When nn1 is active, access is normal

[screenshot: query succeeds in DBeaver]

Affects Version(s)

1.9.2

Kyuubi Server Log Output

2024-10-23 16:18:06.021 ERROR KyuubiTBinaryFrontendHandler-Pool: Thread-72 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Error getting tables: 
org.apache.kyuubi.KyuubiSQLException: Error operating GetTables: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1585)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3374)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1216)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1044)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:223)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:69)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:319)
        at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:278)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.$anonfun$listAllNamespaces$1(SparkCatalogUtils.scala:113)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.$anonfun$listAllNamespaces$1$adapted(SparkCatalogUtils.scala:112)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
        at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.listAllNamespaces(SparkCatalogUtils.scala:112)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.listAllNamespaces(SparkCatalogUtils.scala:129)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.listNamespacesWithPattern(SparkCatalogUtils.scala:137)
        at org.apache.kyuubi.engine.spark.util.SparkCatalogUtils$.getCatalogTablesOrViews(SparkCatalogUtils.scala:163)
        at org.apache.kyuubi.engine.spark.operation.GetTables.runInternal(GetTables.scala:81)
        at org.apache.kyuubi.operation.AbstractOperation.run(AbstractOperation.scala:173)
        at org.apache.kyuubi.session.AbstractSession.runOperation(AbstractSession.scala:101)
        at org.apache.kyuubi.engine.spark.session.SparkSessionImpl.runOperation(SparkSessionImpl.scala:101)
        at org.apache.kyuubi.session.AbstractSession.getTables(AbstractSession.scala:162)
        at org.apache.kyuubi.service.AbstractBackendService.getTables(AbstractBackendService.scala:94)
        at org.apache.kyuubi.service.TFrontendService.GetTables(TFrontendService.scala:329)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Processor$GetTables.getResult(TCLIService.java:1770)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Processor$GetTables.getResult(TCLIService.java:1750)
        at org.apache.kyuubi.shaded.thrift.ProcessFunction.process(ProcessFunction.java:38)
        at org.apache.kyuubi.shaded.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
        at org.apache.kyuubi.service.authentication.TSetIpAddressProcessor.process(TSetIpAddressProcessor.scala:35)
        at org.apache.kyuubi.shaded.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1585)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3374)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1216)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1044)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1666)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1651)
        at org.apache.spark.sql.hive.client.Shim_v0_12.databaseExists(HiveShim.scala:609)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:406)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:406)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
        ... 33 more
Caused by: MetaException(message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1585)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3374)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1216)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1044)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result$get_database_resultStandardScheme.read(ThriftHiveMetastore.java:40276)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result$get_database_resultStandardScheme.read(ThriftHiveMetastore.java:40244)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result.read(ThriftHiveMetastore.java:40175)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:1135)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:1122)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1511)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1506)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
        at com.sun.proxy.$Proxy70.getDatabase(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2773)
        at com.sun.proxy.$Proxy70.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1662)
        ... 45 more

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

## kyuubi-env.sh  
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64
export SPARK_HOME=/usr/odp/current/spark3-client/
export SPARK_CONF_DIR=/etc/spark3/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export YARN_CONF_DIR=/etc/hadoop/conf/

## kyuubi-defaults.conf
kyuubi.authentication                    KERBEROS,LDAP
kyuubi.kinit.principal hive/_HOST@HUAN.TV
kyuubi.kinit.keytab /etc/security/keytabs/hive.service.keytab
kyuubi.authentication.ldap.baseDN=dc=hadoop,dc=com
kyuubi.authentication.ldap.binddn=cn=Manager,dc=hadoop,dc=com
kyuubi.authentication.ldap.bindpw=UKWCVRfeAe72hTgr
kyuubi.authentication.ldap.url=ldap://open-ldap-test:389/
kyuubi.authentication.ldap.groupClassKey groupOfNames
kyuubi.authentication.ldap.groupDNPattern CN=%s,OU=Group,DC=hadoop,DC=com
kyuubi.authentication.ldap.groupMembershipKey memberUid
kyuubi.authentication.ldap.userDNPattern UID=%s,OU=People,DC=hadoop,DC=com
kyuubi.frontend.bind.host                ali-odp-test-01.huan.tv
kyuubi.frontend.protocols                THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port  10009
kyuubi.frontend.rest.bind.port           10099
kyuubi.engine.type                       SPARK_SQL
kyuubi.engine.share.level                USER
kyuubi.engine.doAs.enabled true
kyuubi.metadata.store.jdbc.database.schema.init true
kyuubi.metadata.store.jdbc.database.type MYSQL
kyuubi.metadata.store.jdbc.driver com.mysql.jdbc.Driver
kyuubi.metadata.store.jdbc.url jdbc:mysql://rm-uf63s1w0quw2ayvn7.mysql.rds.aliyuncs.com:3306/kyuubi
kyuubi.metadata.store.jdbc.user kyuubi
kyuubi.metadata.store.jdbc.password Fza4zDXgbGE
kyuubi.session.engine.initialize.timeout PT30M
kyuubi.session.check.interval PT1M
kyuubi.operation.idle.timeout PT1M
kyuubi.session.idle.timeout PT10M
kyuubi.session.engine.idle.timeout PT5M
kyuubi.ha.client.class org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient
kyuubi.ha.addresses                      ali-odp-test-01:2181,ali-odp-test-02:2181,ali-odp-test-03:2181
kyuubi.ha.namespace                      kyuubi
kyuubi.ha.zookeeper.auth.type KERBEROS
kyuubi.ha.zookeeper.auth.principal zookeeper@HUAN.TV
kyuubi.ha.zookeeper.auth.keytab /root/zookeeper.keytab
spark.master yarn
spark.yarn.queue default
spark.executor.cores 1
spark.driver.memory 3g
spark.executor.memory 3g
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.shuffleTracking.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.initialExecutors 2
spark.cleaner.periodicGC.interval 5min

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

pan3793 commented 4 weeks ago

The invocation chain is:

DBeaver => JDBC driver => Kyuubi Server => Spark Driver => HMS => HDFS

The error you reported indicates that the HMS => HDFS hop may have issues when the NameNode fails over.
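
As a side note, the DBeaver => JDBC driver => Kyuubi Server hop is a HiveServer2-compatible Thrift connection. Assuming the host, port and Kerberos principal from the kyuubi-defaults.conf posted above, the JDBC URL configured in DBeaver would look roughly like this (a sketch, not a string taken from this thread):

## DBeaver connection via the Hive JDBC driver, Kerberos authentication (sketch)
jdbc:hive2://ali-odp-test-01.huan.tv:10009/default;principal=hive/_HOST@HUAN.TV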

SGITLOGIN commented 4 weeks ago

@pan3793 Hello, there is no such problem when using beeline to access Hive. Internally, Hive first tries the first NameNode; if that NameNode is in standby, it also logs the error "Operation category READ is not supported in state standby" and then falls over to the second NameNode. With Kyuubi, however, when accessing the Kerberos-secured cluster through DBeaver, the first NameNode is tried and, if it is found to be in standby state, this error is thrown directly without trying the second NameNode. Is this behavior caused by DBeaver or by Kyuubi?

HDFS Configurations

dfs.client.failover.proxy.provider.ha-nn org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

I also tried RequestHedgingProxyProvider for the dfs.client.failover.proxy.provider parameter, to no avail (see the snippet below).
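
For completeness, the alternative that was tried corresponds to the following hdfs-site.xml setting (same key as above, different provider class):

dfs.client.failover.proxy.provider.ha-nn org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider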

pan3793 commented 4 weeks ago

Debug your HMS process to figure out what happened.

SGITLOGIN commented 4 weeks ago

@pan3793 Hello, the Hive server reports only this and nothing else in the logs. Is it possible that DBeaver is not adapted to this failover mechanism? dfs.client.failover.proxy.provider.ha-nn=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

[screenshots of the Hive server logs]

SGITLOGIN commented 3 weeks ago

After turning on HDFS high availability, you need to manually change the location of the sys and information_schema databases in Hive from the old NameNode address to the nameservice:

hdfs://ali-odp-test-01.huan.tv:8020/warehouse/tablespace/managed/hive/sys.db -> hdfs://ha-nn/warehouse/tablespace/managed/hive/sys.db
hdfs://ali-odp-test-01.huan.tv:8020/warehouse/tablespace/managed/hive/information_schema.db -> hdfs://ha-nn/warehouse/tablespace/managed/hive/information_schema.db
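
A hedged sketch of how such a location change is typically applied: the Hive metatool and ALTER DATABASE ... SET LOCATION are standard Hive facilities, but the exact commands below are assumptions based on the paths quoted above, not commands taken from this thread, so verify them against your Hive version (and back up the metastore database) before running anything.

## Option 1: rewrite all metastore locations from the old NameNode address to the nameservice
hive --service metatool -listFSRoot
hive --service metatool -updateLocation hdfs://ha-nn hdfs://ali-odp-test-01.huan.tv:8020

## Option 2: fix the two databases individually from beeline (Hive 2.2.1+);
## note this changes only the database default location, not the locations of existing tables
ALTER DATABASE sys SET LOCATION 'hdfs://ha-nn/warehouse/tablespace/managed/hive/sys.db';
ALTER DATABASE information_schema SET LOCATION 'hdfs://ha-nn/warehouse/tablespace/managed/hive/information_schema.db';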

pan3793 commented 3 weeks ago

So this is caused by the obsolete paths kept in the HMS for databases/tables that existed before NameNode HA was enabled. SPARK-22121 was raised (by Cloudera) to tackle such issues by converting the namenode address to the nameservice, but unfortunately it was rejected by the Spark community. The patch might be included in Cloudera's Spark distribution.

SGITLOGIN commented 3 weeks ago

Ok, thanks