apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0
2.11k stars 916 forks source link

[Bug] Error "Socket is closed by peer" when querying a large dataset with kyuubi.operation.result.format=arrow #6819

Open inkkim opened 3 days ago

inkkim commented 3 days ago

Code of Conduct

Search before asking

Describe the bug

Hello,

I am experiencing an issue when querying a large dataset using the following setup:

Database Client: DataGrip Service Discovery Mode: zookeeper Connected to a Zookeeper ensemble as the host. Driver: kyuubi-hive-jdbc-shaded-1.10.0.jar Configuration: kyuubi.operation.result.format=arrow Table Type: Iceberg table within an Iceberg catalog

The issue occurs when I execute the following query on a table containing approximately 1.9 million rows:

SELECT * FROM catalog.namespace.sample;

Error: Socket is closed by peer.

Symptoms:

The query appears to execute for about 10-15 seconds. DataGrip then displays the error: Socket is closed by peer. However, when I check the Spark Application UI, the query is marked as completed. This discrepancy suggests the issue might be related to the data transfer or communication between the client and server when using the arrow result format. if I query the same dataset with a LIMIT clause (e.g., SELECT * FROM sample LIMIT 1000;), the results are returned successfully.

Please let me know if you need additional information or logs to investigate this issue further.

Affects Version(s)

1.10.0

Kyuubi Server Log Output

2024-11-22 10:19:40.416 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V10
2024-11-22 10:19:40.416 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.session.KyuubiSessionManager: Opening session for user1@10.10.10.10
2024-11-22 10:19:40.417 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.OperationAuditLogger: operation=8cebbc36-03cc-44bd-a1c7-d44352b38152     opType=LaunchEngine     state=INITIALIZED       user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.417 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.log.OperationLog: Creating operation log file /opt/apache-kyuubi-1.10.0-bin/work/server_operation_logs/0aef979d-8fa7-43b5-ba20-21d609e635d1/8cebbc36-03cc-44bd-a1c7-d44352b38152
2024-11-22 10:19:40.417 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.OperationAuditLogger: operation=8cebbc36-03cc-44bd-a1c7-d44352b38152     opType=LaunchEngine     state=PENDING   user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.418 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.session.KyuubiSessionManager: user1's KyuubiSessionImpl with SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1] is opened, current opening sessions 4
2024-11-22 10:19:40.418 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.operation.LaunchEngine: Processing user1's query[8cebbc36-03cc-44bd-a1c7-d44352b38152]: PENDING_STATE -> RUNNING_STATE, statement:
LaunchEngine
2024-11-22 10:19:40.418 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.operation.OperationAuditLogger: operation=8cebbc36-03cc-44bd-a1c7-d44352b38152       opType=LaunchEngine     state=RUNNING   user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.419 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
2024-11-22 10:19:40.419 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=ZOOKEEPER-001:2181,ZOOKEEPER-002:2181,ZOOKEEPER-003:2181 sessionTimeout=60000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@23d580cc
2024-11-22 10:19:40.420 INFO KyuubiSessionManager-exec-pool: Thread-1239-SendThread(ZOOKEEPER-003:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server ZOOKEEPER-003/10.10.10.21:2181. Will not attempt to authenticate using SASL (unknown error)
2024-11-22 10:19:40.420 INFO KyuubiSessionManager-exec-pool: Thread-1239-SendThread(ZOOKEEPER-003:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to ZOOKEEPER-003/10.10.10.21:2181, initiating session
2024-11-22 10:19:40.422 INFO KyuubiSessionManager-exec-pool: Thread-1239-SendThread(ZOOKEEPER-003:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server ZOOKEEPER-003/10.10.10.21:2181, sessionid = 0x300231d79b100f1, negotiated timeout = 40000
2024-11-22 10:19:40.422 INFO KyuubiSessionManager-exec-pool: Thread-1239-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2024-11-22 10:19:40.424 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:SPARKYARN-M-001:34147 engine id:application_1730883837080_0041 and version:1.10.0 under /kyuubi_1.10.0_USER_SPARK_SQL/user1/default
2024-11-22 10:19:40.437 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.session.KyuubiSessionImpl: [user1:127.0.0.1] SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1] - Connected to engine [SPARKYARN-M-001:34147]/[application_1730883837080_0041] with SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1]]
2024-11-22 10:19:40.437 INFO Curator-Framework-0 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2024-11-22 10:19:40.439 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Session: 0x300231d79b100f1 closed
2024-11-22 10:19:40.439 INFO KyuubiSessionManager-exec-pool: Thread-1239-EventThread org.apache.kyuubi.shaded.zookeeper.ClientCnxn: EventThread shut down for session: 0x300231d79b100f1
2024-11-22 10:19:40.439 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.operation.LaunchEngine: Processing user1's query[8cebbc36-03cc-44bd-a1c7-d44352b38152]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.021 seconds
2024-11-22 10:19:40.439 INFO KyuubiSessionManager-exec-pool: Thread-1239 org.apache.kyuubi.operation.OperationAuditLogger: operation=8cebbc36-03cc-44bd-a1c7-d44352b38152       opType=LaunchEngine     state=FINISHED  user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.452 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.session.KyuubiSessionImpl: [user1:127.0.0.1] SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1] - Starting to wait the launch engine operation finished
2024-11-22 10:19:40.452 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.session.KyuubiSessionImpl: [user1:127.0.0.1] SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1] - Engine has been launched, elapsed time: 0 s
2024-11-22 10:19:40.465 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Client information for SessionHandle [0aef979d-8fa7-43b5-ba20-21d609e635d1]: ApplicationName = DataGrip 2023.1.1
2024-11-22 10:19:40.497 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.OperationAuditLogger: operation=87c27fc0-8482-4fda-a8e9-09ee38accff3     opType=ExecuteStatement state=INITIALIZED       user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.497 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.log.OperationLog: Creating operation log file /opt/apache-kyuubi-1.10.0-bin/work/server_operation_logs/0aef979d-8fa7-43b5-ba20-21d609e635d1/87c27fc0-8482-4fda-a8e9-09ee38accff3
2024-11-22 10:19:40.497 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.OperationAuditLogger: operation=87c27fc0-8482-4fda-a8e9-09ee38accff3     opType=ExecuteStatement state=PENDING   user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:40.499 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.ExecuteStatement: Processing user1's query[87c27fc0-8482-4fda-a8e9-09ee38accff3]: PENDING_STATE -> RUNNING_STATE, statement:
SELECT * FROM bmhud.bmhud.F_AP_OTPT_INFM
2024-11-22 10:19:40.499 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.OperationAuditLogger: operation=87c27fc0-8482-4fda-a8e9-09ee38accff3       opType=ExecuteStatement state=RUNNING   user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:45.500 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.ExecuteStatement: Query[87c27fc0-8482-4fda-a8e9-09ee38accff3] in RUNNING_STATE
2024-11-22 10:19:50.502 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.ExecuteStatement: Query[87c27fc0-8482-4fda-a8e9-09ee38accff3] in RUNNING_STATE
2024-11-22 10:19:51.105 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.ExecuteStatement: Query[87c27fc0-8482-4fda-a8e9-09ee38accff3] in FINISHED_STATE
2024-11-22 10:19:51.106 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.ExecuteStatement: Processing user1's query[87c27fc0-8482-4fda-a8e9-09ee38accff3]: RUNNING_STATE -> FINISHED_STATE, time taken: 10.606 seconds
2024-11-22 10:19:51.106 INFO KyuubiSessionManager-exec-pool: Thread-1245 org.apache.kyuubi.operation.OperationAuditLogger: operation=87c27fc0-8482-4fda-a8e9-09ee38accff3       opType=ExecuteStatement state=FINISHED  user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:52.311 ERROR KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Error fetching results:
org.apache.kyuubi.shaded.thrift.transport.TTransportException: Socket is closed by peer.
        at org.apache.kyuubi.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:184)
        at org.apache.kyuubi.shaded.thrift.transport.TTransport.readAll(TTransport.java:109)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:338)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:421)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.read(TSaslTransport.java:397)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
        at org.apache.kyuubi.shaded.thrift.transport.TTransport.readAll(TTransport.java:109)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245)
        at org.apache.kyuubi.shaded.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:548)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Client.FetchResults(TCLIService.java:535)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$fetchResults$1(KyuubiSyncThriftClient.scala:438)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$withLockAcquiredAsyncRequest$2(KyuubiSyncThriftClient.scala:168)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
2024-11-22 10:19:52.431 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.ExecuteStatement: Processing user1's query[87c27fc0-8482-4fda-a8e9-09ee38accff3]: FINISHED_STATE -> CLOSED_STATE, time taken: 11.932 seconds
2024-11-22 10:19:52.431 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.OperationAuditLogger: operation=87c27fc0-8482-4fda-a8e9-09ee38accff3     opType=ExecuteStatement state=CLOSED    user=user1   session=0aef979d-8fa7-43b5-ba20-21d609e635d1
2024-11-22 10:19:52.432 WARN KyuubiTBinaryFrontendHandler-Pool: Thread-197 org.apache.kyuubi.operation.ExecuteStatement: Error closing THandleIdentifier(guid:87 C2 7F C0 84 82 4F DA A8 E9 09 EE 38 AC CF F3, secret:AA BB 5B 97 3E A0 41 EE AC 16 9B D7 08 ED 8F 38): Socket is closed by peer.
org.apache.kyuubi.shaded.thrift.transport.TTransportException: Socket is closed by peer.
        at org.apache.kyuubi.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:184)
        at org.apache.kyuubi.shaded.thrift.transport.TTransport.readAll(TTransport.java:109)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:338)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:421)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslTransport.read(TSaslTransport.java:397)
        at org.apache.kyuubi.shaded.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:39)
        at org.apache.kyuubi.shaded.thrift.transport.TTransport.readAll(TTransport.java:109)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362)
        at org.apache.kyuubi.shaded.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245)
        at org.apache.kyuubi.shaded.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Client.recv_CloseOperation(TCLIService.java:502)
        at org.apache.kyuubi.shaded.hive.service.rpc.thrift.TCLIService$Client.CloseOperation(TCLIService.java:489)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$closeOperation$1(KyuubiSyncThriftClient.scala:414)
        at org.apache.kyuubi.client.KyuubiSyncThriftClient.$anonfun$withLockAcquiredAsyncRequest$2(KyuubiSyncThriftClient.scala:168)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

kyuubi.authentication                    NONE
kyuubi.frontend.bind.host                10.10.10.15
kyuubi.frontend.protocols                THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port  10009
kyuubi.frontend.rest.bind.port           10099
kyuubi.engine.type                       SPARK_SQL
kyuubi.engine.share.level                USER
kyuubi.session.engine.idle.timeout       PT0M
kyuubi.engine.ui.stop.enabled             true
kyuubi.session.engine.spark.showProgress true
kyuubi.ha.addresses                      ZOOKEEPER-001:2181,ZOOKEEPER-002:2181,ZOOKEEPER-003:2181
kyuubi.ha.namespace                      kyuubi

Kyuubi Engine Configurations

spark.master                     yarn
spark.executor.cores             3
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              3g
spark.executor.memory            10g
spark.yarn.am.memory             1g
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/user1/history/done
spark.history.fs.logDirectory    hdfs:///user/user1/history/done
spark.worker.cleanup.enabled     true
spark.worker.cleanup.interval    86400
spark.worker.cleanup.appDataTtl  604800
spark.yarn.queue                 dev
spark.yarn.jars                  file:/usr/local/spark/jars/*
spark.history.fs.logDirectory    hdfs:///user/user1/history/done
spark.eventLog.compress          true
spark.driver.extraClassPath=$SPARK_HOME/jars/spark-network-yarn_2.12-3.5.1.jar
spark.submit.proxyUser.allowCustomClasspathInClusterMode true
spark.shuffle.compress true
spark.shuffle.spill.compress true
spark.sql.shuffle.partitions 240
spark.rdd.compress     true
spark.dynamicAllocation.enabled true
spark.plugins io.dataflint.spark.SparkDataflintPlugin
spark.dataflint.iceberg.autoCatalogDiscovery true

Additional context

No response

Are you willing to submit PR?

pan3793 commented 3 days ago

it indicates that Spark driver OOM, for general large result set solution, please refer to https://kyuubi.readthedocs.io/en/master/deployment/spark/large_query_results.html