Open wuzg2000 opened 3 months ago
不确定是doris还是doris spark connector的问题,先提到doris这里了
Please assign it to me
不确定是doris还是doris spark Connector的问题,先提到doris这里了
请问解决了吗?
补充:问题测试平台是arm64,x86 64平台下正常,今天升级doris到2.1.4正式版后,be.out不再有错误日志,但spark-sql查询报错不变,还是报be连接超时
It may be related to the compatibility of the arm environment. When Spark reads and gets stuck, you can type a pstack on the corresponding be
Search before asking
Version
doris : 2.1.3 & 2.1.4-rc02 spark doris connector:1.3.2 spark:3.4.3
What's Wrong?
Successfully created a temporary view in spark-sql, which refers to a doris table, but read from the view failed, write data is ok. Logs repeated occurs in be.out:
I20240617 15:57:55.426945 2202688 thrift_client.cpp:72] (Attempt 1 of 1) I20240617 15:57:56.783890 2202683 mem_info.cpp:455] Refresh cgroup memory win, refresh again after 10s, cgroup mem limit: 9223372036854710272, cgroup mem usage: 1454833664, cgroup mem info cached: 0 I20240617 15:57:57.527108 2202688 client_cache.h:174] Failed to get client from cache: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)
, retrying[2]... W20240617 15:57:57.527325 2202688 doris_main.cpp:123] thrift internal message: TSocket::open() getaddrinfo() <Host: Port: 0>Name or service not known W20240617 15:57:57.527401 2202688 status.h:412] meet error status: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)
I20240617 15:57:57.527413 2202688 thrift_client.cpp:67] Unable to connect to :0
What You Expected?
Read data from temporary view successfull.
How to Reproduce?
Create a temporary view in spark-sql: spark-sql (default)> create temporary view pdm_org_organization1
Read data from the view, failed with errors: spark-sql (default)> select * from pdm_org_organization1; 08:29:57.287 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.doris.spark.backend.BackendClient - Connect to doris Doris BE{host='node01', port=9060} failed. 08:31:27.441 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0) org.apache.doris.spark.exception.ConnectedFailedException: Connect to Doris BE{host='node01', port=9060}failed. at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:195) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.rdd.ScalaValueReader.$anonfun$hasNext$2(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.rdd.ScalaValueReader.org$apache$doris$spark$rdd$ScalaValueReader$$lockClient(ScalaValueReader.scala:239) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.rdd.ScalaValueReader.hasNext(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.rdd.AbstractDorisRDDIterator.hasNext(AbstractDorisRDDIterator.scala:56) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.17.jar:?] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.3.jar:3.4.3] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.4.3.jar:3.4.3] at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) ~[spark-sql_2.12-3.4.3.jar:3.4.3] at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.scheduler.Task.run(Task.scala:139) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) ~[spark-core_2.12-3.4.3.jar:3.4.3] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.3.jar:3.4.3] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_391] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_391] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_391] Caused by: org.apache.doris.shaded.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:179) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] ... 23 more Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_391] at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_391] at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_391] at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_391] at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_391] at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_391] at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_391] at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:177) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2] ... 23 more
Anything Else?
The error logs continue to appears even the spark-sql session terminated!
Are you willing to submit PR?
Code of Conduct