
[Bug] Spark doris connector cannot read data from doris table via temporary view #36429

Open wuzg2000 opened 3 months ago

wuzg2000 commented 3 months ago


Version

doris: 2.1.3 & 2.1.4-rc02
spark doris connector: 1.3.2
spark: 3.4.3

What's Wrong?

I successfully created a temporary view in spark-sql that refers to a Doris table. Writing data through the view works, but reading from it fails. The following logs appear repeatedly in be.out:

    I20240617 15:57:55.426945 2202688 thrift_client.cpp:72] (Attempt 1 of 1)
    I20240617 15:57:56.783890 2202683 mem_info.cpp:455] Refresh cgroup memory win, refresh again after 10s, cgroup mem limit: 9223372036854710272, cgroup mem usage: 1454833664, cgroup mem info cached: 0
    I20240617 15:57:57.527108 2202688 client_cache.h:174] Failed to get client from cache: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)

    0#  doris::ThriftClientImpl::open()
    1#  doris::ThriftClientImpl::open_with_retry(int, int)
    2#  doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    3#  doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    4#  doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int)
    5#  doris::RuntimeQueryStatiticsMgr::report_runtime_query_statistics()
    6#  doris::Daemon::report_runtime_query_statistics_thread()
    7#  doris::Thread::supervise_thread(void*)
    8#  ?
    9#  ?

    , retrying[2]...
    W20240617 15:57:57.527325 2202688 doris_main.cpp:123] thrift internal message: TSocket::open() getaddrinfo() <Host: Port: 0>Name or service not known
    W20240617 15:57:57.527401 2202688 status.h:412] meet error status: [THRIFT_RPC_ERROR]Couldn't open transport for :0 (Could not resolve host for client socket.)

    0#  doris::ThriftClientImpl::open()
    1#  doris::ThriftClientImpl::open_with_retry(int, int)
    2#  doris::ClientCacheHelper::_create_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    3#  doris::ClientCacheHelper::get_client(doris::TNetworkAddress const&, std::function<doris::ThriftClientImpl* (doris::TNetworkAddress const&, void**)>&, void**, int)
    4#  doris::ClientConnection<doris::FrontendServiceClient>::ClientConnection(doris::ClientCache<doris::FrontendServiceClient>*, doris::TNetworkAddress const&, int, doris::Status*, int)
    5#  doris::RuntimeQueryStatiticsMgr::report_runtime_query_statistics()
    6#  doris::Daemon::report_runtime_query_statistics_thread()
    7#  doris::Thread::supervise_thread(void*)
    8#  ?
    9#  ?

    I20240617 15:57:57.527413 2202688 thrift_client.cpp:67] Unable to connect to :0
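One detail worth noting in the log above: the BE is opening a thrift transport against an empty host and port 0 (":0"), so its runtime-query-statistics reporter appears to have no valid FE address at all. A quick way to scan be.out for such entries is a small helper like the one below (a hypothetical diagnostic sketch, not part of Doris):

```python
import re

def find_empty_host_rpc_errors(log_text):
    """Return log lines where a thrift transport targeted an empty host
    (e.g. "Couldn't open transport for :0"), which usually means the
    client was handed an unset network address."""
    pattern = re.compile(r"Couldn't open transport for (\S*):(\d+)")
    return [
        line for line in log_text.splitlines()
        if (m := pattern.search(line)) and m.group(1) == ""
    ]
```

Lines that report a real host (e.g. `node01:9020`) are ignored; only the suspicious empty-host entries are returned.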

What You Expected?

Reading data from the temporary view should succeed.

How to Reproduce?

  1. Create a temporary view in spark-sql:

    spark-sql (default)> create temporary view pdm_org_organization1
    using doris options (
      "table.identifier" = "zfdsp_pdm.pdm_org_organization1",
      "fenodes" = "node01:18030",
      "user" = "dsp",
      "password" = "*****"
    );
    Response code
    Time taken: 4.441 seconds

  2. Read data from the view, which fails with errors:

    spark-sql (default)> select * from pdm_org_organization1;
    08:29:57.287 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.doris.spark.backend.BackendClient - Connect to doris Doris BE{host='node01', port=9060} failed.
    08:31:27.441 [Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
    org.apache.doris.spark.exception.ConnectedFailedException: Connect to Doris BE{host='node01', port=9060}failed.
        at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:195) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.rdd.ScalaValueReader.$anonfun$hasNext$2(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.rdd.ScalaValueReader.org$apache$doris$spark$rdd$ScalaValueReader$$lockClient(ScalaValueReader.scala:239) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.rdd.ScalaValueReader.hasNext(ScalaValueReader.scala:207) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.rdd.AbstractDorisRDDIterator.hasNext(AbstractDorisRDDIterator.scala:56) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.17.jar:?]
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) ~[spark-sql_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.scheduler.Task.run(Task.scala:139) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.3.jar:3.4.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_391]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_391]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_391]
    Caused by: org.apache.doris.shaded.org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:179) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        ... 23 more
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_391]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_391]
        at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_391]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_391]
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_391]
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_391]
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_391]
        at org.apache.doris.shaded.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:177) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.transport.TTransport.readAll(TTransport.java:109) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:464) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:362) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:245) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.sdk.thrift.TDorisExternalService$Client.recvGetNext(TDorisExternalService.java:92) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.sdk.thrift.TDorisExternalService$Client.getNext(TDorisExternalService.java:79) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        at org.apache.doris.spark.backend.BackendClient.getNext(BackendClient.java:172) ~[spark-doris-connector-3.4_2.12-1.3.2.jar:1.3.2]
        ... 23 more
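For context on why the write works while the read times out: writes go through the FE (the "fenodes" address, HTTP port 18030 here), while reads have each Spark executor pull rows directly from the BEs' thrift port (9060 in the error above). A minimal TCP probe run from the executor host can rule out basic reachability of both endpoints (the host/ports below are the ones from this report; this is a diagnostic sketch, not connector code):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Attempt a plain TCP connect; True if the port accepts connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Endpoints taken from this report: FE http 18030 (used for writes and
# metadata), BE thrift 9060 (used for reads). Run on the executor host.
for host, port, role in [("node01", 18030, "FE http"),
                         ("node01", 9060, "BE thrift")]:
    print(f"{role} {host}:{port} reachable: {can_connect(host, port)}")
```

If the FE port is reachable but the BE port is not, the read path is blocked at the network level rather than inside the connector.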

Anything Else?

The error logs continue to appear even after the spark-sql session has terminated!


wuzg2000 commented 3 months ago

Not sure whether this is a Doris issue or a Doris Spark Connector issue; filing it against Doris first.

SeaSand1024 commented 3 months ago

Please assign it to me

jnfJNF commented 3 months ago

> Not sure whether this is a Doris issue or a Doris Spark Connector issue; filing it against Doris first.

Has this been resolved?

wuzg2000 commented 3 months ago

Update: the failing test platform is arm64; on x86_64 everything works fine. After upgrading Doris to the 2.1.4 release today, be.out no longer shows the error logs, but the spark-sql query still fails with the same BE connection timeout.

JNSimba commented 3 months ago

It may be related to compatibility with the arm environment. When the Spark read gets stuck, you can run pstack against the corresponding BE to see where it is blocked.
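A sketch of that suggestion, assuming pstack is installed on the BE host and the BE process is named doris_be (adjust the name and output path for your deployment):

```shell
#!/bin/sh
# Find the BE process by its exact name and dump its thread stacks.
BE_PID=$(pgrep -x doris_be | head -n 1)
if [ -n "$BE_PID" ]; then
    pstack "$BE_PID" > "/tmp/doris_be_stack_$(date +%Y%m%d_%H%M%S).txt"
    echo "stacks written for pid $BE_PID"
else
    echo "doris_be process not found"
fi
```

Capturing the stacks a few times while the Spark read is hung makes the consistently blocked thread easy to spot.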