apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0
2.11k stars 916 forks source link

[Bug] Connect from PowerBI #3032

Closed hanna-liashchuk closed 2 years ago

hanna-liashchuk commented 2 years ago

Code of Conduct

Search before asking

Describe the bug

Since Kyuubi is an alternative to STS, I've tried to connect to it from PowerBI (PB) for some reports but met some errors along the way. PB refuses to connect to version 1.5.2 with the error "The host and port specified for the connection do not seem to belong to a Spark server", connection to 1.4.1 works OK, but functionality is very limited, e.g. creating a "Top N" filter fails with an error. After some investigation on the second one, I found out PB thinks that 1.4.1 is a Spark version and cuts functionality as it's too old.

Simba::SparkODBC::SOConnection::Connect: Server version (1.4.1-incubating) is lower than 2.1.0. Setting FETCH_FIRST fetch orientation as                 not supported because server lower than version 2.1.0 treats FETCH_FIRST as                 FETCH_NEXT.
Simba::SparkODBC::HardySettings::SetServerNotSupportsFetchFirst: +++++ enter +++++

ODBC logs can bee found here SparkODBC-1.4.1.log SparkODBC-1.5.2.log

I've built Kyuubi with spark version (3.2.1) instead of 1.4.1 and PB was working as expected. Under the hood, PB is using Microsoft Spark ODBC Driver. Is there a way to configure Kyuubi metadata to work with Power BI? Otherwise it's a great tool that can't be used in companies that have found themselves in a vendor lock-in and since Power BI is a Microsoft tool, I believe there are enough of them

Affects Version(s)

1.4.1-incubating, 1.5.2-incubating

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

pan3793 commented 2 years ago

I'm sorry I'm not familiar w/ PB and ODBC, and I think the Microsoft Spark ODBC Driver is not an open-source project.

I see Simba::SparkODBC from your logs, and we got reported that "Simba" ODBC drvier use a tricky way to retrieve server‘s version, please try to set spark.sql.hive.version=2.3.9 and spark.sql.hive.metastore.version=2.3.9 (actually I forget which one take effect) in your $SPAKR_HOME/spark-defaults.conf (2.3.9 is chosen because you are using Spark 3.2.1)

pan3793 commented 2 years ago

And for Simba::SparkODBC::SOConnection::Connect: Server version (1.4.1-incubating) ...

The DBMS_NAME and DBMS_VER are defined at https://github.com/apache/incubator-kyuubi/blob/95cb57e80bf9ca8dd0510df3d45c52d9f22ef448/kyuubi-common/src/main/scala/org/apache/kyuubi/session/AbstractSession.scala#L110-L111 You can change it to "2.3.9" to see what will happen, if it does work, I think we can make these two values configurable.

hanna-liashchuk commented 2 years ago

hi @pan3793, thanks for your help setting spark.sql.* params didn't help, but changing the source code - did! So far, everything seems to be working fine. I checked with 1.4.1-incubating version, let me check with 1.5.2

hanna-liashchuk commented 2 years ago

So, great news, 1.5.2 with the code edit is working as well :)

pan3793 commented 2 years ago

Glad to see it works! Would you like to send a PR to "make these two values configurable" and(or) add a document in "quick start" section to share how to use Power BI to connect Kyuubi?

hanna-liashchuk commented 2 years ago

Yes, would like to. Except, this would be my first commit to kyuubi project, so guide me here :) By configurable you mean that a user can override that value in kyuubi-defaults.conf via some parameter?

pan3793 commented 2 years ago

this would be my first commit to kyuubi project, so guide me here :)

Yes, of course.

By configurable you mean that a user can override that value in kyuubi-defaults.conf via some parameter?

Yes. To achieve this, you should add the ConfigEntrys(they should be named as kyuubi.server.cli.dbms.name and kyuubi.server.cli.dbms.ver or something) in KyuubiConf, and refer to them in AbstractSession#getInfo, then we can configure them in kyuubi-defaults.conf

For documents, you can add a new page in the folder docs/quick_start and follow the https://kyuubi.readthedocs.io/en/latest/develop_tools/build_document.html to build and preview the document site in local

pan3793 commented 2 years ago

This issue has been addressed in https://github.com/apache/incubator-kyuubi/pull/3122, w/ configuration kyuubi.server.info.provider=ENGINE, the GetInfo will return the engine(e.g. Spark)'s information instead of Kyuubi.

The feature will be available in 1.6.1-incubating.

pan3793 commented 2 years ago

cc @WANGHui2022, since you reported this issue in another channel.

marszd commented 1 year ago

@hanna-liashchuk I used PowerBi to connect kyuub_1.6.0, but there was a problem. Because there was an extra layer of "spark_catalog" in kyuubi's schema, PB would treat the database as a table. I saw that you also used PB to connect kyuubi, could you please give me some advice.

  1. ODBC configuration image

  2. Problem phenomenon image

hanna-liashchuk commented 1 year ago

@marszd we are using Spark connector with protocol: Standart. Have you tried using it?