StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Paimon doesn't work with Azure (abfss) - dependencies? #49289

Open creativedutchmen opened 1 month ago

creativedutchmen commented 1 month ago

I'm using CelerData BYOC on Azure, and I'm trying to add a Paimon catalog to query my data in Azure Data Lake Storage Gen2. It seems that some libraries needed to communicate over abfss are missing, even though I can read directly using that protocol.

Steps to reproduce the behavior (Required)

CREATE EXTERNAL CATALOG paimon
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "abfss://xxx@xxx.dfs.core.windows.net/xxxx",
    "azure.adls1.oauth2_client_id" = "xxxxx",
    "azure.adls1.oauth2_client_secret" = "xxxx",
    "azure.adls1.oauth2_client_endpoint" = "https://login.microsoftonline.com/xxxx/oauth2/token"
);

Expected behavior (Required)

A catalog is created and it's possible to use it and read data from the tables.

Real behavior (Required)

An exception is raised:

org.apache.paimon.fs.UnsupportedSchemeException: Could not find a file io implementation for scheme 'abfss' in the classpath.

StarRocks version (Required)

3.2.8-ee-366ca94

kevincai commented 1 month ago

@Smith-Cruise @before-Sunrise can you take a look?

Smith-Cruise commented 1 month ago

Sorry, we have some connection issues with Azure in Paimon, and we are fixing them now.

Smith-Cruise commented 1 month ago

As a workaround, you can configure core-site.xml and put Azure's Hadoop jar in be/lib/paimon-reader-lib; this bypasses the error.

Smith-Cruise commented 3 weeks ago

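You have to use the ADLS Gen2 syntax to create the catalog. The azure.adls1.* properties in your statement are for Gen1, but an abfss:// path points to Gen2.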
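
For reference, here is a sketch of what the statement from the report might look like with Gen2 property names. It assumes service principal authentication and the azure.adls2.* keys described in the StarRocks documentation on Azure storage authentication. The values are the placeholders from the original report, and this has not been verified against 3.2.8-ee.

```sql
-- Sketch only: same catalog as in the report, with the adls1 keys
-- swapped for their assumed ADLS Gen2 (adls2) equivalents.
CREATE EXTERNAL CATALOG paimon
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "abfss://xxx@xxx.dfs.core.windows.net/xxxx",
    -- Service principal credentials (placeholders)
    "azure.adls2.oauth2_client_id" = "xxxxx",
    "azure.adls2.oauth2_client_secret" = "xxxx",
    "azure.adls2.oauth2_client_endpoint" = "https://login.microsoftonline.com/xxxx/oauth2/token"
);
```

If the UnsupportedSchemeException still appears with these properties, the core-site.xml and jar workaround mentioned earlier in the thread may still be needed until the fix lands.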