apache / paimon-trino

Trino Connector for Apache Paimon.
https://paimon.apache.org/
Apache License 2.0

How to configure paimon's catalog file in trino, I want to use hive's metastore as metadata storage #26

Open groobyming opened 1 year ago

groobyming commented 1 year ago

Hi, I want to use Hive's metastore as the metadata storage, but I can't find any documentation on how to do this.

s7monk commented 1 year ago

It can be configured according to the official documentation, and the same is true for the Hive catalog.

s7monk commented 1 year ago

https://paimon.apache.org/docs/master/engines/trino/

linhao1990 commented 5 months ago

I found that adding the following config to your Trino Paimon catalog may work:

metastore=hive
uri=thrift://metastore-server:port
hive-conf-dir=/etc/trino/paimon_hadoop_configs

You also need to add the following jars to your Paimon plugin lib directory, otherwise you will hit a NoSuchMethodException error:

Also, since Trino uses JDK 17, the following piece of code in org.apache.hadoop.hive.metastore.HiveMetaStoreClient#resolveUris does not work:

if (MetastoreConf.getVar(conf, ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) {
  List uriList = Arrays.asList(metastoreUris);
  Collections.shuffle(uriList);
  metastoreUris = (URI[]) uriList.toArray();
}

You can add the following config to your hive-site.xml as a workaround:

<property>
  <name>metastore.thrift.uri.selection</name>
  <value>SEQUENTIAL</value>
</property>
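
Some background on why the quoted snippet is likely to fail on newer JDKs, as a minimal sketch rather than a patch to Hive: since JDK 9, Arrays.asList(array).toArray() returns an Object[], so casting the result to URI[] throws a ClassCastException, whereas toArray(new URI[0]) returns a URI[] on every JDK. The class name UriShuffleDemo and the host names host1 and host2 are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; it is not part of Hive, Paimon, or Trino.
final class UriShuffleDemo {

    // Shuffles a copy of the URIs and returns a correctly typed array.
    static URI[] shuffled(URI[] uris) {
        List<URI> list = Arrays.asList(uris.clone());
        Collections.shuffle(list);
        // toArray(T[]) yields a URI[] regardless of the JDK version.
        return list.toArray(new URI[0]);
    }

    // Mirrors the cast used in the quoted snippet and reports whether it succeeds.
    static boolean uncheckedCastWorks(URI[] uris) {
        try {
            URI[] result = (URI[]) Arrays.asList(uris).toArray();
            return result != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder metastore addresses (assumption, not from the thread).
        URI[] uris = {
            URI.create("thrift://host1:9083"),
            URI.create("thrift://host2:9083")
        };
        System.out.println("cast works on this JDK: " + uncheckedCastWorks(uris));
        System.out.println("shuffled: " + Arrays.toString(shuffled(uris)));
    }
}
```

Setting metastore.thrift.uri.selection to SEQUENTIAL sidesteps the problem because the RANDOM branch, and with it the cast, is never executed.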

nyu531 commented 2 months ago

@linhao1990 I'm using a high-availability (HA) Hadoop cluster, and this is how I set it up:

connector.name=paimon
metastore=hive
uri=thrift://[metastore-server1]:port,thrift://[metastore-server2]:port
hadoop-conf-dir=/home1/irteam/apps/trino/etc/hadoop
hive-conf-dir=/home1/irteam/apps/trino/etc/hadoop
hive.config.resources=/home1/irteam/apps/trino/etc/hadoop/hdfs-site.xml,/home1/irteam/apps/trino/etc/hadoop/core-site.xml,/home1/irteam/apps/trino/etc/hadoop/hive-site.xml

When I run a query, I get this error:

Path missing in file system location: `hdfs://[my-ha-hadoop-nameservice]`

Do you use high-availability Hadoop? If you do, I'd like to hear what you know. Thank you.


Self-answer: I needed to set warehouse explicitly. The warehouse location in my hive-site.xml (hive.metastore.warehouse.dir) does not start with hdfs://nameservice, because it relies on fs.defaultFS in core-site.xml to supply the scheme and nameservice.

warehouse=hdfs://[my-ha-hadoop-nameservice]/[warehouse directory]

For example:

warehouse=hdfs://abc/warehouse/tablespace/managed/hive
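
To illustrate the fix above, here is a minimal sketch of how a scheme-less warehouse directory can be turned into a fully qualified location by borrowing the scheme and nameservice from fs.defaultFS. The class WarehousePath and its qualify method are hypothetical helpers, not part of Paimon or Trino; the nameservice abc and the path come from the example above.

```java
import java.net.URI;

// Hypothetical helper; it is not part of Paimon, Trino, or Hadoop.
final class WarehousePath {

    // Returns warehouseDir unchanged if it already has a scheme; otherwise
    // prefixes it with the scheme and authority taken from defaultFs.
    static String qualify(String defaultFs, String warehouseDir) {
        URI dir = URI.create(warehouseDir);
        if (dir.getScheme() != null) {
            return warehouseDir; // already fully qualified
        }
        URI fs = URI.create(defaultFs);
        String path = dir.getPath().startsWith("/") ? dir.getPath() : "/" + dir.getPath();
        return fs.getScheme() + "://" + fs.getAuthority() + path;
    }

    public static void main(String[] args) {
        // fs.defaultFS = hdfs://abc, hive.metastore.warehouse.dir without a scheme
        System.out.println("warehouse="
            + qualify("hdfs://abc", "/warehouse/tablespace/managed/hive"));
    }
}
```

Running main prints warehouse=hdfs://abc/warehouse/tablespace/managed/hive, which is the value to put in the catalog file.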

This is related to HiveCatalog in paimon-bundle; I'm going to file a bug report in the Paimon repository.

It also needs hive-apache-3.1.2-22.jar (https://repo1.maven.org/maven2/io/trino/hive/hive-apache/3.1.2-22/hive-apache-3.1.2-22.jar).
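
Pulling the settings from this thread together, a catalog file can be sanity-checked before Trino is restarted. The sketch below is only an illustration: the class CatalogCheck is hypothetical, and the list of expected keys (connector.name, metastore, uri, warehouse) is an assumption based on what the commenters above ended up setting, not a list of keys the connector is documented to require.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; it is not part of Paimon or Trino.
final class CatalogCheck {

    // Keys used in this thread; treating them as required is an assumption.
    static final String[] EXPECTED = { "connector.name", "metastore", "uri", "warehouse" };

    // Returns the expected keys that are absent or blank in the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder host names and port (assumption, not from the thread).
        String catalog = String.join("\n",
            "connector.name=paimon",
            "metastore=hive",
            "uri=thrift://host1:9083,thrift://host2:9083");
        System.out.println("missing keys: " + missingKeys(catalog));
    }
}
```

With the sample text in main, the only key reported as missing is warehouse, which matches the setting the HA setup above turned out to need.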