ExpediaGroup / waggle-dance

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Apache License 2.0

Kerberos Issue #287

Open sivaponting opened 1 year ago

sivaponting commented 1 year ago

I pasted the config snippet below from the WD docs and have a question: does it have to be configured on both the primary and the remote metastores?

In addition, all metastores need to use the Zookeeper shared token:

  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.zookeeper.znode</name>
    <value>/hive/token</value>
  </property>
sivaponting commented 1 year ago

Getting the below error while connecting to remote metastore using kerberos

[Two screenshots of the error; images not preserved]

Any suggestions? Could anyone please help resolve this? Thanks.

sivaponting commented 1 year ago

Hi @patduin / @abhimanyugupta07, does WD support DBTokenStore? At my company, all existing Hive metastores are configured with DBTokenStore. The initial connection succeeds using Kerberos. Subsequent connections try to use tokens and fail with DIGEST-MD5: IO error acquiring password.

patduin commented 1 year ago

I know very little about Kerberos, as we don't use it ourselves. Kerberos support was a community contribution; perhaps @zzzzming95 can help?

sivaponting commented 1 year ago

Thanks for the reply. @zzzzming95, could you kindly help with this issue?

zzzzming95 commented 1 year ago

@sivaponting

The below config has to be configured both in primary & remote metastores?

The answer is yes. In waggle-dance, one token has to be used to access all metastores (both primary and remote).

DBTokenStore means the tokens are stored in MySQL, so a token is only shared among metastores that use the same MySQL database. You therefore need to change the token store to ZooKeeperTokenStore.

Note that you need to pay attention to token storage capacity. By default, the storage limit of a single znode in ZooKeeper is about 50,000 tokens.
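
Since every metastore has to point at the same ZooKeeper token store, it can help to compare the three properties across each metastore's hive-site.xml before testing Kerberos through WD. The following is a minimal sketch, not part of waggle-dance: the helper names `token_store_settings` and `mismatches` and the metastore labels are hypothetical, and it assumes each hive-site.xml has the usual `<property><name>…</name><value>…</value></property>` layout.

```python
import xml.etree.ElementTree as ET

# The three properties quoted from the WD docs earlier in this thread.
TOKEN_STORE_KEYS = (
    "hive.cluster.delegation.token.store.class",
    "hive.cluster.delegation.token.store.zookeeper.connectString",
    "hive.cluster.delegation.token.store.zookeeper.znode",
)


def token_store_settings(hive_site_xml):
    """Extract the token-store properties from the text of a hive-site.xml."""
    root = ET.fromstring(hive_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in TOKEN_STORE_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


def mismatches(configs):
    """Compare token-store settings across metastores.

    `configs` maps a (hypothetical) metastore label to its hive-site.xml text.
    Returns {property: {label: value_or_None}} for every property that is
    missing somewhere or whose value differs between metastores.
    """
    settings = {label: token_store_settings(xml) for label, xml in configs.items()}
    report = {}
    for key in TOKEN_STORE_KEYS:
        values = {label: s.get(key) for label, s in settings.items()}
        if None in values.values() or len(set(values.values())) > 1:
            report[key] = values
    return report
```

For example, calling `mismatches({"primary": primary_xml, "remote": remote_xml})` with the contents of two hive-site.xml files returns an empty dict when all three properties agree, and otherwise lists each differing or missing property per metastore. A metastore still configured with DBTokenStore would show up under the `...token.store.class` key.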

sivaponting commented 1 year ago

Thanks for your comment. The PROD metastore already uses DBTokenStore, and changing it to ZooKeeperTokenStore may not be the right choice. Could I instead use the same MySQL database that the PROD metastore uses? Would that resolve the issue?

zzzzming95 commented 1 year ago

@sivaponting

Shall I use the same MySQL database where the PROD metastore is using?

Using the same MySQL database means you are effectively using the same metastore, because DBTokenStore is tied to the same database as the other interfaces (such as get_table and get_partition). In that case there is no need to use waggle-dance.

We also switched from DBTokenStore to ZooKeeperTokenStore, and it is a feasible solution.

sivaponting commented 1 year ago

Sorry, I misunderstood. Yes, the purpose is to access all the remote metastores, so we can't keep the tokens in each metastore's own database. As you said, ZooKeeperTokenStore is the only solution.

sivaponting commented 1 year ago

Hi @patduin, in my use case some metastores have databases with the same name. How is this handled in WD?

patduin commented 1 year ago

Please open a new discussion for questions like this that have nothing to do with the original issue. Please see the README: https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md#database-resolution

flaming-archer commented 1 year ago

Hello @zzzzming95. If our primary metastore and federated metastore use different KDCs, how should we configure them in WD?
Do you mean that these settings must be identical in WD, the primary metastore, and the federated metastore?

For example, is hive.cluster.delegation.token.store.zookeeper.connectString the same everywhere, with the value zk1:2181,zk2:2181,zk3:2181 in all of them?

  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.zookeeper.znode</name>
    <value>/hive/token</value>
  </property>

Also, could you share a way to contact you, such as WeChat, so that I can consult with you?
flaming-archer commented 7 months ago

https://github.com/ExpediaGroup/waggle-dance/pull/313 @sivaponting Perhaps my PR can meet your needs.