TOSIT-IO / tdp-collection

Ansible collection to deploy the components of TDP
Apache License 2.0

hive ACID support #797

Open GuillaumeHold opened 11 months ago

GuillaumeHold commented 11 months ago

Hive compaction should be activated on only one of the Hive metastores. This requires distinguishing between the two Hive metastores, so that hive.compactor.initiator.on is set to true on only one of the instances.

Pierrotws commented 11 months ago

Properties linked to ACID support:

hive.support.concurrency – true
hive.enforce.bucketing – true (not required as of Hive 2.0)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
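As a rough illustration of the "exactly one instance" constraint above, the per-metastore compactor settings could be derived as in the sketch below. The function name, the host names and the rule of picking the first host in sorted order are all hypothetical, and nothing here reflects how tdp-collection actually templates hive-site.xml.

```python
# Minimal sketch (hypothetical helper, not part of tdp-collection):
# compute the compactor properties for each metastore host so that
# hive.compactor.initiator.on is "true" on exactly one of them.

def compactor_properties(metastore_hosts, worker_threads=1):
    """Return {host: {property: value}} for the compactor settings."""
    if not metastore_hosts:
        raise ValueError("at least one metastore host is required")
    if worker_threads < 1:
        raise ValueError("hive.compactor.worker.threads must be positive")

    # Assumption: the initiator is simply the first host in sorted order,
    # which keeps the choice stable between runs.
    initiator_host = sorted(metastore_hosts)[0]

    properties = {}
    for host in metastore_hosts:
        properties[host] = {
            "hive.compactor.initiator.on": "true" if host == initiator_host else "false",
            "hive.compactor.worker.threads": str(worker_threads),
        }
    return properties


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, props in compactor_properties(["master-02", "master-01"]).items():
        print(host, props)
```

Choosing the initiator deterministically means repeated deployments keep it on the same instance. The remaining ACID properties are identical everywhere and need no per-host logic.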

GuillaumeHold commented 11 months ago

When trying to create a transactional table with "transactional"="true", the query fails with:

Error: Error while compiling statement: FAILED: LockException [Error 10280]: Error communicating with the metastore (state=42000,code=10280)

In the metastore logs we get:

ERROR [pool-11-thread-10:ProcessFunction@41] - Internal error processing open_txns
org.apache.hadoop.hive.metastore.api.MetaException: Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: relation "next_txn_id" does not exist
[...]
INFO  [pool-11-thread-10:TxnHandler@3275] - Non-retryable error in openTxns(OpenTxnRequest(num_txns:1, user:tdp_user, hostname:master-03, agentInfo:Unknown)) : ERROR: relation "next_txn_id" does not exist
  Position: 23 (SQLState=42P01, ErrorCode=0)
[...]
ERROR [pool-11-thread-10:RetryingHMSHandler@201] - MetaException(message:Unable to select from transaction database org.postgresql.util.PSQLException: ERROR: relation "next_txn_id" does not exist

This seems to be linked to https://issues.apache.org/jira/browse/HIVE-22546, where there is a problem with the metastore schema for the PostgreSQL backend: the transaction manager looks for next_txn_id while the metastore schema has NEXT_TXN_ID.
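The mismatch is consistent with PostgreSQL's identifier rules: unquoted identifiers are folded to lowercase, while double-quoted identifiers keep their case. The sketch below only simulates that rule in plain Python and does not talk to a database. It assumes the schema created the table with a quoted uppercase name and that the failing query used the name unquoted, which the logs suggest but do not prove.

```python
# Minimal simulation of PostgreSQL identifier folding (no database needed).
# Assumption: the table was created as "NEXT_TXN_ID" (quoted) and queried
# as NEXT_TXN_ID (unquoted); this matches the error but is not confirmed.

def fold_identifier(identifier: str) -> str:
    """Return the relation name PostgreSQL would resolve for an identifier."""
    quoted = len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"'
    if quoted:
        # Quoted identifiers keep their case; a doubled quote is an escaped quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted identifiers are folded to lowercase.
    return identifier.lower()


created_as = fold_identifier('"NEXT_TXN_ID"')  # name stored in the catalog
looked_up = fold_identifier('NEXT_TXN_ID')     # name the query resolves to

if created_as != looked_up:
    print(f'relation "{looked_up}" does not exist (catalog has "{created_as}")')
```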