The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore-compatible metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational metadata for their data. AWS Glue provides out-of-the-box integration with Amazon EMR that enables customers to use the AWS Glue Data Catalog as an external Hive Metastore. This is an open-source implementation of the Apache Hive Metastore client on Amazon EMR clusters that uses the AWS Glue Data Catalog as an external Hive Metastore. It serves as a reference implementation for building a Hive Metastore-compatible client that connects to the AWS Glue Data Catalog. It may be ported to other Hive Metastore-compatible platforms, such as other Hadoop and Apache Spark distributions.
I connect to an EMR cluster over JDBC and want to list a table's columns.
`org.apache.hive.jdbc.HiveDatabaseMetaData.getColumns(...)` calls `org.apache.hive.service.cli.operation.GetColumnsOperation`, which fails with a `NullPointerException` because it expects a non-null list of primary keys:
```java
List<SQLPrimaryKey> primaryKeys = metastoreClient.getPrimaryKeys(new PrimaryKeysRequest(dbName, table.getTableName()));
Set<String> pkColNames = new HashSet<>();
for (SQLPrimaryKey key : primaryKeys) { // primaryKeys is null, so NPE
  pkColNames.add(key.getColumn_name().toLowerCase());
}
```
Stack trace:

```
org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetColumns.getResult(TCLIService.java:1557)
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetColumns.getResult(TCLIService.java:1542)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 common frames omitted
Caused by: java.lang.NullPointerException: null
	at org.apache.hive.service.cli.operation.GetColumnsOperation.runInternal(GetColumnsOperation.java:173)
	... 25 common frames omitted
```
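For illustration, here is a minimal sketch of a null-tolerant version of that loop, which treats a `null` result the same as "no primary keys". To stay self-contained it uses a hypothetical `PrimaryKey` stand-in (modeling only the `getColumn_name()` accessor of Hive's `SQLPrimaryKey`) instead of the real Hive classes; it is not the actual Hive source or an official patch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PkColumns {

    // Hypothetical stand-in for Hive's SQLPrimaryKey: only the accessor used by
    // the GetColumnsOperation loop is modeled here.
    static final class PrimaryKey {
        private final String columnName;

        PrimaryKey(String columnName) {
            this.columnName = columnName;
        }

        String getColumn_name() {
            return columnName;
        }
    }

    // Same logic as the loop in GetColumnsOperation, except that a null list
    // is treated as "no primary keys" instead of causing a NullPointerException.
    static Set<String> pkColumnNames(List<PrimaryKey> primaryKeys) {
        Set<String> pkColNames = new HashSet<>();
        if (primaryKeys == null) {
            return pkColNames;
        }
        for (PrimaryKey key : primaryKeys) {
            pkColNames.add(key.getColumn_name().toLowerCase());
        }
        return pkColNames;
    }

    public static void main(String[] args) {
        System.out.println(pkColumnNames(null)); // prints []
        System.out.println(pkColumnNames(Collections.singletonList(new PrimaryKey("ID")))); // prints [id]
    }
}
```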
The `null` comes from this client's `AWSCatalogMetastoreClient`:
https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/d04285fa7952ddc01df888a8e0b55229105dfef2/aws-glue-datacatalog-hive2-client/src/main/java/com/amazonaws/glue/catalog/metastore/AWSCatalogMetastoreClient.java#L1630
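A sketch of the client-side alternative: report "no primary keys" as an empty list rather than `null`, so existing callers such as `GetColumnsOperation` can iterate without a null check. The `PrimaryKeyLookup` interface, the `RETURNS_NULL`/`RETURNS_EMPTY` lookups and `countKeys` are hypothetical simplifications (the real `IMetaStoreClient.getPrimaryKeys` takes a `PrimaryKeysRequest` and returns `List<SQLPrimaryKey>`); this is not the project's actual code.

```java
import java.util.Collections;
import java.util.List;

public class EmptyPrimaryKeysClient {

    // Hypothetical, trimmed-down stand-in for the metastore client's primary key lookup.
    interface PrimaryKeyLookup {
        List<String> getPrimaryKeys(String dbName, String tableName);
    }

    // Behavior described in this issue: the lookup returns null.
    static final PrimaryKeyLookup RETURNS_NULL = (db, table) -> null;

    // Possible fix: return an empty, immutable list when there are no primary keys.
    static final PrimaryKeyLookup RETURNS_EMPTY = (db, table) -> Collections.emptyList();

    // Mirrors the caller's loop: iterates over the result without a null check.
    static int countKeys(PrimaryKeyLookup lookup) {
        int n = 0;
        for (String ignored : lookup.getPrimaryKeys("db", "tbl")) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countKeys(RETURNS_EMPTY)); // prints 0
        try {
            countKeys(RETURNS_NULL);
        } catch (NullPointerException e) {
            System.out.println("NPE, as in GetColumnsOperation");
        }
    }
}
```

Returning an empty list keeps the fix in one place (the Glue client) instead of requiring every Hive caller to guard against `null`.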