GoogleCloudDataproc / hive-bigquery-connector

A library enabling BigQuery as Hive storage handler
Apache License 2.0

Addressing Column Mapping Issue with Case Sensitivity Restrictions #120

Open sharmavarun1108 opened 4 months ago

sharmavarun1108 commented 4 months ago

BQ table:

CREATE TABLE `my_proj.my_dataset.my_table`(
id STRING,
source_id STRING,
TRUE_INDY_NAME STRING,
dt DATE)
PARTITION BY dt
CLUSTER BY source_id;

Hive Table: (GCP dataproc 2.1)

add jar gs://hadoop-lib/hive-bigquery/hive-bigquery-connector-2.0.3.jar;
CREATE EXTERNAL TABLE dev.test_bq_ext (
id STRING,
source_id STRING,
true_indy_name STRING,
dt DATE)
STORED BY 'com.google.cloud.hive.bigquery.connector.BigQueryStorageHandler'
TBLPROPERTIES (
'bq.clustered.fields'='source_id',
'bq.table'='my_proj.my_dataset.my_table',
'bq.time.partition.field'='dt',
'bq.time.partition.type'='DAY');

Error:

0: jdbc:dataproc://hive/> select * from dev.test_bq_ext limit 1;
Error: java.io.IOException: java.lang.RuntimeException: Unable to find column TRUE_INDY_NAME in columns [id, source_id, true_indy_name, dt] (state=,code=0)

I also tried: WITH SERDEPROPERTIES ('casesensitive'='TRUE_INDY_NAME')

I have not been able to find any workaround so far.
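
For readers following along: the error message is consistent with a case-sensitive name lookup, where the BigQuery field name (TRUE_INDY_NAME) is searched for in the Hive column list, which Hive stores in lowercase. The Python sketch below only illustrates that failure mode and a case-insensitive alternative. It is not the connector's code, and the function names are hypothetical.

```python
# Illustration only: NOT the connector's implementation.
# Function names here are hypothetical.

def find_column_exact(name, columns):
    """Case-sensitive lookup; reproduces the kind of failure reported above."""
    if name not in columns:
        raise RuntimeError(
            "Unable to find column %s in columns [%s]" % (name, ", ".join(columns))
        )
    return columns.index(name)


def find_column_ignore_case(name, columns):
    """Case-insensitive lookup; compares lowercased names on both sides."""
    lowered = [c.lower() for c in columns]
    key = name.lower()
    if key not in lowered:
        raise RuntimeError(
            "Unable to find column %s in columns [%s]" % (name, ", ".join(columns))
        )
    return lowered.index(key)


# Column names as reported in the issue.
hive_columns = ["id", "source_id", "true_indy_name", "dt"]
bq_field = "TRUE_INDY_NAME"

try:
    find_column_exact(bq_field, hive_columns)
except RuntimeError as err:
    print(err)

print(find_column_ignore_case(bq_field, hive_columns))
```

With the exact lookup, the mixed-case BigQuery name never matches the lowercase Hive name, whereas the lowercased comparison resolves it to the third column.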

jphalip commented 4 months ago

Hi @sharmavarun1108. I believe this has already been fixed in #98, but the fix hasn't been included in the latest release yet. Could you please try building your own JAR from the main branch, then test again? You can find some instructions here: https://github.com/GoogleCloudDataproc/hive-bigquery-connector?tab=readme-ov-file#option-2-manual-installation

We hope to create a new release that will include a lot of improvements in the next few weeks. Thanks for your patience!

sharmavarun1108 commented 4 months ago

Hi @jphalip, I ran a quick test with a JAR built from the main branch. It looks like the issue is resolved.

jphalip commented 4 months ago

Great, thank you for checking!