GoogleCloudDataproc / hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive
Apache License 2.0

NoSuchFieldError: MANDATORY_CONFIG_PROPERTIES_INPUT #16

Closed: d601 closed this issue 4 years ago

d601 commented 4 years ago

Building this repo as-is and attempting to use it in a Dataproc 1.3 cluster results in the following error:

hive> add jar gs://<snip>/artifacts/hive-bigquery-storage-handler-1.0-shaded.jar;
Added [/tmp/dba1500d-5dbd-4d08-b2f3-6a22068d88c2_resources/hive-bigquery-storage-handler-1.0-shaded.jar] to class path
Added resources: [gs://<snip>/artifacts/hive-bigquery-storage-handler-1.0-shaded.jar]
hive> list jars;
/tmp/dba1500d-5dbd-4d08-b2f3-6a22068d88c2_resources/hive-bigquery-storage-handler-1.0-shaded.jar
hive> select * from temp.bq_test limit 10;
20/07/24 17:35:53 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
20/07/24 17:35:53 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
OK
20/07/24 17:35:54 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
Exception in thread "main" java.lang.NoSuchFieldError: MANDATORY_CONFIG_PROPERTIES_INPUT
        at com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.getTable(DirectBigQueryInputFormat.java:91)
        at com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat.getSplits(DirectBigQueryInputFormat.java:76)
        at com.google.cloud.hadoop.io.bigquery.hive.WrappedBigQueryAvroInputFormat.getSplits(WrappedBigQueryAvroInputFormat.java:81)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

(I've redacted the name of the bucket I put these artifacts in.)
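
For reference, a NoSuchFieldError at runtime (rather than at compile time) usually means two different versions of the BigQuery connector ended up on the classpath: the shaded handler jar was built against a connector version that declares MANDATORY_CONFIG_PROPERTIES_INPUT, while the copy of that class actually loaded on the cluster doesn't have it. Below is a minimal diagnostic sketch to print which jar each class is loaded from. It assumes the field is declared in com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration (I haven't confirmed that; the other two class names come straight from the stack trace) and that it is run with the same classpath Hive uses, e.g. with hadoop jar after adding the shaded handler jar.

public class ConnectorClasspathCheck {
    public static void main(String[] args) throws Exception {
        // Classes involved in the failure; BigQueryConfiguration is an assumption
        // about where MANDATORY_CONFIG_PROPERTIES_INPUT is declared.
        String[] names = {
            "com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat",
            "com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration",
            "com.google.cloud.hadoop.io.bigquery.hive.WrappedBigQueryAvroInputFormat"
        };
        for (String name : names) {
            Class<?> clazz = Class.forName(name);
            // getCodeSource() can be null for classes loaded by the bootstrap loader.
            java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
            System.out.println(name + " -> " + (src == null ? "<bootstrap>" : src.getLocation()));
        }
    }
}

If the two InputFormat classes and BigQueryConfiguration resolve to different jars (for example one from the shaded handler jar and one from a connector bundled with the Dataproc image), that mismatch would explain the error.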

prathapreddy123 commented 4 years ago

Can you share the temp.bq_test table definition?

d601 commented 4 years ago

CREATE TABLE temp.bq_test (event_type string, market string, payload string)  
 STORED BY 
 'com.google.cloud.hadoop.io.bigquery.hive.HiveBigQueryStorageHandler' 
 TBLPROPERTIES ( 
 'bq.dataset'='kafka', 
 'bq.table'='business_events_v7', 
 'mapred.bq.project.id'='<project>',
 'mapred.bq.temp.gcs.path'='gs://<dataproc bucket>/bigquery',
 'mapred.bq.gcs.bucket'='<dataproc bucket>' 
 );

The actual BQ table has more columns than I've declared here.

Edit again: the issue went away, and I'm not sure why. I'm trying to reproduce it.

d601 commented 4 years ago

I'm not sure what happened, but the issue is gone now and I can't reproduce it at all. I know that last week I had briefly misconfigured the Spark Thrift Server to run on the same port as HiveServer2 (10000), so it could be that I was connecting to that instead. But attempting to use Spark now fails at the 'add jar' step, so I'll close this issue. Thank you anyway!