[Closed] d601 closed this issue 4 years ago
Can you share temp.bq_test table definition?
CREATE TABLE temp.bq_test (event_type STRING, market STRING, payload STRING)
STORED BY 'com.google.cloud.hadoop.io.bigquery.hive.HiveBigQueryStorageHandler'
TBLPROPERTIES (
  'bq.dataset'='kafka',
  'bq.table'='business_events_v7',
  'mapred.bq.project.id'='<project>',
  'mapred.bq.temp.gcs.path'='gs://<dataproc bucket>/bigquery',
  'mapred.bq.gcs.bucket'='<dataproc bucket>'
);
The actual BigQuery table has more columns than I've declared here.
Edit again: the issue went away; I'm not sure why. I'm trying to reproduce it.
I'm not sure what happened, but the issue is gone now and I can't reproduce it at all. I do know that last week I had briefly misconfigured the Spark Thrift Server to run on the same port as HiveServer2 (10000), so I may have been connecting to that instead. However, attempting to use Spark now fails at the 'add jar' step. I'll close this issue, thank you anyway!
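For reference, a hedged sketch of the port conflict mentioned above: the Spark Thrift Server is HiveServer2-compatible and reads the same `hive.server2.thrift.port` setting, which defaults to 10000 on both, so running both on one master node without overriding one of them makes it easy to connect to the wrong server. The port numbers and localhost addresses below are illustrative assumptions, not taken from this issue:

```shell
# Sketch: start the Spark Thrift Server on a port other than HiveServer2's
# default 10000, so the two servers no longer collide.
# (Ports and hosts here are assumptions for illustration.)
$SPARK_HOME/sbin/start-thriftserver.sh \
  --hiveconf hive.server2.thrift.port=10001

# Connect explicitly to the server you intend to test against:
beeline -u jdbc:hive2://localhost:10000   # HiveServer2
beeline -u jdbc:hive2://localhost:10001   # Spark Thrift Server
```

Connecting with an explicit port in the beeline JDBC URL would have made the misconfiguration visible immediately.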
Building this repo as-is and attempting to use it in a Dataproc 1.3 cluster results in the following error:
(I've redacted the name of the bucket I put this stuff in.)