Open vishal2535 opened 2 years ago
Hi
After using the correct jar file added to Hive as below
ADD JAR <LOCATION>/hive-bigquery-storage-handler-1.0-shaded.jar;
and also adding the jar file gcs-connector-hadoop3-2.2.5-shaded.jar (as suggested by Igor), to HADOOP_CLASSPATH, I can insert a row through Hive table into GBQ table as below
DROP TABLE IF EXISTS test.testmebq;
CREATE EXTERNAL TABLE test.testmebq(
ID BIGINT
)
STORED BY
'com.google.cloud.hadoop.io.bigquery.hive.HiveBigQueryStorageHandler'
TBLPROPERTIES (
'mapred.bq.input.dataset.id'='test',
'mapred.bq.input.table.id'='testme',
'mapred.bq.project.id'='xxx',
'mapred.bq.temp.gcs.path'='gs://tmp_storage_bucket/tmp',
'mapred.bq.gcs.bucket'='etcbucket/hive/test'
)
;
and creating a similar table in GBQ dataset test.testme with id column in lowercase. the insert from Hive goes through with no error!
**insert into test.testmebq values(11);***
2022-03-07 18:10:17,159 INFO [main] conf.HiveConf: Using the default value
passed in for log id: 3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b
2022-03-07 18:10:17,406 WARN [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in
the future versions. Consider using a different execution engine (i.e.
spark, tez) or using Hive 1.X releases.
Query ID = hduser_20220307181017_90752d3e-bb82-4f5a-94db-49afdf96cabf
2022-03-07 18:10:17,406 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Query ID =
hduser_20220307181017_90752d3e-bb82-4f5a-94db-49afdf96cabf
Total jobs = 1
2022-03-07 18:10:17,406 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Total jobs = 1
2022-03-07 18:10:17,406 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Starting task [Stage-0:DDL] in serial mode
2022-03-07 18:10:17,420 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
avro.AvroSerDe: AvroSerde::initialize(): Preset value of
avro.schema.literal == null
2022-03-07 18:10:17,420 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
avro.AvroSerDe: AvroSerde::initialize(): Preset value of
avro.schema.literal == null
2022-03-07 18:10:17,420 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
avro.AvroSerDe: AvroSerde::initialize(): Preset value of
avro.schema.literal == null
2022-03-07 18:10:17,466 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Starting task [Stage-1:DDL] in serial mode
2022-03-07 18:10:17,822 INFO [LocalJobRunner Map Task Executor #0]
exec.MapOperator: Initializing operator MAP[0]
2022-03-07 18:10:17,822 INFO [LocalJobRunner Map Task Executor #0]
mr.ExecMapper:
<MAP>Id =0
<Children>null
<\Children>
<Parent><\Parent>
<\MAP>
2022-03-07 18:10:18,742 WARN [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is
deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2022-03-07 18:10:18,742 Stage-2 map = 0%, reduce = 0%
2022-03-07 18:10:18,742 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
exec.Task: 2022-03-07 18:10:18,742 Stage-2 map = 0%, reduce = 0%
2022-03-07 18:10:19,187 INFO [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: RECORDS_OUT_1_test.testmebq:1,
RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_FS_5:1,
2022-03-07 18:10:19,187 INFO [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner:
2022-03-07 18:10:19,187 INFO [LocalJobRunner Map Task Executor #0]
mapred.Task: Task:attempt_local378396178_0007_m_000000_0 is done. And is in
the process of committing
2022-03-07 18:10:19,190 INFO [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner: map
2022-03-07 18:10:19,190 INFO [LocalJobRunner Map Task Executor #0]
mapred.Task: Task 'attempt_local378396178_0007_m_000000_0' done.
2022-03-07 18:10:19,190 INFO [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner: Finishing task:
attempt_local378396178_0007_m_000000_0
2022-03-07 18:10:19,190 INFO [Thread-1079] mapred.LocalJobRunner: map task
executor complete.
2022-03-07 18:10:19,745 Stage-2 map = 100%, reduce = 0%
2022-03-07 18:10:19,745 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b
main] exec.Task:
2022-03-07 18:10:19,745 Stage-2 map = 100%, reduce = 0%
Ended Job = job_local378396178_0007
2022-03-07 18:10:19,746 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
exec.Task: Ended Job = job_local378396178_0007
MapReduce Jobs Launched:
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: MapReduce Jobs Launched:
Stage-Stage-2: HDFS Read: 19312718 HDFS Write: 0 SUCCESS
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Stage-Stage-2: HDFS Read: 19312718 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Total MapReduce CPU Time Spent: 0 msec
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: Completed executing
command(queryId=hduser_20220307181017_90752d3e-bb82-4f5a-94db-49afdf96cabf);
Time taken: 2.341 seconds
*OK*
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
ql.Driver: OK
2022-03-07 18:10:19,747 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
lockmgr.DbTxnManager: Stopped heartbeat for query:
hduser_20220307181017_90752d3e-bb82-4f5a-94db-49afdf96cabf
Time taken: 2.607 seconds
2022-03-07 18:10:19,774 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
CliDriver: Time taken: 2.607 seconds
2022-03-07 18:10:19,775 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
conf.HiveConf: Using the default value passed in for log id:
3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b
2022-03-07 18:10:19,775 INFO [3bf1fd70-21ae-4648-8f4f-bcd12eb0d63b main]
session.SessionState: Resetting thread name to main
However, the insert does not show in the Hive or GBQ table!
0: jdbc:hive2://rhes75:10099/default> select * from testmebq
. . . . . . . . . . . . . . . . . . > +--------------+
| testmebq.id |
+--------------+
+--------------+
No rows selected (2.512 seconds)
and neither in GBQ table
SELECT * FROM test.testme
Query complete (0.3 sec elapsed, 0 B processed)
Job information
Results
JSON
Execution details
This query returned no results.
Yes in short there is no error thrown but no rows added!
As I stated this seems to be an issue with mr as engine. There is now the issue of OutputCommitter not being called somehow, something probably to do with jar, as all goes through ok with no result. Any fixes that can be applied?
Facing an issue with record insertion. Insert statements are giving success message but no records are inserted to BQ table. At the time of insertion, it first stores records to temp files in GCS buckets which we set at table creation time. I am seeing generated temp files in buckets but it's not pushing those temp files to further BQ table. I am using Hive 2.3.9 with Hadoop compatible version of 2.10.1 and testing it with MR as a execution engine. MR somehow does not seem to call OutputCommitter.
_Customized configuration files for my project are here along with the hive generated logs for INSERT operation. configuration_files.zip hive.log_
Note:- Joins and Select operations are working. Below are the query output logs for Insert operation.