embulk / embulk-output-jdbc

MySQL, PostgreSQL, Redshift and generic JDBC output plugins for Embulk
Other
88 stars 86 forks source link

[Hive] Can't insert to databases other than "default" #304

Closed hiracky16 closed 2 years ago

hiracky16 commented 2 years ago

Hi. I am trying to add records to hive using this plugin. But, I can't add records when database is not default. When database is default, I could add records to a table in default.

Setting of this plugin when is selected default.

out:
    type: jdbc
    ...
    url: 'jdbc:hive2://{{ HIVE_HOST }}:10000/default'
    schema: default
    ...

Setting of this plugin when is selected tmp. ( tmp is a table already exists )

out:
    type: jdbc
    ...
    url: 'jdbc:hive2://{{ HIVE_HOST }}:10000/tmp'
    schema: tmp
    ...

Error detail

2021-10-05 03:20:28.482 +0000 [INFO] (0001:transaction): Connecting to jdbc:hive2://xxxxxxxxxx:10000/tmp options {user=hadoop}
log4j:WARN No appenders could be found for logger (com.amazon.hive.jdbc41.internal.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2021-10-05 03:20:29.092 +0000 [INFO] (0001:transaction): SQL: SET search_path TO `tmp`
2021-10-05 03:20:29.190 +0000 [ERROR] (0001:transaction): Operation failed (11320:HY000)
org.embulk.exec.PartialExecutionException: java.lang.RuntimeException: java.sql.SQLDataException: [Amazon][JDBC](11320) A RowCount was expected but not generated from query "SET search_path TO `tmp`". Query not executed.
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:340)
        at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:566)
        at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
        at org.embulk.spi.Exec.doWith(Exec.java:22)
        at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
        at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:242)
        at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:291)
        at org.embulk.EmbulkRunner.run(EmbulkRunner.java:155)
        at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:431)
        at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:90)
        at org.embulk.cli.Main.main(Main.java:64)
        Suppressed: java.lang.NullPointerException
                at org.embulk.exec.BulkLoader.doCleanup(BulkLoader.java:463)
                at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:397)
                at org.embulk.exec.BulkLoader$3.run(BulkLoader.java:394)
                at org.embulk.spi.Exec.doWith(Exec.java:22)
                at org.embulk.exec.BulkLoader.cleanup(BulkLoader.java:394)
                at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:245)
                ... 5 more

Please tell me why the error is occurring.

hiroyuki-sato commented 2 years ago

Hello, @hiracky16

I think the best way is to develop embulk-output-hive.

Have you ever tried database: tmp instead of scheme?

I'm not familiar with Apache Hive. It seems that create database and create schema are identical(interchangeable). https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase

The current implementation execute SET search_path TO <schema_name> for getting schema. I think Hive does not support that query. (Probably many databases don't support this query. except for PostgreSQL)

hiracky16 commented 2 years ago

@hiroyuki-sato By simply replacing schema with database, I can add records. I should have checked the documentation of hive.

Thanks for the advice.