apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] SHOW CREATE TABLE fails in Hive when the table has a field of type varchar(2147483646) #1565

Closed: zhoujinsong closed this issue 1 year ago

zhoujinsong commented 1 year ago

Search before asking

Paimon version

0.4.0

Compute Engine

Hive

Minimal reproduce step

What doesn't meet your expectations?

2023-07-13T17:49:05,334 ERROR [a949e4b0-0160-4718-a55a-885fc5692f6b main] metadata.Table: Unable to get field from serde: org.apache.paimon.hive.PaimonSerDe
java.lang.RuntimeException: Varchar length 2147483646 out of allowed range [1, 65535]
    at org.apache.hadoop.hive.serde2.typeinfo.BaseCharUtils.validateVarcharParameter(BaseCharUtils.java:32) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.serde2.typeinfo.VarcharTypeInfo.<init>(VarcharTypeInfo.java:33) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:151) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getVarcharTypeInfo(TypeInfoFactory.java:170) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.paimon.hive.HiveTypeUtils.logicalTypeToTypeInfo(HiveTypeUtils.java:84) ~[paimon-hive-connector-2.1-cdh-6.3-0.5-SNAPSHOT.jar:0.5-SNAPSHOT]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_181]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) ~[?:1.8.0_181]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_181]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_181]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_181]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_181]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_181]
    at org.apache.paimon.hive.HiveSchema.checkSchemaMatched(HiveSchema.java:172) ~[paimon-hive-connector-2.1-cdh-6.3-0.5-SNAPSHOT.jar:0.5-SNAPSHOT]
    at org.apache.paimon.hive.HiveSchema.extract(HiveSchema.java:127) ~[paimon-hive-connector-2.1-cdh-6.3-0.5-SNAPSHOT.jar:0.5-SNAPSHOT]
    at org.apache.paimon.hive.PaimonSerDe.initialize(PaimonSerDe.java:56) ~[paimon-hive-connector-2.1-cdh-6.3-0.5-SNAPSHOT.jar:0.5-SNAPSHOT]
    at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:58) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:531) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:448) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:435) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:280) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:262) ~[hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:632) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:615) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.exec.DDLTask.showCreateTable(DDLTask.java:2281) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.exec.DDLTask.showCreateTable(DDLTask.java:2218) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:498) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1328) [hive-exec-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699) [hive-cli-2.1.1-cdh6.3.2.jar:2.1.1-cdh6.3.2]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:313) [hadoop-common-3.0.0-cdh6.3.2.jar:?]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:227) [hadoop-common-3.0.0-cdh6.3.2.jar:?]
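The first frames of the trace show the failure path. During `SHOW CREATE TABLE`, `PaimonSerDe.initialize` calls `HiveSchema.checkSchemaMatched`, which converts each Paimon field type through `HiveTypeUtils.logicalTypeToTypeInfo`. For a VARCHAR field, that call reaches `TypeInfoFactory.getVarcharTypeInfo`, and Hive's `BaseCharUtils.validateVarcharParameter` rejects any length outside [1, 65535]. The snippet below is a minimal, hypothetical re-creation of that range check (the class and method names are illustrative, not Hive's API). It only uses the bounds and message format from the error above, to show why 2147483646 is rejected.

```java
// Hypothetical re-creation of the varchar length check that Hive applies
// when building a VarcharTypeInfo. Bounds and message text are taken from
// the error in this issue; this is not Hive's actual source code.
class VarcharRangeCheck {
    // Allowed range reported by Hive: [1, 65535]
    static final int MIN_VARCHAR_LENGTH = 1;
    static final int MAX_VARCHAR_LENGTH = 65535;

    // Throws a RuntimeException for any length outside the allowed range.
    static void validateVarcharLength(int length) {
        if (length < MIN_VARCHAR_LENGTH || length > MAX_VARCHAR_LENGTH) {
            throw new RuntimeException(
                "Varchar length " + length + " out of allowed range ["
                    + MIN_VARCHAR_LENGTH + ", " + MAX_VARCHAR_LENGTH + "]");
        }
    }

    public static void main(String[] args) {
        validateVarcharLength(255); // within range, accepted silently
        try {
            validateVarcharLength(2147483646); // the length from this issue
        } catch (RuntimeException e) {
            // Prints: Varchar length 2147483646 out of allowed range [1, 65535]
            System.out.println(e.getMessage());
        }
    }
}
```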

Anything else?

No response

Are you willing to submit a PR?

zhoujinsong commented 1 year ago

After reviewing the error message and the related code, I found that:

Based on this, I would fix this bug by:

@JingsongLi @tsreaper Could you help review this solution? If you think it is okay, I will try to fix this bug ASAP.

Hive type length limit reference: https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-VarcharvarcharVarchar
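Given Hive's varchar limit of 65535, one plausible mapping (an assumption for illustration, not necessarily the fix proposed in this thread) is to fall back to Hive's `string` type whenever a Paimon VARCHAR length exceeds that limit. The class and method names below are hypothetical and are not part of Paimon's `HiveTypeUtils`.

```java
// Hypothetical sketch: pick a Hive type name for a Paimon VARCHAR(n).
// Assumption: lengths above Hive's varchar limit fall back to Hive "string",
// which carries no declared length. Lengths below 1 are not handled here.
class VarcharToHiveType {
    // Upper bound of Hive's varchar length, per the Hive language manual.
    static final int HIVE_MAX_VARCHAR_LENGTH = 65535;

    static String toHiveTypeName(int paimonVarcharLength) {
        if (paimonVarcharLength > HIVE_MAX_VARCHAR_LENGTH) {
            // Hive cannot express this length as varchar, so use string.
            return "string";
        }
        return "varchar(" + paimonVarcharLength + ")";
    }

    public static void main(String[] args) {
        System.out.println(toHiveTypeName(100));        // varchar(100)
        System.out.println(toHiveTypeName(65535));      // varchar(65535)
        System.out.println(toHiveTypeName(2147483646)); // string
    }
}
```

One trade-off of this approach: Hive can read the column again, but it no longer sees or enforces the original length constraint on the field.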

JingsongLi commented 1 year ago

Many thanks, @zhoujinsong, for your detailed messages.