
[Bug] Hive cannot insert data from a non-partitioned table into a partitioned table. #4550


FrommyMind commented 2 days ago


Paimon version

0.9

Compute Engine

Hive: CDH6.3.2

Minimal reproduce step

SET hive.metastore.warehouse.dir=/user/hive/warehouse;

create table hive_test_np(
a int,
b string,
c string)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';

insert into hive_test_np values(1, 'aaa', 'nice');

create table hive_test(
a int,
b string)
PARTITIONED BY ( c string) 
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';

insert into hive_test partition(c) select * from hive_test_np;

What doesn't meet your expectations?

2024-11-19 21:45:16,486 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":1,"b":"aaa","c":"nice"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":1,"b":"aaa","c":"nice"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:494)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
    ... 8 more
Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.paimon.hive.mapred.PaimonRecordWriter.write(PaimonRecordWriter.java:69)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
    at org.apache.paimon.data.GenericRow.isNullAt(GenericRow.java:131)
    at Projection$0.apply(Unknown Source)
    at org.apache.paimon.table.sink.RowPartitionKeyExtractor.partition(RowPartitionKeyExtractor.java:44)
    at org.apache.paimon.table.sink.RowKeyExtractor.partition(RowKeyExtractor.java:57)
    at org.apache.paimon.table.sink.TableWriteImpl.toSinkRecord(TableWriteImpl.java:204)
    at org.apache.paimon.table.sink.TableWriteImpl.writeAndReturn(TableWriteImpl.java:174)
    at org.apache.paimon.table.sink.TableWriteImpl.write(TableWriteImpl.java:147)
    at org.apache.paimon.hive.mapred.PaimonRecordWriter.write(PaimonRecordWriter.java:67)
    ... 16 more

Anything else?

No response


zhuangchong commented 17 hours ago

Writing to a Paimon partitioned table through Hive is currently not supported. We suggest using Flink or Spark instead.
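
For reference, a minimal sketch of doing the same copy through Flink SQL instead. The catalog name and warehouse path below are placeholders matching the warehouse dir from the reproduce step; adjust them to your environment (for tables registered in the Hive metastore, a Paimon Hive-metastore catalog can be used instead of the filesystem catalog shown here):

-- Register a Paimon filesystem catalog (catalog name and path are assumptions).
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs:///user/hive/warehouse'
);

USE CATALOG paimon_catalog;

-- Dynamic partition insert: the value of column c selects the target partition,
-- equivalent to the Hive statement "insert into hive_test partition(c) select * from hive_test_np".
INSERT INTO hive_test SELECT a, b, c FROM hive_test_np;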