apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Flink SQL created COPY_ON_WRITE partitioned table raises `UnsupportedOperationException` on Hive query #8838

Open MarlboroBoy opened 1 year ago

MarlboroBoy commented 1 year ago

Describe the problem you faced

When I create a partitioned table with the Flink SQL client, insert data through an INSERT statement, and then query the data with `select *` from the Hive beeline client, an exception is thrown and an error is reported.

The error message is

java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter

But there is no problem when selecting columns by name instead of using `select *`.

I want to know whether this is due to a compilation issue or a package conflict. Can you help me solve this problem?
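
For clarity, the symptom looks like this in a beeline session (a sketch based on the description above; exactly which columns trigger the failure is not confirmed here):

beeline > select * from t7;
-- fails: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter...
beeline > select uuid, name, age from t7;
-- works: returns the inserted rows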

To Reproduce

Steps to reproduce the behavior:

1. Start the Flink SQL client:

./bin/sql-client.sh embedded -j /tmp/hudi-flink1.15-bundle-0.13.1.jar shell

2. Create the table and insert data:

CREATE TABLE t7(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = '/warehouse/tablespace/external/hive/test.db/t7',
  'table.type' = 'COPY_ON_WRITE',  -- If MERGE_ON_READ, hive query will not have output until the parquet file is generated
  'hive_sync.enable' = 'true',     -- Required. To enable hive synchronization
  'hive_sync.mode' = 'hms',      -- Required. Set hive sync mode to hms; the default is jdbc
  'hive_sync.metastore.uris' = 'thrift://node200.xxx.com:9083' -- Required. The port must match hive-site.xml
);

INSERT INTO t7 VALUES
  ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
  ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
  ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
  ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
  ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
  ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
  ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
  ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
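
Before querying, it can be worth verifying that the Hive sync actually registered the table (a sketch; the database name test is inferred from the table path above):

beeline > show tables in test;
beeline > show partitions test.t7;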

3. Query the table from the Hive beeline client:

beeline > select * from t7;

Error: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.zetyun.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet (state=,code=0)


Expected behavior

`select *` from beeline should return the inserted rows without an exception, the same as selecting columns by name.

Environment Description

* Hudi version: 0.13.1
* Flink version: 1.15 (hudi-flink1.15-bundle-0.13.1)
* Hive version: 3.1.3000 (CDP 7.1.7.0-551)
* Storage: HDFS


Stacktrace

[HiveServer2-Handler-Pool: Thread-105]: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:476) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:946) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:567) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:798) [hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    ... 13 more
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReaderInternal(HoodieParquetInputFormat.java:97) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:91) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:810) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:365) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    ... 13 more
Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$10$1
    at org.apache.parquet.io.api.PrimitiveConverter.addLong(PrimitiveConverter.java:105) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.column.impl.ColumnReaderBase$2$4.writeValue(ColumnReaderBase.java:301) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:410) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReaderInternal(HoodieParquetInputFormat.java:97) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:91) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:810) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:365) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
    ... 13 more
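
The innermost frame points at an addLong call that Hive 3's ETypeConverter does not implement, which lines up with the date/timestamp read path addressed by the fix mentioned in the comments below. One way to confirm what the Flink writer actually wrote is to dump the Parquet schema of the failing file (a sketch; the parquet-tools jar name and location are assumptions):

hadoop jar parquet-tools.jar schema hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
# a Flink TIMESTAMP(3) column typically appears as an int64 with a TIMESTAMP logical annotation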
xicm commented 1 year ago

Can you try the latest master?

danny0405 commented 1 year ago

We have a fix in the latest master for Hive 3 queries on date and timestamp columns: https://github.com/apache/hudi/commit/1e3cdb66aae9e68c69c8ce6475e87a3daa375781. Can you try that patch?
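
For anyone following along, trying the patched master means rebuilding and redeploying both bundles. A rough sketch, with Maven flags taken from the Hudi build docs as assumptions (adjust to your Flink/Hive versions):

git clone https://github.com/apache/hudi.git && cd hudi
# build with the Flink 1.15 profile; shade the Flink bundle for Hive 3 metastore sync
mvn clean package -DskipTests -Dflink1.15 -Pflink-bundle-shade-hive3
# redeploy the Flink bundle used by sql-client.sh ...
cp packaging/hudi-flink-bundle/target/hudi-flink1.15-bundle-*.jar /tmp/
# ... and the hadoop-mr bundle used by HiveServer2, then restart HiveServer2
cp packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-*.jar <hive-aux-jars-dir>/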

MarlboroBoy commented 1 year ago

> We have a fix in the latest master for Hive 3 queries on date and timestamp columns: 1e3cdb6. Can you try that patch?

Thank you for the suggestion. I will try the patch and see if it resolves the issue. I'll let you know the results.

MarlboroBoy commented 1 year ago

@danny0405 @xicm I have already attempted to build based on the master branch, but the issue still persists.

xicm commented 1 year ago

Have you restarted hiveserver2? I ran this case and it works.
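
(If a restart alone does not help, it is worth confirming which hudi-hadoop-mr bundle HiveServer2 actually picked up; the paths below are illustrative for a CDP install:)

# list the bundle jar(s) visible to HiveServer2 and check their version
ls /opt/cloudera/parcels/CDH/lib/hive/auxlib/ | grep hudi-hadoop-mr
# or inspect the running process and its classpath
ps -ef | grep -i [h]iveserver2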

MarlboroBoy commented 1 year ago

> Have you restarted hiveserver2? I ran this case and it works.

Yes, I have already restarted.

I am using Hive 3, and I modified the following code during compilation. The changes I made shouldn't have caused this problem, right?

[screenshot of the modified code]

xicm commented 1 year ago

These changes should not have caused the problem.

ad1happy2go commented 1 year ago

@MarlboroBoy Were you able to get it working with the master code?