StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.77k stars 1.76k forks source link

query error from HUDI external table with partitions #4568

Closed tiannan-sr closed 2 years ago

tiannan-sr commented 2 years ago

Steps to reproduce the behavior (Required)

  1. create hudi table with partition, and insert data:

    create table hudi_par_parquet_gzip(
    uuid int,
    col_boolean boolean,
    col_int int,
    col_long long,
    col_float float,
    col_double double,
    col_decimal decimal(38,18),
    col_date date,
    col_timestamp timestamp,
    col_string string,
    col_binary binary,
    col_array array<int>,
    col_struct struct<a:string,b:int>,
    col_map map<string,int>
    ) USING hudi 
    TBLPROPERTIES(
    type = 'cow',
    primarykey = 'uuid')
    partitioned by(col_date,col_int);
    
    insert into hudi_par_parquet_gzip partition(col_date = '2020-01-01', col_int='1') select 1,true,1.01,1.001,1.0001,10000000000.0000000001,cast('2020-01-01 00:00:01' as timestamp),'Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?',cast('1110001010101011001001' as binary),array(1,10,100),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-01-01', col_int='2') select 1,true,1.01,1.001,1.0001,10000000000.0000000001,cast('2020-01-01 00:00:01' as timestamp),'Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?',cast('1110001010101011001001' as binary),array(1,10,100),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-02-01', col_int='2') select 2,false,2.02,2.002,2.0002,20000000000.0000000002,cast('2020-02-01 00:00:02' as timestamp),'Xi Jinping, general secretary of the Communist Party of China Central Committee, called on Tuesday for unrelenting efforts in exercising full and rigorous governance over the Party, saying that the CPC will continue to show zero tolerance for corruption.',cast('1110001010101011001001' as binary),array(2,20,200),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-02-01', col_int='3') select 2,false,2.02,2.002,2.0002,20000000000.0000000002,cast('2020-02-01 00:00:02' as timestamp),'Xi Jinping, general secretary of the Communist Party of China Central Committee, called on Tuesday for unrelenting efforts in exercising full and rigorous governance over the Party, saying that the CPC will continue to show zero tolerance for corruption.',cast('1110001010101011001001' as binary),array(2,20,200),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-03-01', col_int='3') select 3,true,3.03,3.003,3.0003,30000000000.0000000003,cast('2020-03-01 00:00:04' as timestamp),'Nation raises caution on overseas packages',cast('1110001010101011001001' as binary),array(3,30,300),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-03-01', col_int='4') select 3,true,3.03,3.003,3.0003,30000000000.0000000003,cast('2020-03-01 00:00:04' as timestamp),'Nation raises caution on overseas packages',cast('1110001010101011001001' as binary),array(3,30,300),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-04-01', col_int='4') select 4,false,4.04,4.004,4.0004,40000000000.0000000004,cast('2020-04-01 00:00:04' as timestamp),'Shanghai, Shenzhen register most newly listed firms in 2021',cast('1110001010101011001001' as binary),array(4,40,400),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-04-01', col_int='5') select 4,false,4.04,4.004,4.0004,40000000000.0000000004,cast('2020-04-01 00:00:04' as timestamp),'Shanghai, Shenzhen register most newly listed firms in 2021',cast('1110001010101011001001' as binary),array(4,40,400),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-05-01', col_int='5') select 5,true,5.05,5.005,5.0005,50000000000.0000000005,cast('2020-05-01 00:00:05' as timestamp),'A total of 4,685 companies have been listed on the A-share market in China as of Dec 31, 2021, with 46 percent of them based in Beijing, Shanghai, Shenzhen, Hangzhou, Suzhou, Guangzhou, Ningbo, Nanjing, Wuxi, and Chengdu, said a report of National Business Daily on Wednesday.',cast('1110001010101011001001' as binary),array(5,50,500),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-05-01', col_int='6') select 5,true,5.05,5.005,5.0005,50000000000.0000000005,cast('2020-05-01 00:00:05' as timestamp),'A total of 4,685 companies have been listed on the A-share market in China as of Dec 31, 2021, with 46 percent of them based in Beijing, Shanghai, Shenzhen, Hangzhou, Suzhou, Guangzhou, Ningbo, Nanjing, Wuxi, and Chengdu, said a report of National Business Daily on Wednesday.',cast('1110001010101011001001' as binary),array(5,50,500),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-06-01', col_int='6') select 6,false,6.06,6.006,6.0006,60000000000.0000000006,cast('2020-06-01 00:00:06' as timestamp),'The development of enterprises, as well as the push of local governments, has boosted the listings, the report said. For example, in early 2018, Central China''s Hubei province launched a plan to double its listed companies by including the listing of enterprises into the government''s annual performance assessment.',cast('1110001010101011001001' as binary),array(6,60,600),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-06-01', col_int='7') select 6,false,6.06,6.006,6.0006,60000000000.0000000006,cast('2020-06-01 00:00:06' as timestamp),'The development of enterprises, as well as the push of local governments, has boosted the listings, the report said. For example, in early 2018, Central China''s Hubei province launched a plan to double its listed companies by including the listing of enterprises into the government''s annual performance assessment.',cast('1110001010101011001001' as binary),array(6,60,600),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-07-01', col_int='7') select 7,true,7.07,7.007,7.0007,70000000000.0000000007,cast('2020-07-01 00:00:07' as timestamp),'Most listed companies in the top 10 cities come from industries such as information technology, electronics, mechanical equipment, medical biology, and electric power equipment, with information technology bearing most listed companies.',cast('1110001010101011001001' as binary),array(7,70,700),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-07-01', col_int='8') select 7,true,7.07,7.007,7.0007,70000000000.0000000007,cast('2020-07-01 00:00:07' as timestamp),'Most listed companies in the top 10 cities come from industries such as information technology, electronics, mechanical equipment, medical biology, and electric power equipment, with information technology bearing most listed companies.',cast('1110001010101011001001' as binary),array(7,70,700),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-08-01', col_int='8') select 8,false,8.08,8.008,8.0008,80000000000.0000000008,cast('2020-08-01 00:00:08' as timestamp),'Citi report finds MNC mood in China buoyant',cast('1110001010101011001001' as binary),array(8,80,800),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-08-01', col_int='9') select 8,false,8.08,8.008,8.0008,80000000000.0000000008,cast('2020-08-01 00:00:08' as timestamp),'Citi report finds MNC mood in China buoyant',cast('1110001010101011001001' as binary),array(8,80,800),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-09-01', col_int='9') select 9,true,9.09,9.009,9.0009,90000000000.0000000009,cast('2020-09-01 00:00:09' as timestamp),'Xi: China, Russia major champions of multilateralism, global justice',cast('1110001010101011001001' as binary),array(9,90,900),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-09-01', col_int='10') select 9,true,9.09,9.009,9.0009,90000000000.0000000009,cast('2020-09-01 00:00:09' as timestamp),'Xi: China, Russia major champions of multilateralism, global justice',cast('1110001010101011001001' as binary),array(9,90,900),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-10-01', col_int='10') select 10,false,10.10,10.010,10.00010,100000000000.00000000010,cast('2020-10-01 00:00:10' as timestamp),'Volkswagen deliveries hit by chip shortages',cast('1110001010101011001001' as binary),array(10,100,1000),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-10-01', col_int='11') select 10,false,10.10,10.010,10.00010,100000000000.00000000010,cast('2020-10-01 00:00:10' as timestamp),'Volkswagen deliveries hit by chip shortages',cast('1110001010101011001001' as binary),array(10,100,1000),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-11-01', col_int='11') select 11,true,11.11,11.011,11.00011,110000000000.00000000011,cast('2020-11-01 00:00:11' as timestamp),'Light installations featuring the tiger, the Chinese zodiac animal for the upcoming lunar year, will be set up in the main plaza.',cast('1110001010101011001001' as binary),array(11,110,1100),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-11-01', col_int='12') select 11,true,11.11,11.011,11.00011,110000000000.00000000011,cast('2020-11-01 00:00:11' as timestamp),'Light installations featuring the tiger, the Chinese zodiac animal for the upcoming lunar year, will be set up in the main plaza.',cast('1110001010101011001001' as binary),array(11,110,1100),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-12-01', col_int='12') select 12,false,12.12,12.012,12.00012,120000000000.00000000012,cast('2020-10-01 00:00:10' as timestamp),'The annual lantern show at Yuyuan Garden, a historic tourist site in Shanghai, will kick off on Tuesday and run through Feb 28.',cast('1110001010101011001001' as binary),array(12,120,1200),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = '2020-12-01', col_int='13') select 12,false,12.12,12.012,12.00012,120000000000.00000000012,cast('2020-10-01 00:00:10' as timestamp),'The annual lantern show at Yuyuan Garden, a historic tourist site in Shanghai, will kick off on Tuesday and run through Feb 28.',cast('1110001010101011001001' as binary),array(12,120,1200),null,null;
    insert into hudi_par_parquet_gzip partition(col_date = null, col_int=null) select 13,null,null,null,null,null,null,null,null,null,null,null;
    insert into hudi_par_parquet_gzip partition(col_date = null, col_int=null) select 14,null,null,null,null,12345678901234567890.123456789012345678,null,'',null,null,null,null;
  2. create hudi external table:

    create external table ex_hudi_tbl0 (
    uuid int
    ,col_boolean boolean
    ,col_int int
    ,col_long bigint
    ,col_float float
    ,col_double double
    ,col_decimal decimal(38,18)
    ,col_date date
    ,col_string string
    ,col_binary varchar(200))
    ENGINE=hudi
    properties (
    "resource" = "hudi_emr_tn",
    "table" = "hudi_par_parquet_gzip",
    "database" = "hudi_db");
  3. select from hudi external table error:

    select col_boolean,col_int,col_long,col_float,col_double,col_decimal,col_date,col_string,col_binary from ex_hudi_tbl0 order by col_int limit 5
    execute sql error and messages: (1064, 'get partition keys failed: com.starrocks.common.DdlException: Fail to access meta store of Hive. error: date literal [default] is invalid'), and time is: 2022-03-29 20:58:36

Expected behavior (Required)

results from hudi external table same with hudi table:

spark-sql> select col_boolean,col_int,col_long,col_float,col_double,col_decimal,col_date,col_string,col_binary from hudi_par_parquet_gzip order by col_int;
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    12345678901234567890.123456789012345678 NULL        NULL
true    1   1   1.001   1.0001  10000000000.000000000100000000  2020-01-01  Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?   1110001010101011001001
false   2   2   2.002   2.0002  20000000000.000000000200000000  2020-02-01  Xi Jinping, general secretary of the Communist Party of China Central Committee, called on Tuesday for unrelenting efforts in exercising full and rigorous governance over the Party, saying that the CPC will continue to show zero tolerance for corruption.  1110001010101011001001
true    2   1   1.001   1.0001  10000000000.000000000100000000  2020-01-01  Top 10 Unsolved Mysteries of Paleontological Dinosaurs, Did You Know?   1110001010101011001001
true    3   3   3.003   3.0003  30000000000.000000000300000000  2020-03-01  Nation raises caution on overseas packages  1110001010101011001001
false   3   2   2.002   2.0002  20000000000.000000000200000000  2020-02-01  Xi Jinping, general secretary of the Communist Party of China Central Committee, called on Tuesday for unrelenting efforts in exercising full and rigorous governance over the Party, saying that the CPC will continue to show zero tolerance for corruption.  1110001010101011001001
true    4   3   3.003   3.0003  30000000000.000000000300000000  2020-03-01  Nation raises caution on overseas packages  1110001010101011001001
false   4   4   4.004   4.0004  40000000000.000000000400000000  2020-04-01  Shanghai, Shenzhen register most newly listed firms in 2021 1110001010101011001001
false   5   4   4.004   4.0004  40000000000.000000000400000000  2020-04-01  Shanghai, Shenzhen register most newly listed firms in 2021 1110001010101011001001
true    5   5   5.005   5.0005  50000000000.000000000500000000  2020-05-01  A total of 4,685 companies have been listed on the A-share market in China as of Dec 31, 2021, with 46 percent of them based in Beijing, Shanghai, Shenzhen, Hangzhou, Suzhou, Guangzhou, Ningbo, Nanjing, Wuxi, and Chengdu, said a report of National Business Daily on Wednesday.    1110001010101011001001
false   6   6   6.006   6.0006  60000000000.000000000600000000  2020-06-01  The development of enterprises, as well as the push of local governments, has boosted the listings, the report said. For example, in early 2018, Central Chinas Hubei province launched a plan to double its listed companies by including the listing of enterprises into the governments annual performance assessment.   1110001010101011001001
true    6   5   5.005   5.0005  50000000000.000000000500000000  2020-05-01  A total of 4,685 companies have been listed on the A-share market in China as of Dec 31, 2021, with 46 percent of them based in Beijing, Shanghai, Shenzhen, Hangzhou, Suzhou, Guangzhou, Ningbo, Nanjing, Wuxi, and Chengdu, said a report of National Business Daily on Wednesday.    1110001010101011001001
true    7   7   7.007   7.0007  70000000000.000000000700000000  2020-07-01  Most listed companies in the top 10 cities come from industries such as information technology, electronics, mechanical equipment, medical biology, and electric power equipment, with information technology bearing most listed companies.    1110001010101011001001
false   7   6   6.006   6.0006  60000000000.000000000600000000  2020-06-01  The development of enterprises, as well as the push of local governments, has boosted the listings, the report said. For example, in early 2018, Central Chinas Hubei province launched a plan to double its listed companies by including the listing of enterprises into the governments annual performance assessment.   1110001010101011001001
false   8   8   8.008   8.0008  80000000000.000000000800000000  2020-08-01  Citi report finds MNC mood in China buoyant 1110001010101011001001
true    8   7   7.007   7.0007  70000000000.000000000700000000  2020-07-01  Most listed companies in the top 10 cities come from industries such as information technology, electronics, mechanical equipment, medical biology, and electric power equipment, with information technology bearing most listed companies.    1110001010101011001001
false   9   8   8.008   8.0008  80000000000.000000000800000000  2020-08-01  Citi report finds MNC mood in China buoyant 1110001010101011001001
true    9   9   9.009   9.0009  90000000000.000000000900000000  2020-09-01  Xi: China, Russia major champions of multilateralism, global justice    1110001010101011001001
true    10  9   9.009   9.0009  90000000000.000000000900000000  2020-09-01  Xi: China, Russia major champions of multilateralism, global justice    1110001010101011001001
false   10  10  10.01   10.0001 100000000000.000000000100000000 2020-10-01  Volkswagen deliveries hit by chip shortages 1110001010101011001001
false   11  10  10.01   10.0001 100000000000.000000000100000000 2020-10-01  Volkswagen deliveries hit by chip shortages 1110001010101011001001
true    11  11  11.011  11.00011    110000000000.000000000110000000 2020-11-01  Light installations featuring the tiger, the Chinese zodiac animal for the upcoming lunar year, will be set up in the main plaza.   1110001010101011001001
true    12  11  11.011  11.00011    110000000000.000000000110000000 2020-11-01  Light installations featuring the tiger, the Chinese zodiac animal for the upcoming lunar year, will be set up in the main plaza.   1110001010101011001001
false   12  12  12.012  12.00012    120000000000.000000000120000000 2020-12-01  The annual lantern show at Yuyuan Garden, a historic tourist site in Shanghai, will kick off on Tuesday and run through Feb 28. 1110001010101011001001
false   13  12  12.012  12.00012    120000000000.000000000120000000 2020-12-01  The annual lantern show at Yuyuan Garden, a historic tourist site in Shanghai, will kick off on Tuesday and run through Feb 28. 1110001010101011001001
Time taken: 18.986 seconds, Fetched 26 row(s)

Real behavior (Required)

get partition keys failed: com.starrocks.common.DdlException: Fail to access meta store of Hive. error: date literal [default] is invalid

StarRocks version (Required)

miomiocat commented 2 years ago

i will fix it