facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0

insert into select fails with "Field not found" error #5040

Open yingsu00 opened 1 year ago

yingsu00 commented 1 year ago

Bug description

First, start HiveExternalQueryRunner and create an empty DWRF table:

 CREATE TABLE hive.tpch1_dwrf.lineitem2 (
    "orderkey" bigint,
    "partkey" bigint,
    "suppkey" bigint,
    "linenumber" integer,
    "quantity" double,
    "extendedprice" double,
    "discount" double,
    "tax" double,
    "returnflag" varchar(1),
    "linestatus" varchar(1),
    "shipdate" timestamp,
    "commitdate" timestamp,
    "receiptdate" timestamp,
    "shipinstruct" varchar(25),
    "shipmode" varchar(10),
    "comment" varchar(44)
 )
 WITH (
    format = 'DWRF'
 )

Then insert into it from the existing table lineitem:

presto:tpch1_dwrf> insert into lineitem2 select * from lineitem\G;
Query 20230525_031609_00001_t4fbf failed:  Field not found: min. Available fields are: rows, fragments, commitcontext.
VeloxUserError:  Field not found: min. Available fields are: rows, fragments, commitcontext.
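The error reads like a field-name lookup against the output row type of the commit stage, which only exposes rows, fragments, and commitcontext, while the plan below shows the TableWriter also emitting stats columns (min, max, approx_distinct, count, ...). A minimal Python sketch of that kind of lookup failure (this is not Velox's actual code; the function name is illustrative):

```python
# Illustrative sketch of a field-name lookup against a fixed output
# schema, the kind of check that produces "Field not found" errors.
# NOT Velox's implementation; get_field_index is a hypothetical helper.

def get_field_index(available_fields, name):
    """Return the index of `name`, or raise listing the available fields."""
    try:
        return available_fields.index(name)
    except ValueError:
        raise KeyError(
            f"Field not found: {name}. "
            f"Available fields are: {', '.join(available_fields)}."
        )

# The commit stage only knows about these three columns...
commit_output = ["rows", "fragments", "commitcontext"]

# ...so asking it for a stats column such as "min" fails:
get_field_index(commit_output, "rows")  # -> 0
```

Asking the same sketch for `"min"` raises a KeyError whose message matches the one in the query failure above.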

presto:tpch1_dwrf> explain insert into lineitem2 select * from lineitem
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
Query Plan | - Output[rows] => [rows_137:bigint]
           |         rows := rows_137
           |     - TableCommit[Optional[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch1_dwrf, tableName=lineitem2, analyzePartitionValues=Optional.empty}', layout='Optional.empty'}]] => [rows_137:bigint]
           |         - RemoteStreamingExchange[GATHER] => [rows:bigint, fragments:varbinary, commitcontext:varbinary, min:bigint, max:bigint, approx_distinct:varbinary, count:bigint, min_79:bigint, max_80:bigint, approx_distinct_81:varbinary, count_82:bigint, min_83:bigint, max_84:bigint, approx_distinct_85:varbinary, count_86:bigint, min_87:integer, max_88:integer, approx_distinct_89:varbinary, count_90:bigint, min_91:d>
           |             - TableWriter => [rows:bigint, fragments:varbinary, commitcontext:varbinary, min:bigint, max:bigint, approx_distinct:varbinary, count:bigint, min_79:bigint, max_80:bigint, approx_distinct_81:varbinary, count_82:bigint, min_83:bigint, max_84:bigint, approx_distinct_85:varbinary, count_86:bigint, min_87:integer, max_88:integer, approx_distinct_89:varbinary, count_90:bigint, min_91:double, max_92:do>
           |                     orderkey := orderkey (1:31)
           |                     partkey := partkey (1:31)
           |                     suppkey := suppkey (1:31)
           |                     linenumber := linenumber (1:31)
           |                     quantity := quantity (1:31)
           |                     extendedprice := extendedprice (1:31)
           |                     discount := discount (1:31)
           |                     tax := tax (1:31)
           |                     returnflag := returnflag (1:31)
           |                     linestatus := linestatus (1:31)
           |                     shipdate := shipdate (1:31)
           |                     commitdate := commitdate (1:31)
           |                     receiptdate := receiptdate (1:31)
           |                     shipinstruct := shipinstruct (1:31)
           |                     shipmode := shipmode (1:31)
           |                     comment := comment (1:31)
           |                     Statistics collected: 64
           |                 - RemoteStreamingExchange[REPARTITION] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), linestatus:varchar(1), shipdate:timestamp, commitdate:timestamp, receiptdate:timestamp, shipinstruct:varchar(25), shipmode:varchar(10), comment:varchar(44)]
           |                         Estimates: {rows: 6001215 (51.51MB), cpu: 1977737764.00, memory: 0.00, network: 988868882.00}
           |                     - TableScan[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch1_dwrf, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout='Optional[tpch1_dwrf.lineitem{}]'}] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), linestatus:varchar(1),>
           |                             Estimates: {rows: 6001215 (51.51MB), cpu: 988868882.00, memory: 0.00, network: 0.00}
           |                             LAYOUT: tpch1_dwrf.lineitem{}
           |                             linenumber := linenumber:int:3:REGULAR (1:45)
           |                             receiptdate := receiptdate:timestamp:12:REGULAR (1:45)
           |                             quantity := quantity:double:4:REGULAR (1:45)
           |                             orderkey := orderkey:bigint:0:REGULAR (1:45)
           |                             shipinstruct := shipinstruct:varchar(25):13:REGULAR (1:45)
           |                             suppkey := suppkey:bigint:2:REGULAR (1:45)
           |                             commitdate := commitdate:timestamp:11:REGULAR (1:45)
           |                             returnflag := returnflag:varchar(1):8:REGULAR (1:45)
           |                             shipmode := shipmode:varchar(10):14:REGULAR (1:45)
           |                             tax := tax:double:7:REGULAR (1:45)
           |                             extendedprice := extendedprice:double:5:REGULAR (1:45)
           |                             shipdate := shipdate:timestamp:10:REGULAR (1:45)
           |                             comment := comment:varchar(44):15:REGULAR (1:45)
           |                             partkey := partkey:bigint:1:REGULAR (1:45)
           |                             discount := discount:double:6:REGULAR (1:45)
           |                             linestatus := linestatus:varchar(1):9:REGULAR (1:45)
           |

System information

N/A; I think this is irrelevant.

Relevant logs

No response

yingsu00 commented 1 year ago

@Yuhta Jimmy, is INSERT INTO SELECT supported or not?

Yuhta commented 1 year ago

@yingsu00 It is supported. The schema in TableWriter is weird here:

TableWriter => [rows:bigint, fragments:varbinary, commitcontext:varbinary, min:bigint, max:bigint, approx_distinct:varbinary, count:bigint, min_79:bigint, max_80:bigint, approx_distinct_81:varbinary, count_82:bigint, min_83:bigint, max_84:bigint, approx_distinct_85:varbinary, count_86:bigint, min_87:integer, max_88:integer, approx_distinct_89:varbinary, count_90:bigint, min_91:double, max_92:do>
yingsu00 commented 1 year ago

@Yuhta I think they are the stats columns
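The plan supports this reading: the TableWriter node reports "Statistics collected: 64", which is exactly four statistics (min, max, approx_distinct, count) for each of the 16 lineitem columns. A quick sanity check of that arithmetic (illustrative Python, not Velox code):

```python
# Sanity check: "Statistics collected: 64" in the plan matches four
# per-column aggregates over the 16 lineitem columns.
columns = [
    "orderkey", "partkey", "suppkey", "linenumber", "quantity",
    "extendedprice", "discount", "tax", "returnflag", "linestatus",
    "shipdate", "commitdate", "receiptdate", "shipinstruct",
    "shipmode", "comment",
]
stats = ["min", "max", "approx_distinct", "count"]

print(len(columns) * len(stats))  # -> 64
```

This also explains the suffixed names in the TableWriter schema (min_79, max_80, approx_distinct_81, count_82, ...): one group of four per column.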

Yuhta commented 1 year ago

@kewang1024 @xiaoxmeng Do we see a similar thing in our queries?

kewang1024 commented 1 year ago

This is most likely due to a missing feature: column stats collection, which I'm currently working on adding. As a workaround, disable the feature with set session hive.collect_column_statistics_on_write=false; and try the query again.