apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

simplify_expressions returns expressions with the wrong type. #6596

Open jackwener opened 1 year ago

jackwener commented 1 year ago

Describe the bug

After simplify_expressions runs, some expressions have a different type than before the rewrite. For example:

Internal error: Optimizer rule 'simplify_expressions' failed, due to generate a different schema, original schema: DFSchema { fields: [DFField { qualifier: None, field: Field { name: "array_fill(Int64(11),make_array(Int64(1),Int64(2),Int64(3)))", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "array_fill(Int64(3),make_array(Int64(2),Int64(3)))", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "array_fill(Int64(2),make_array(Int64(2)))", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }, new schema: DFSchema { fields: [DFField { qualifier: None, field: Field { name: "array_fill(Int64(11),make_array(Int64(1),Int64(2),Int64(3)))", data_type: List(Field { name: "item", data_type: List(Field { name: "item", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "array_fill(Int64(3),make_array(Int64(2),Int64(3)))", data_type: List(Field { name: "item", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } 
}, DFField { qualifier: None, field: Field { name: "array_fill(Int64(2),make_array(Int64(2)))", data_type: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] select array_fill(11, make_array(1, 2, 3)), array_fill(3, make_array(2, 3)), array_fill(2, make_array(2));
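
The error text above is hard to scan, so here is a minimal sketch that restates the two schemas and prints how each field differs. It is a toy model only: `ToyType`, `ToyField`, `diff_field` and `fields_from_report` are hypothetical names invented for this illustration and are not DataFusion or Arrow APIs. The list depths and nullable flags are copied by hand from the error message.

```rust
// Toy stand-ins for the fields in the error message.
// These are NOT DataFusion's or Arrow's real types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    List(Box<ToyType>),
}

#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: ToyType,
    nullable: bool,
}

/// Number of List layers wrapped around the innermost type.
fn list_depth(t: &ToyType) -> usize {
    match t {
        ToyType::Int64 => 0,
        ToyType::List(inner) => 1 + list_depth(inner),
    }
}

/// Wrap Int64 in `depth` List layers.
fn nested_list(depth: usize) -> ToyType {
    (0..depth).fold(ToyType::Int64, |t, _| ToyType::List(Box::new(t)))
}

/// Describe how one field differs between the two schemas.
fn diff_field(before: &ToyField, after: &ToyField) -> Vec<String> {
    let mut diffs = Vec::new();
    if before.data_type != after.data_type {
        diffs.push(format!(
            "list depth {} -> {}",
            list_depth(&before.data_type),
            list_depth(&after.data_type)
        ));
    }
    if before.nullable != after.nullable {
        diffs.push(format!("nullable {} -> {}", before.nullable, after.nullable));
    }
    diffs
}

/// (original, new) field pairs, transcribed from the error text.
fn fields_from_report() -> Vec<(ToyField, ToyField)> {
    // (column name, list depth in the new schema)
    let columns = [
        ("array_fill(Int64(11),make_array(Int64(1),Int64(2),Int64(3)))", 3),
        ("array_fill(Int64(3),make_array(Int64(2),Int64(3)))", 2),
        ("array_fill(Int64(2),make_array(Int64(2)))", 1),
    ];
    columns
        .iter()
        .map(|(name, new_depth)| {
            // Original schema: every column is List(Int64), nullable.
            let before = ToyField {
                name: name.to_string(),
                data_type: nested_list(1),
                nullable: true,
            };
            // New schema: deeper nesting and nullable = false.
            let after = ToyField {
                name: name.to_string(),
                data_type: nested_list(*new_depth),
                nullable: false,
            };
            (before, after)
        })
        .collect()
}
```

Calling `diff_field` on each pair from `fields_from_report()` shows that the third column differs only in its nullable flag, while the first two also change list nesting depth. That suggests two separate discrepancies: the planned type of array_fill versus the type of the simplified value, and the nullability of the simplified literal.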

To Reproduce

#6595

cargo test -p datafusion --test sqllogictests

Expected behavior

No response

Additional context

No response

jackwener commented 1 year ago
SELECT DATE_TRUNC('MINUTE', TIMESTAMP '2022-08-03 14:38:50Z');

"Projection: date_trunc(Utf8("MINUTE"), CAST(Utf8("2022-08-03 14:38:50Z") AS Timestamp(Nanosecond, None)))
  EmptyRelation"
-->
"Projection: TimestampSecond(1659537480, None) AS date_trunc(Utf8("MINUTE"),Utf8("2022-08-03 14:38:50Z"))
  EmptyRelation"

The expression type changes from Timestamp(Nanosecond) to Timestamp(Second).
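
To make the mismatch concrete, here is a minimal sketch of a minute truncation that keeps the input's time unit, next to the seconds value that appears in the simplified plan. `Unit`, `ToyTimestamp`, `trunc_minute_keep_unit` and `to_seconds` are hypothetical names for illustration; this is not how DataFusion implements date_trunc.

```rust
// Toy stand-ins for a timestamp scalar; NOT DataFusion's ScalarValue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Second,
    Nanosecond,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyTimestamp {
    value: i64,
    unit: Unit,
}

const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Truncate to the start of the minute while keeping the input's unit.
fn trunc_minute_keep_unit(ts: ToyTimestamp) -> ToyTimestamp {
    let per_minute = match ts.unit {
        Unit::Second => 60,
        Unit::Nanosecond => 60 * NANOS_PER_SECOND,
    };
    // rem_euclid makes negative (pre-1970) values round down as well.
    ToyTimestamp {
        value: ts.value - ts.value.rem_euclid(per_minute),
        unit: ts.unit,
    }
}

/// Convert to whole seconds, which changes the unit (and so the type).
fn to_seconds(ts: ToyTimestamp) -> ToyTimestamp {
    let value = match ts.unit {
        Unit::Second => ts.value,
        Unit::Nanosecond => ts.value.div_euclid(NANOS_PER_SECOND),
    };
    ToyTimestamp { value, unit: Unit::Second }
}

/// 2022-08-03 14:38:50Z expressed in nanoseconds since the Unix epoch.
fn example_input() -> ToyTimestamp {
    ToyTimestamp {
        value: 1_659_537_530 * NANOS_PER_SECOND,
        unit: Unit::Nanosecond,
    }
}
```

With `example_input()`, `trunc_minute_keep_unit` yields 1659537480000000000 in nanoseconds, which matches the type of the original projection. Applying `to_seconds` to that result gives 1659537480 in seconds, the value and unit shown in the simplified plan.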

jackwener commented 1 year ago

I have already fixed the type-conversion bug shown above.

We still need to investigate the remaining simplify_expressions problem related to arrays.

izveigor commented 1 year ago

@jackwener As I understand it, the main problem is the nullable flag. I think the solution to the problem is https://github.com/apache/arrow-datafusion/issues/6556

alamb commented 1 year ago

FWIW the date_trunc issue from https://github.com/apache/arrow-datafusion/issues/6596#issuecomment-1582112533 is tracked in https://github.com/apache/arrow-datafusion/issues/6623