Closed taiyang-li closed 1 year ago
Can one of the admins verify this patch?
0: jdbc:hive2://localhost:10000/> set spark.gluten.sql.native.parquet.writer.enabled = true;
+-------------------------------------------------+--------+
| key | value |
+-------------------------------------------------+--------+
| spark.gluten.sql.native.parquet.writer.enabled | true |
+-------------------------------------------------+--------+
1 row selected (1.22 seconds)
0: jdbc:hive2://localhost:10000/> CREATE TABLE t(
. . . . . . . . . . . . . . . . > id INT,
. . . . . . . . . . . . . . . . > info STRUCT<name:STRING, age:INT>,
. . . . . . . . . . . . . . . . > data MAP<STRING, INT>,
. . . . . . . . . . . . . . . . > values ARRAY<INT>
. . . . . . . . . . . . . . . . > ) stored as parquet;
+---------+
| Result |
+---------+
+---------+
No rows selected (3.062 seconds)
0: jdbc:hive2://localhost:10000/> INSERT overwrite t VALUES
. . . . . . . . . . . . . . . . > (1, struct('John', 25), map('A', 10, 'B', 20), array(1.0, 2.0, 3.0)),
. . . . . . . . . . . . . . . . > (2, struct('Alice', 30), map('C', 15, 'D', 25), array(4.0, 5.0, 6.0)),
. . . . . . . . . . . . . . . . > (3, struct('Bob', 35), map('E', 12, 'F', 18), array(7.0, 8.0, 9.0)),
. . . . . . . . . . . . . . . . > (4, struct('Jane', 40), map('G', 22, 'H', 30), array(10.0, 11.0, 12.0)),
. . . . . . . . . . . . . . . . > (5, struct('Kate', 45), map('I', 17, 'J', 28), array(13.0, 14.0, 15.0)),
. . . . . . . . . . . . . . . . > (6, null, null, null),
. . . . . . . . . . . . . . . . > (7, struct('Tank', 20), map('X', null, 'Y', null), array(1.0, 2.0, 3.0));
+---------+
| Result |
+---------+
+---------+
No rows selected (3.917 seconds)
0: jdbc:hive2://localhost:10000/> select * from t;
+-----+---------------------------+----------------------+-------------+
| id | info | data | values |
+-----+---------------------------+----------------------+-------------+
| 1 | {"name":null,"age":null} | {"A":10,"B":20} | [1,2,3] |
| 2 | {"name":null,"age":null} | {"C":15,"D":25} | [4,5,6] |
| 5 | {"name":null,"age":null} | {"I":17,"J":28} | [13,14,15] |
| 6 | NULL | NULL | NULL |
| 7 | {"name":null,"age":null} | {"X":null,"Y":null} | [1,2,3] |
| 3 | {"name":null,"age":null} | {"E":12,"F":18} | [7,8,9] |
| 4 | {"name":null,"age":null} | {"G":22,"H":30} | [10,11,12] |
+-----+---------------------------+----------------------+-------------+
7 rows selected (1.32 seconds)
$ cd spark-warehouse/t
$ parquet-tools csv ./part-000*
id,info,data,values
1,"{'1': 'John', '2': 25}","[('A', 10), ('B', 20)]",[1 2 3]
2,"{'1': 'Alice', '2': 30}","[('C', 15), ('D', 25)]",[4 5 6]
3,"{'1': 'Bob', '2': 35}","[('E', 12), ('F', 18)]",[7 8 9]
4,"{'1': 'Jane', '2': 40}","[('G', 22), ('H', 30)]",[10 11 12]
5,"{'1': 'Kate', '2': 45.0}","[('I', 17.0), ('J', 28.0)]",[13 14 15]
6,,,
7,"{'1': 'Tank', '2': 20.0}","[('X', None), ('Y', None)]",[1 2 3]
@binmahone @liuneng1994 @zzcclp Can you review this pr
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Allow nullable complex types when writing into parquet/orc. close https://github.com/oap-project/gluten/issues/2466