Closed by amahussein 2 weeks ago
https://github.com/apache/spark/commit/a4fb6cbfda2
This PR affects the `from_json` operator, and at the very least we need to test the behavior on the plugin.
SELECT from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a, from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A FROM range(3) as t
Earlier, the result would have been:
Array([ArraySeq(0),ArraySeq(null)], [ArraySeq(1),ArraySeq(null)], [ArraySeq(2),ArraySeq(null)])
whereas the new result (verified through spark-shell) is:
Array([ArraySeq(0),ArraySeq(0)], [ArraySeq(1),ArraySeq(1)], [ArraySeq(2),ArraySeq(2)])
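The visible change is in how the struct field is resolved: extracting `.A` used to return null, and now it resolves to the same field as `.a`. A minimal plain-Python sketch (no Spark required; the function and variable names are purely illustrative, not Spark APIs) of the difference between case-sensitive and case-insensitive field lookup:

```python
def extract_field(struct, name, case_sensitive):
    """Look up a struct field by name, optionally case-insensitively.

    Illustrative stand-in for Spark's struct-field resolution; not a
    real Spark API.
    """
    if case_sensitive:
        return struct.get(name)
    for key, value in struct.items():
        if key.lower() == name.lower():
            return value
    return None

# One parsed struct from the JSON array for id = 2 in the example query.
row = {"a": 2, "b": 4}

# Old (buggy) optimized behavior: the upper-case reference missed the field.
old_a = extract_field(row, "a", case_sensitive=True)   # 2
old_A = extract_field(row, "A", case_sensitive=True)   # None (shown as null)

# New behavior: both references resolve to field `a`.
new_a = extract_field(row, "a", case_sensitive=False)  # 2
new_A = extract_field(row, "A", case_sensitive=False)  # 2
```

This mirrors the spark-shell output above: under the old plan the `.A` column was all nulls, while after the fix it matches the `.a` column.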
I just looked at this a bit more deeply, and this is a bug in a logical-plan optimization in Spark. What's more, we don't support top-level arrays in `from_json` yet, so this does not impact us at all.