Open philrz opened 8 months ago
collect()
would preserve these values, but then we discussed it as a team and have decided to hold off on merging anything yet. Some things that were touched on in the discussion:There was consensus that if we change this for collect()
we should do the same for union()
so they're consistent.
One reason for hesitation on merging #5041 is that this change is significant enough that it might break some programs, e.g., those of a particular community zync user that has some particularly large/complex Zed that relies heavily on collect()
at times. If Zed were more mature it would probably be appropriate to do a careful/slow transition, e.g., maintain the current behavior with some kind of flag that becomes a new default, then surface a warning to users whose programs are relying on that behavior because we intend to swap the default in a future version, since this would give them time to understand and react to the change. Instead we discussed helping them to audit their programs for possible exposure and also giving a heads-up to the wider community via Slack before switching the behavior. But then we identified other behaviors with aggregate functions we might want to consider, so we concluded that we should gather up our thoughts on all those changes and then figure out a transition plan.
As an example of "other behaviors", @mccanne cited avg()
as an example, since here null
values are currently treated as "not present", whereas if other aggregate functions like collect()
and union()
are changed to start preserving the null
values, users might now expect a different behavior from avg()
too, e.g., treating them as "present" and implying some meaningful value such as 0
, which would change the result.
$ zq -version
Version: v1.14.0-2-g4dc62369
$ echo '1 3 5 7' | zq -z 'avg(this)' -
4.
$ echo '1 null 3 5 7' | zq -z 'avg(this)' -
4.
where
clause, such as can be seen with his branch from #5041:$ zq -version
Version: v1.14.0-3-ga33df054
$ echo '1 2 null 4' | zq -z 'collect(this)' -
[1,2,null,4]([(int64,null)])
$ echo '1 2 null 4' | zq -z 'collect(this) where this != null' -
[1,2,4]
Repro is with Zed commit 4dc6236.
While helping a community Slack user I bumped into the following surprising result:
I would have expected the
null
value to still be in the array created.Presumably due to the same effect, here
collect()
is not even returning an array.union()
has similar behaviors.If we decide the current behavior is intentional and something we intend to keep, I'd suggest we start mentioning it in the docs with an explanation.