Open philrz opened 1 year ago
I spent some time looking into this. Distilled to its essence, the issue is:
$ echo '{x:null} {x:1.} "hello"' | zq -z 'fuse | count() by typeof(this)' -
{typeof:<{x:null}>,count:1(uint64)}
{typeof:<(string,{x:float64})>,count:2(uint64)}
Basically, the fuse op constructs what the code calls an uberSchema (it should probably be called uberType) from all the types it sees, then applies a shaper to every value to shape it to the resulting uberType. In this case fuse correctly creates the expected uberType (string,{x:float64}). The problem in this issue is really with shape.
If I do
$ echo '{x:null}' | zq 'shape(this, <(string,{x:float64})>)' -
we confusingly get {x:null} as the response. The reason is that, when shaping a value to a union type, the shaper currently only shapes the value if its type exactly matches one of the union's members. This is how we end up with two different types out of fuse in the original example.
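The exact-match behavior described above can be modeled with a small Python sketch. This is a toy model and not Zed's implementation: the tuple encoding of types and the names `shape_exact`, `NULL_X`, `FLOAT_X`, and `UBER` are all hypothetical, chosen only to mirror the three input values and the uberType from the example.

```python
from collections import Counter

# Toy type model (hypothetical, not Zed's internals):
#   primitive -> a string such as "string", "float64", "null"
#   record    -> ("record", ((field_name, field_type), ...))
#   union     -> ("union", (member_type, ...))
NULL_X = ("record", (("x", "null"),))      # type of {x:null}
FLOAT_X = ("record", (("x", "float64"),))  # type of {x:1.}
UBER = ("union", ("string", FLOAT_X))      # (string,{x:float64})


def shape_exact(value_type, target):
    """Return the type a value ends up with after shaping to `target`.

    For a union target, the value is only shaped when its type is exactly
    one of the union's members; otherwise it passes through unchanged.
    """
    if isinstance(target, tuple) and target[0] == "union":
        if value_type in target[1]:
            return target
    return value_type


# Types of the three input values: {x:null}, {x:1.}, "hello"
inputs = [NULL_X, FLOAT_X, "string"]
counts = Counter(shape_exact(t, UBER) for t in inputs)

for typ, n in counts.items():
    print(typ, n)
```

In this model `{x:null}` keeps its own type because `{x:null}` is not literally a member of the union, so counting by type yields two groups, as in the fuse output above.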
So I can fix what shape does in this case and make it more dynamic, but this made me think more deeply about the use case of shaping values to union types. Shaping to (string,{x:float64}) is easy enough, but what if you are shaping the value {x:null} to something like ({x:float64},{x:string})? What should shape do? Pick the first record? Since shaper is good at shaping any record to any record type, I don't think it should be put in this position. I'm inclined to say that if shaper encounters a union with two record types it should return an error (something like ambiguous target type value). The same would go for two array, map, or set types. Come to think of it, I don't think it should handle a union with a record and a map either, as that also seems weird.
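One way to picture the proposed rule is the following Python sketch. It is only an illustration of the idea in this comment, not existing Zed code: the tuple encoding of types and the names `kind`, `check_union_target`, and `is_ambiguous` are hypothetical, and treating a record-plus-map union as an error is an assumption based on the "seems weird" remark above.

```python
from collections import Counter

# Toy type model (hypothetical, not Zed's internals):
#   primitive -> "string", "float64", ...
#   record    -> ("record", ((field_name, field_type), ...))
#   array/set -> ("array", elem_type) / ("set", elem_type)
#   map       -> ("map", key_type, value_type)
#   union     -> ("union", (member_type, ...))
COMPLEX_KINDS = ("record", "array", "map", "set")


def kind(t):
    """Return the kind of a type: its tag for tuples, else "primitive"."""
    return t[0] if isinstance(t, tuple) else "primitive"


def check_union_target(union):
    """Raise ValueError if shaping to this union would be ambiguous."""
    kinds = Counter(kind(member) for member in union[1])
    # Two members of the same complex kind: no way to pick one.
    if any(kinds[k] > 1 for k in COMPLEX_KINDS):
        raise ValueError("ambiguous target type value")
    # Assumption: a record alongside a map is also rejected.
    if kinds["record"] and kinds["map"]:
        raise ValueError("ambiguous target type value")


def is_ambiguous(union):
    """Convenience wrapper that reports ambiguity as a boolean."""
    try:
        check_union_target(union)
    except ValueError:
        return True
    return False


FLOAT_X = ("record", (("x", "float64"),))
STRING_X = ("record", (("x", "string"),))

easy = ("union", ("string", FLOAT_X))           # (string,{x:float64})
two_records = ("union", (FLOAT_X, STRING_X))    # ({x:float64},{x:string})
record_and_map = ("union", (FLOAT_X, ("map", "string", "float64")))

print(is_ambiguous(easy), is_ambiguous(two_records), is_ambiguous(record_and_map))
```

Under this rule the simple union from the original example stays shapeable, while the two-record union is rejected up front instead of silently picking a member.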
Anyway, before going in and fixing the simple union case above, it would be nice to figure out how shaper should behave for more complicated union cases.
Repro is with Zed commit 06f7936. This problem was spotted while researching zui/2751 and uses some of its test data to repro.
Consider these individual test data values in three separate files.
First we have a simple record in record.zson.
Next is another record in has-nulls.zson that differs in its type because it has null values for two fields.
Finally we have a string value in string.zson.
I can successfully fuse any pairing of the three values to create a single unified type.
However (here comes the bug), when I try to fuse all three values at once, I still end up with two types, one being a union that includes string and the other doesn't.