brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/

"slice bounds out of range" during attempted type conversion #1152

Closed: philrz closed this issue 4 years ago

philrz commented 4 years ago

Found while assisting a community user. Repro is with zq commit 41a4641.

Here I'm attempting to perform several type conversions at once, or as we've been calling it (perhaps incorrectly), "casting".

This works:

$ echo '{"stime":"1598051907.12345678", "saddr":"1.2.3.4" }' | zq -t "put stime=String.parseFloat(stime):time,saddr=saddr:ip" -
#0:record[saddr:ip,stime:time]
0:[1.2.3.4;1598051907.123456716;]

This also works:

$ echo '{"stime":"1598051907.12345678", "ltime":"1498051907.87654321" }' | zq -t "put stime=String.parseFloat(stime):time,ltime=String.parseFloat(ltime):time" -
#0:record[ltime:time,stime:time]
0:[1498051907.876543284;1598051907.123456716;]

However, if I attempt to convert all three fields at once, I get a panic:

$ echo '{"stime":"1598051907.12345678", "ltime":"1498051907.87654321", "saddr":"1.2.3.4" }' | zq -t "put stime=String.parseFloat(stime):time,ltime=String.parseFloat(ltime):time,saddr=saddr:ip" -
#0:record[ltime:time,saddr:ip,stime:time]
0:[1498051907.876543284;<ZNG-ERR type ip [%!s(PANIC=String method: runtime error: slice bounds out of range [:24] with capacity 22)]: failure trying to decode IP address that is not 4 or 16 bytes long>;1598051907.123456716;<record> (record[ltime:time,saddr:ip,stime:time]): record with extra field
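
For context on the panic text itself, here is a minimal Go sketch (not zq's actual decoder) of how a "slice bounds out of range [:24] with capacity 22" panic can arise: a decoder computes an end offset from a value's declared type, say a 16-byte IP address, that runs past the bytes actually present in the buffer. The helper function and the sizes below are hypothetical and only illustrate the failure mode.

package main

import "fmt"

// decodeField is a hypothetical helper: it slices size bytes out of buf
// starting at off, trusting the caller that those bytes exist.
func decodeField(buf []byte, off, size int) []byte {
    // When off+size exceeds cap(buf), this slice expression panics with
    // "slice bounds out of range [:off+size] with capacity cap(buf)".
    return buf[off : off+size]
}

func main() {
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered:", r)
        }
    }()
    buf := make([]byte, 22)     // only 22 bytes were actually encoded
    _ = decodeField(buf, 8, 16) // expecting a 16-byte IP at offset 8 -> buf[8:24]
}
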
philrz commented 4 years ago

Note to self: I notice that even in the two examples that didn't panic, the order of the fields in the output is reversed from the order in the original NDJSON and in the put expression. I don't know if that's related to the root cause of this issue, but FWIW this re-ordering doesn't happen with other conversions, such as:

$ echo '{"a": 1.0, "b": 2.0}' | zq -t 'put a=a:int32,b=b:int32' -
#0:record[a:int32,b:int32]
0:[1;2;]

If the fix to the originally described issue doesn't affect the ordering, I'll open a separate issue regarding that.

philrz commented 4 years ago

Verified in zq commit cdcf37e.

The final example shown above now assigns the expected data types and no longer produces a panic.

$ echo '{"stime":"1598051907.12345678", "ltime":"1498051907.87654321", "saddr":"1.2.3.4" }' | zq -t "put stime=String.parseFloat(stime):time,ltime=String.parseFloat(ltime):time,saddr=saddr:ip" -
#0:record[ltime:time,saddr:ip,stime:time]
0:[1498051907.876543284;1.2.3.4;1598051907.123456716;]

As for my "note to self" comment, I see that this seems to be general behavior that's in no way linked to the use of put, functions, or "casting". Example:

$ echo '{"b": 1, "a": 2}' | zq -t -
#0:record[a:float64,b:float64]
0:[2;1;]

I've floated this topic separately with the development team, but I suspect this is known behavior: in a pure JSON world, field order is not guaranteed, so it seems fair game for zq to reorder fields on input before committing them to an internal ZNG representation (where order is guaranteed). If a user does want to control the ZNG field order of NDJSON data read via zq, one way I know they can guarantee it today is to use a configuration like the one described in the Zeek JSON Import article.
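
As a rough illustration of why such reordering is unsurprising, here's a small Go sketch that is unrelated to zq's actual NDJSON reader: unmarshaling a JSON object into a map discards the original key order, so any reader that needs a deterministic record type has to impose an order of its own, for example sorted field names (which would match the "a" before "b" output above).

package main

import (
    "encoding/json"
    "fmt"
    "sort"
)

func main() {
    // Unmarshaling into a map loses the original key order.
    var obj map[string]float64
    if err := json.Unmarshal([]byte(`{"b": 1, "a": 2}`), &obj); err != nil {
        panic(err)
    }

    // To produce a deterministic record type, impose an order,
    // e.g. sorted field names.
    keys := make([]string, 0, len(obj))
    for k := range obj {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    for _, k := range keys {
        fmt.Printf("%s=%v\n", k, obj[k]) // prints a=2, then b=1
    }
}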

Thanks @mccanne!