brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License
1.39k stars 64 forks source link

Significant performance drop when reading alpha ZNG with newer zq #1414

Closed philrz closed 4 years ago

philrz commented 4 years ago

I noticed that with the merge of #1375, for the combination of "newest zq reading older alpha ZNG format", the performance is quite poor poor. Since at this point I know the support for the alpha ZNG format is probably only important for as long as it takes for users to migrate their data, it's probably not worth spending a lot of time optimizing performance. However, the drop is significant enough that it has the potential to make migrations take a very long time, so I'm flagging it should we want to check if we made an easy mistake.

For the permutations below, I'm using the binary ZNG test data from the zq-sample-data before and after it was updated to the new ZNG format via https://github.com/brimsec/zq-sample-data/pull/16 (which was merged right as #1375 was merged on the zq side).

Here's things performing well with "alpha ZNG & old zq":

~/work/zq-sample-data/zng$ git branch
* (HEAD detached at d156014)
  master

~/work/zq-sample-data/zng$ zq -version
Version: v0.22.0-18-ga80c575

~/work/zq-sample-data/zng$ time (zq -t 'count()' *)
#0:record[count:uint64]
0:[1462078;]

real    0m2.623s
user    0m2.715s
sys 0m0.072s

Here's things performing poorly with "alpha ZNG & new zq":

~/work/zq-sample-data/zng git branch
* (HEAD detached at d156014)
  master

~/work/zq-sample-data/zng$ zq -version
Version: v0.22.0-19-g56ed80d

~/work/zq-sample-data/zng$ time (zq -t 'count()' *)
#0:record[count:uint64]
0:[1462078;]

real    1m50.024s
user    2m11.730s
sys 0m42.685s

Here's things performing well again with "new ZNG & new zq"):

~/work/zq-sample-data/zng$ git branch
* (HEAD detached at cc5cc64)
  master

~/work/zq-sample-data/zng$ zq -version
Version: v0.22.0-19-g56ed80d

~/work/zq-sample-data/zng$ time (zq -t 'count()' *)
#0:record[count:uint64]
0:[1462078;]

real    0m2.496s
user    0m2.600s
sys 0m0.064s
mccanne commented 4 years ago

Slowness is a feature not a bug. It will motivate people to upgrade their zng files from alpha to beta format. Haha.

mccanne commented 4 years ago

Seriously, that's pretty slow. Maybe there is a buffer that needs to be inserted. I will take a look.

mccanne commented 4 years ago

Ok, I tried some buffering tricks that didn't make any difference and did this measurement:

% time zq -f ndjson new.zng | zq -i ndjson - > /dev/null
real    0m49.561s
...
% time zq -i azng -o /dev/null old.zng
warning: old.zng: converting from alpha zng to beta zng (slow)
warning: update zng by running 'zq -o new.zng old.zng'
real    0m51.164s
...

This says azng is taking about the same time as ndjson to take a trip to string-land and back.

philrz commented 4 years ago

Thanks for confirming @mccanne. I'll close this one and add a comment to #1376 regarding the expected performance delta.