brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

zar import target size ignores compression #1062

Closed: alfred-landrum closed this issue 4 years ago

alfred-landrum commented 4 years ago

The -s target size option of zar import is measured against the uncompressed size of the imported data instead of the compressed size, so the resulting chunks come out at only about half the target size.

philrz commented 4 years ago

Verified in zq commit 66ddf34.

To revisit the symptom, here is the operation shown in the zar README, run at zq commit 137901c (just before this fix):

$ echo $ZAR_ROOT
/Users/phil/logs
$ zq zng/*.gz | zar import -s 25MB -
$ tree -s logs
logs
├── [        288]  20180324
│   ├── [   10598504]  1521911841.543641.zng
│   ├── [   11261490]  1521911975.777469.zng
│   ├── [   11368389]  1521912152.518493.zng
│   ├── [   11401209]  1521912335.72784.zng
│   ├── [   11791579]  1521912549.366398.zng
│   ├── [   11442168]  1521912792.328806.zng
│   └── [   11558761]  1521912990.158766.zng
└── [       1007]  zar.json

1 directory, 8 files

Now, at zq commit 66ddf34, the chunks come out at the requested size:

$ zq zng/*.gz | zar import -s 25MB -
$ tree -s logs
logs
├── [        192]  20180324
│   ├── [    4425007]  1521911772.980384.zng
│   ├── [   25001925]  1521912075.114273.zng
│   ├── [   25007413]  1521912507.399929.zng
│   └── [   25005195]  1521912990.158766.zng
└── [        632]  zar.json

1 directory, 5 files
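To check chunk sizes against the target without reading tree output by eye, a small POSIX shell helper like the one below could be used. This is a hypothetical sketch, not part of zar: the function name chunk_report is made up, it assumes the date-subdirectory layout shown above ($ZAR_ROOT/20180324/*.zng), and it assumes -s 25MB means 25,000,000 bytes. The after-fix sizes landing just above 25,000,000 bytes suggest that reading, but it is an inference.

```shell
#!/bin/sh
# Hypothetical helper (not a zar command): print each chunk file's size
# and that size as an integer percentage of the import target.
# Assumption: chunks live one directory level below the root, as in the
# tree output above, and the target is given in bytes (default 25,000,000).
chunk_report() {
  root=$1
  target=${2:-25000000}
  for f in "$root"/*/*.zng; do
    # Skip the literal glob pattern when no chunk files match.
    [ -e "$f" ] || continue
    # wc -c pads its output with spaces on some systems; strip them.
    size=$(wc -c < "$f" | tr -d ' ')
    echo "$(basename "$f") $size $((size * 100 / target))%"
  done
}
```

Usage: chunk_report "$ZAR_ROOT" 25000000. Applied to the two listings above, the pre-fix chunks would report 42% to 47% of the target. After the fix, three chunks report 100% and one smaller chunk reports 17%.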

Thanks @alfred-landrum!