brimdata / super

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License

Use data file names to hold span information #1262

Closed alfred-landrum closed 4 years ago

alfred-landrum commented 4 years ago

Per #1183, encode a data file's time span in its name, and use that information when selecting files to read for a query. As part of this work, we should also stop storing span/index information in the zar.json metadata file.
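To illustrate the idea, here is a rough sketch of how a reader could pick files for a query from their names alone. It is not the zar implementation. The name layout `ts-<id>-<count>-<ts_a>-<ts_b>.zng` is an assumption inferred from the listing later in this issue, as is the reading of the last two fields as nanosecond epoch timestamps; `Span`, `parse_span`, `overlaps`, and `select_files` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# ASSUMPTION: layout inferred from the listing in this issue, not from zar's source:
#   ts-<id>-<count>-<ts_a>-<ts_b>.zng
# <ts_a>/<ts_b> are treated as nanosecond epoch timestamps in either order.
NAME_RE = re.compile(r"^ts-([0-9A-Za-z]+)-(\d+)-(\d+)-(\d+)\.zng$")


class Span(NamedTuple):
    start: int  # inclusive, nanoseconds since epoch
    end: int    # inclusive, nanoseconds since epoch


def parse_span(name: str) -> Optional[Span]:
    """Return the time span encoded in a file name, or None if it has none."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    a, b = int(m.group(3)), int(m.group(4))
    # Normalize so start <= end regardless of the order used in the name.
    return Span(min(a, b), max(a, b))


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive spans share at least one instant."""
    return a.start <= b.end and b.start <= a.end


def select_files(names: Iterable[str], query: Span) -> List[str]:
    """Keep only the files whose encoded span overlaps the query span."""
    selected = []
    for name in names:
        span = parse_span(name)
        if span is not None and overlaps(span, query):
            selected.append(name)
    return selected
```

With this approach, a query only needs a directory listing to decide which files to open, so no separate span metadata has to be kept in sync.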

philrz commented 4 years ago

Verified in zar commit d8f8855.

In terms of what a user would see differently, the key difference is that ZNG files created by the import process will have unique names for every import action. Using the import example from the zar README:

$ rm -rf $ZAR_ROOT ; zq zng/*.gz | zar import -s 25MB -
$ tree -s logs
logs
├── [         88]  zar.json
└── [         96]  zd
    └── [        320]  20180324
        ├── [   25002258]  d-1hyC2rr6TwJIwflUMUAt0AZnXAy.zng
        ├── [   25005739]  d-1hyC2sIVetRsvVQ66nDukNc6R6o.zng
        ├── [   25004149]  d-1hyC2tb584tszcEhLjMO2rGFdmO.zng
        ├── [    5047321]  d-1hyC2whHpj6XZ4z3jf5qPvNsLCZ.zng
        ├── [       1493]  ts-1hyC2rr6TwJIwflUMUAt0AZnXAy-443503-1521912512587283000-1521912080565725000.zng
        ├── [       1554]  ts-1hyC2sIVetRsvVQ66nDukNc6R6o-456922-1521912990158766000-1521912512587864000.zng
        ├── [       1588]  ts-1hyC2tb584tszcEhLjMO2rGFdmO-470614-1521912080565723000-1521911778227216000.zng
        └── [        386]  ts-1hyC2whHpj6XZ4z3jf5qPvNsLCZ-91039-1521911778225955000-1521911720600725000.zng

2 directories, 9 files

$ rm -rf $ZAR_ROOT ; zq zng/*.gz | zar import -s 25MB -
$ tree -s logs
logs
├── [         88]  zar.json
└── [         96]  zd
    └── [        320]  20180324
        ├── [   25005739]  d-1hyCAXRGJbEdQiLoRjn0ogb17dx.zng
        ├── [   25002258]  d-1hyCAZLsx9BnpbohERv8nTeKI1J.zng
        ├── [   25004149]  d-1hyCAc9fsvmLa1tSQawVoO8G9GP.zng
        ├── [    5047321]  d-1hyCAcaF3nzGXY9zUwlJNR7MgRf.zng
        ├── [       1554]  ts-1hyCAXRGJbEdQiLoRjn0ogb17dx-456922-1521912990158766000-1521912512587864000.zng
        ├── [       1493]  ts-1hyCAZLsx9BnpbohERv8nTeKI1J-443503-1521912512587283000-1521912080565725000.zng
        ├── [       1588]  ts-1hyCAc9fsvmLa1tSQawVoO8G9GP-470614-1521912080565723000-1521911778227216000.zng
        └── [        386]  ts-1hyCAcaF3nzGXY9zUwlJNR7MgRf-91039-1521911778225955000-1521911720600725000.zng
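The two listings above can be compared with a short sketch like the following, which masks the per-import identifier in each `ts-` file name. The regular expression assumes the identifier is the alphanumeric token right after the `ts-` or `d-` prefix; that is an inference from the listings, and `mask_id` is a hypothetical helper, not part of zar.

```python
import re

# ASSUMPTION: the token after the "ts-" or "d-" prefix is a per-import
# identifier made of ASCII letters and digits, as seen in the listings.
ID_RE = re.compile(r"^(d|ts)-[0-9A-Za-z]+")


def mask_id(name: str) -> str:
    """Replace the per-import identifier with a fixed placeholder."""
    return ID_RE.sub(r"\1-<id>", name)


# The ts- file names from the first and second import runs in this issue.
RUN1 = [
    "ts-1hyC2rr6TwJIwflUMUAt0AZnXAy-443503-1521912512587283000-1521912080565725000.zng",
    "ts-1hyC2sIVetRsvVQ66nDukNc6R6o-456922-1521912990158766000-1521912512587864000.zng",
    "ts-1hyC2tb584tszcEhLjMO2rGFdmO-470614-1521912080565723000-1521911778227216000.zng",
    "ts-1hyC2whHpj6XZ4z3jf5qPvNsLCZ-91039-1521911778225955000-1521911720600725000.zng",
]
RUN2 = [
    "ts-1hyCAXRGJbEdQiLoRjn0ogb17dx-456922-1521912990158766000-1521912512587864000.zng",
    "ts-1hyCAZLsx9BnpbohERv8nTeKI1J-443503-1521912512587283000-1521912080565725000.zng",
    "ts-1hyCAc9fsvmLa1tSQawVoO8G9GP-470614-1521912080565723000-1521911778227216000.zng",
    "ts-1hyCAcaF3nzGXY9zUwlJNR7MgRf-91039-1521911778225955000-1521911720600725000.zng",
]

# No raw file name is shared between the two runs...
names_differ = set(RUN1).isdisjoint(RUN2)
# ...but once the identifier is masked, both runs have the same set of names.
same_after_masking = {mask_id(n) for n in RUN1} == {mask_id(n) for n in RUN2}

print(names_differ, same_after_masking)
```

In other words, only the identifier portion changes from one import to the next; the remaining fields in the `ts-` names are the same in both runs.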

This effectively breaks the zar README, so for now anyone working with zar is advised to stick with the GA release tagged v0.21.0. We've got #1360 open as a reminder to circle back and fix the README.