blevesearch / zapx

Zap file format compatible with a future version of Bleve
Apache License 2.0

zapx file format

The zapx module is a fork of the zap module that maintains file format compatibility but removes the dependency on bleve, depending instead only on the independent interface modules:

Advanced ZAP File Format Documentation is here.

The file is written in the reverse of the order in which we typically access data. This lets us write it in one pass, because later sections of the file need the file offsets of things we've already written.

Current usage:

stored fields section

stored fields idx

With this index and a known document number, we have direct access to all the stored field data.

posting details (freq/norm) section

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.
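The jump-then-scan lookup can be sketched as follows. To keep it short, the chunks are modeled as in-memory slices of decoded entries rather than encoded bytes; the `posting` type, its fields, and `findPosting` are hypothetical. The sketch assumes chunk `i` holds only documents in `[i*chunkFactor, (i+1)*chunkFactor)`, sorted by doc number.

```go
package main

// posting is a hypothetical decoded freq/norm entry.
type posting struct {
	DocNum uint64
	Freq   uint64
	Norm   float32
}

// findPosting jumps directly to chunk docNum/chunkFactor, then scans it.
func findPosting(chunks [][]posting, chunkFactor, docNum uint64) (posting, bool) {
	c := docNum / chunkFactor
	if c >= uint64(len(chunks)) {
		return posting{}, false // beyond the last chunk
	}
	for _, p := range chunks[c] {
		if p.DocNum == docNum {
			return p, true
		}
		if p.DocNum > docNum {
			break // entries are sorted, so docNum is not present
		}
	}
	return posting{}, false
}
```

Only one chunk of at most `chunkFactor` documents is ever scanned, so the cost of a lookup is bounded by the chunk size rather than by the length of the postings list.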

posting details (location) section

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.
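Jumping to a chunk inside encoded bytes requires knowing where each chunk begins and ends. One common way to do this, assumed here for illustration and not confirmed by this document, is to keep the cumulative end offset of every chunk; chunk `i` then spans from the previous chunk's end to its own. `chunkBounds` and `locChunkFor` are hypothetical helpers.

```go
package main

// chunkBounds returns the [start, end) range of chunk i, given the cumulative
// end offset of each chunk relative to the start of the chunk data.
func chunkBounds(chunkEnds []uint64, i int) (start, end uint64) {
	if i > 0 {
		start = chunkEnds[i-1] // a chunk starts where the previous one ends
	}
	return start, chunkEnds[i]
}

// locChunkFor combines the docNum/chunkFactor jump with chunkBounds to return
// the raw bytes of the chunk that would contain docNum's locations.
func locChunkFor(data []byte, chunkEnds []uint64, chunkFactor, docNum uint64) []byte {
	start, end := chunkBounds(chunkEnds, int(docNum/chunkFactor))
	return data[start:end]
}
```

With this representation an empty chunk simply has the same start and end, so no special marker is needed.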

postings list section

dictionary

fields section

fields idx

NOTE: we currently don't know or record the length of this fields index. Instead, we rely on the fact that it immediately precedes a footer of known size.

fields DocValue

NOTE: currently, the meta header inside each chunk records the location offsets and sizes of the data belonging to each docID, and every read operation uses that meta information to extract a document's data from the file.
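One way to picture this, under assumptions not confirmed by this document: the meta header is a list of (docID, end offset) pairs sorted by docID, and a document's data begins where the previous entry's data ends. The sketch below binary-searches that header with the standard library's `sort.Search`; the `metaEntry` type and `docValueBytes` are hypothetical.

```go
package main

import "sort"

// metaEntry is a hypothetical meta header entry: a docID and the end offset
// of that document's data within the chunk's data area.
type metaEntry struct {
	DocNum uint64
	End    uint64
}

// docValueBytes finds docNum in the sorted meta header and slices out its
// data, which starts at the previous entry's end offset (or 0 for the first).
func docValueBytes(meta []metaEntry, data []byte, docNum uint64) ([]byte, bool) {
	i := sort.Search(len(meta), func(i int) bool { return meta[i].DocNum >= docNum })
	if i == len(meta) || meta[i].DocNum != docNum {
		return nil, false // docNum has no value in this chunk
	}
	var start uint64
	if i > 0 {
		start = meta[i-1].End
	}
	return data[start:meta[i].End], true
}
```

Storing only end offsets keeps the header compact, because each entry's size is implied by its neighbor.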

footer