Document format on storage [document.db]

The format in which the original document is stored is a major performance issue. At the moment we don't optimize anything: the document value is stored as a []byte JSON string. As a result, we need to call json.Unmarshal every time we need the document content, and this operation can be very expensive.

https://github.com/NeowayLabs/neosearch/blob/master/index/index.go#L168
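
To make the cost concrete, here is a minimal sketch of what reading a single field looks like when the whole document is stored as one JSON value. The getField helper and its signature are hypothetical, written only for illustration; they are not taken from the NeoSearch code linked above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getField is a hypothetical helper: it has to decode the entire stored
// JSON value even though the caller only wants one field.
func getField(stored []byte, field string) (interface{}, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(stored, &doc); err != nil {
		return nil, err
	}
	value, ok := doc[field]
	if !ok {
		return nil, fmt.Errorf("field %q not found", field)
	}
	return value, nil
}

func main() {
	// stored plays the role of the []byte value kept in document.db.
	stored := []byte(`{"id": 1, "name": "Plan9 Operating System", "authors": ["Ken Thompson", "Dennis Ritchie"]}`)

	name, err := getField(stored, "name")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

Every call pays for parsing all fields, so the cost grows with document size rather than with the number of fields requested.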

Some ideas that we can try:

1. Use the gob package to store the document in Go's native binary encoding. Much like write(myStruct, sizeof(myStruct)) in C.
   - But... if the document is a map[string]interface{}, we could run into more performance problems:
   - https://groups.google.com/d/msg/golang-nuts/12qhqiG1J70/BI3EP_HitM0J
   - Serializer benchmarks: https://github.com/alecthomas/go_serialization_benchmarks
2. Store the fields separately.
   - key=1.id, value=1
   - key=1.name, value=Plan9 Operating System
   - key=1.authors.0, value=Ken Thompson
   - key=1.authors.1, value=Dennis Ritchie
   - and so on for 2.id, 2.name, etc...
3. Others?
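
The "store the fields separately" idea can be sketched as a small recursive function that turns a decoded document into one key/value pair per leaf. This is only an illustration under assumptions: the flatten name, the use of a plain map as output, and storing every value as a string are all hypothetical choices, not an existing NeoSearch API.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// flatten walks a decoded document and writes one entry per leaf value,
// using dotted keys prefixed by the document ID (e.g. "1.authors.0").
// Caveat: field names that themselves contain "." would make keys ambiguous.
func flatten(prefix string, value interface{}, out map[string]string) {
	switch v := value.(type) {
	case map[string]interface{}:
		for name, child := range v {
			flatten(prefix+"."+name, child, out)
		}
	case []interface{}:
		for i, child := range v {
			flatten(prefix+"."+strconv.Itoa(i), child, out)
		}
	default:
		// Leaves are stored as strings here purely to keep the sketch short.
		out[prefix] = fmt.Sprint(v)
	}
}

func main() {
	doc := map[string]interface{}{
		"id":      1,
		"name":    "Plan9 Operating System",
		"authors": []interface{}{"Ken Thompson", "Dennis Ritchie"},
	}

	pairs := map[string]string{}
	flatten("1", doc, pairs)

	// Sort the keys so the listing is stable across runs.
	keys := make([]string, 0, len(pairs))
	for k := range pairs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("key=%s, value=%s\n", k, pairs[k])
	}
}
```

Because all keys of one document share the same ID prefix, a sorted key-value store would keep them adjacent, which should make reading the whole document a prefix scan.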

I really like the second option because only in rare cases will the user ask for all of the document's fields. If we add the requirement that the user must request only the fields they want in the API, then (maybe) we can gain a lot of performance. For a big document, one seek on disk that reads the whole value can be much slower than N seeks for specific fields. Conversely, for small documents, we could lose some performance too...
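
As a rough sketch of that trade-off, the lookup below does one read per requested field instead of decoding the whole document. The store type is a stand-in (an in-memory map) for whatever key-value engine backs document.db, and getFields is a hypothetical name; both are assumptions made for illustration.

```go
package main

import "fmt"

// store stands in for the key-value storage engine; a map keeps the
// sketch self-contained and is not how document.db is implemented.
type store map[string]string

// getFields performs one lookup per requested field, so the cost depends
// on how many fields are asked for, not on the size of the document.
func getFields(s store, docID string, fields []string) map[string]string {
	result := make(map[string]string, len(fields))
	for _, field := range fields {
		if value, ok := s[docID+"."+field]; ok {
			result[field] = value
		}
	}
	return result
}

func main() {
	s := store{
		"1.id":        "1",
		"1.name":      "Plan9 Operating System",
		"1.authors.0": "Ken Thompson",
		"1.authors.1": "Dennis Ritchie",
	}

	// The caller names the fields it wants, as proposed for the API.
	fmt.Println(getFields(s, "1", []string{"id", "name"}))
}
```

With N requested fields this costs N lookups, which is the case where small documents might end up slower than a single read of the full value.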
