dglazkov / polymath

MIT License

Switch format to use a sorted list of chunks #56

Open jkomoros opened 1 year ago

jkomoros commented 1 year ago

With #50 we shifted to having a dict of chunks, but also a side-car of keys by sort, which gets a bit wordy.

The original reason for Library's chunks being a dict was quick addressing of content (especially to check, during import, for chunks that already had an embedding and didn't need one recomputed). But if we switched to a new model where the canonical form for chunks was a list and every chunk carried its own id, then Library could compute a little mapping of id -> chunk live and not have it persisted/serialized.
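
Roughly what that could look like, as a minimal sketch rather than the actual polymath code. The `Library` class below is a hypothetical stand-in, and the chunk field names (`id`, `text`, `embedding`) and method names are assumptions made for illustration:

```python
import json
from functools import cached_property


class Library:
    """Hypothetical stand-in for polymath's Library, not the real class."""

    def __init__(self, chunks=None):
        # Canonical form: a list of chunk dicts, each carrying its own 'id'
        # (the field name is an assumption). List order is the sort order.
        self.chunks = list(chunks or [])

    @cached_property
    def _chunks_by_id(self):
        # Derived id -> chunk index, built on first use and never serialized.
        return {chunk["id"]: chunk for chunk in self.chunks}

    def chunk(self, chunk_id):
        # Quick addressing by id, as the dict form gave us before.
        return self._chunks_by_id.get(chunk_id)

    def add_chunk(self, chunk):
        self.chunks.append(chunk)
        # Drop the cached index so it is rebuilt on the next lookup.
        self.__dict__.pop("_chunks_by_id", None)

    def needs_embedding(self, chunk_id):
        # Import-time check: skip chunks that already have an embedding.
        existing = self.chunk(chunk_id)
        return existing is None or "embedding" not in existing

    def serialize(self):
        # Only the list is persisted; the index is recomputed on load.
        return json.dumps({"chunks": self.chunks})
```

With something like this, the serialized form has no sidecar of keys at all, since the list order carries the sort, and lookups during import stay constant-time once the index has been built.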