dglazkov / polymath

MIT License

Consider removing the pkl format option #24

Closed: dglazkov closed this issue 1 year ago

dglazkov commented 1 year ago
jkomoros commented 1 year ago

For my export of ~6k cards of content, the pkl is 107MB and the json is 304MB.

That's a significant enough difference that maybe it is worth it to keep it around, especially since it's not too much overhead to maintain.

Perhaps we just make json the default format (since it's easier to tweak and inspect) and document when you might use .pkl?
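For anyone who wants to reproduce this kind of comparison on their own data, here is a rough sketch. It does not use polymath's actual export code: the `serialized_sizes` helper, the item names, and the 1536-dimension random vectors are all assumptions made for illustration. It only shows why a JSON dump of float lists tends to be larger than a pickle of the same data, since each float is written as decimal text instead of 8 raw bytes.

```python
import json
import pickle
import random


def serialized_sizes(num_items=100, dims=1536, seed=0):
    """Return (json_bytes, pkl_bytes) for a hypothetical library of embeddings."""
    rng = random.Random(seed)
    # Hypothetical stand-in for an export: item id -> embedding as a list of floats.
    library = {
        f"item-{i}": [rng.uniform(-1.0, 1.0) for _ in range(dims)]
        for i in range(num_items)
    }
    # JSON writes every float as decimal text plus separators.
    json_bytes = len(json.dumps(library).encode("utf-8"))
    # Pickle writes every Python float as an opcode plus 8 raw bytes.
    pkl_bytes = len(pickle.dumps(library, protocol=pickle.HIGHEST_PROTOCOL))
    return json_bytes, pkl_bytes


if __name__ == "__main__":
    json_bytes, pkl_bytes = serialized_sizes()
    print(f"json: {json_bytes} bytes, pkl: {pkl_bytes} bytes")
```

The exact ratio depends on the data and on how the embeddings are stored in memory, so treat the numbers from this sketch as indicative only.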

dglazkov commented 1 year ago

Crazy idea: encode embeddings the way they are encoded in a query: base64 float array. I wonder how much savings that would offer?
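A minimal sketch of what that could look like, assuming the embedding is packed as little-endian float32 before being base64-encoded. The precision, the byte order, and the helper names `encode_embedding` and `decode_embedding` are assumptions, not necessarily what the query format uses.

```python
import base64
import struct


def encode_embedding(values):
    """Pack floats as little-endian float32 and return a base64 ASCII string."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(text):
    """Reverse encode_embedding: base64 string -> list of floats."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    embedding = [0.5, -1.25, 3.0]
    encoded = encode_embedding(embedding)
    print(encoded)
    print(decode_embedding(encoded))
```

Each float costs 4 bytes before encoding and roughly 5.3 characters after, versus around 20 characters as decimal text in JSON. The trade-off is that values are rounded to float32 and the field is no longer readable by eye.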

jkomoros commented 1 year ago

9b4f0543207c4e65eb77f9751efbe285080a5409 and c8002308e8875934eee6c58a08d2cbb66d325171 were part of the move to base64, which saved a lot of space.
