Bundle storage could be a lot more efficient. Blobs that haven't changed between commits are reused, so that part is fairly efficient. One could imagine some kind of line-by-line diffing like git does, but that's not even the lowest-hanging fruit: each commit generates a new uuid for every field of every row in the database. Even if the commit only changes a few rows, the entire rowset is regenerated. It turns out that for some very large bundles with a large number of commits (like the IDE), these rowset_row_field records take up more space than the contents of the bundle.
This could potentially be addressed with differential commits: commits that only generate new rowset_row and rowset_row_field records for the values the commit actually changed. That would considerably increase the complexity (and potentially the fragility) of the commit structure. We would also potentially incur a performance hit, since the real contents of a commit would have to be reconstructed recursively by traversing the commit ancestry. It's a lot less clean and a lot more complex, but it would result in a big increase in storage efficiency. Discuss.
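
To make the tradeoff concrete, here is a rough sketch of what differential commits might look like, as an in-memory Python model rather than the actual schema. Everything in it is hypothetical: the `Commit` class, the `changes` mapping, the use of `None` as a deletion marker, and the function names are all made up for illustration. The ancestry walk is written iteratively here, but it does the same work a recursive traversal would.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical model: names are illustrative, not the real bundle schema.
Fields = Dict[str, str]  # field name -> value


@dataclass
class Commit:
    parent: Optional["Commit"]
    # Only the rows this commit touched: row id -> changed fields,
    # or None to mark the row as deleted in this commit.
    changes: Dict[str, Optional[Fields]] = field(default_factory=dict)


def ancestry(commit: Optional[Commit]):
    """Yield the commit and each of its ancestors, newest first."""
    while commit is not None:
        yield commit
        commit = commit.parent


def reconstruct(commit: Commit) -> Dict[str, Fields]:
    """Rebuild the full rowset by replaying changes from the root forward."""
    rows: Dict[str, Fields] = {}
    for c in reversed(list(ancestry(commit))):
        for row_id, fields in c.changes.items():
            if fields is None:
                rows.pop(row_id, None)  # row deleted by this commit
            else:
                rows.setdefault(row_id, {}).update(fields)
    return rows


def differential_field_records(commit: Commit) -> int:
    """Field records stored if each commit only records what it changed."""
    return sum(
        len(fields)
        for c in ancestry(commit)
        for fields in c.changes.values()
        if fields is not None
    )


def full_rowset_field_records(commit: Commit) -> int:
    """Field records stored if every commit regenerates the entire rowset."""
    return sum(
        len(fields)
        for c in ancestry(commit)
        for fields in reconstruct(c).values()
    )


# Three commits: create two rows, edit one field, then delete a row and add one.
root = Commit(None, {"r1": {"name": "a", "body": "x"},
                     "r2": {"name": "b", "body": "y"}})
c2 = Commit(root, {"r1": {"body": "x2"}})
c3 = Commit(c2, {"r2": None, "r3": {"name": "c", "body": "z"}})

print(reconstruct(c3))
print(differential_field_records(c3), full_rowset_field_records(c3))
```

In this toy history the differential layout stores 7 field records where full rowsets store 12, and the gap grows with every commit that touches only a few rows. The cost shows up in `reconstruct`: reading any commit now means replaying its whole ancestry, which is the performance and complexity concern raised above. Some form of periodic full snapshot could bound that walk, but that is a further assumption, not something the current design includes.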