How about having a configurable overhead? For example, when you care about speed more than memory consumption, you can preallocate a bit more from the beginning. The opposite is also true: you may need to shrink the rope in place after every single mutation to stay within strict RAM limits, especially when mutations are rare.
So `Rope` could have methods for this. This may be far more sophisticated than you would like to have in Ropey, given the complexity of this proposal, but I'd be glad if you'd consider finer-grained ways to control memory consumption.
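Purely as an illustration, a finer-grained memory-control API along these lines might look something like the sketch below. None of these method names are Ropey's actual API; they're hypothetical stand-ins for the ideas described above.

```rust
// Hypothetical sketch only -- none of these methods are Ropey's actual API.
// A stand-in type to show what a finer-grained memory-control API
// along the lines proposed above might look like.
pub struct Rope {
    // ...internal tree, elided...
}

impl Rope {
    /// Build a rope that preallocates extra space up front, trading
    /// memory for fewer node splits/reallocations during later edits.
    pub fn with_overhead_factor(_text: &str, _factor: f32) -> Self {
        unimplemented!("hypothetical")
    }

    /// Report how many bytes are currently allocated beyond the text itself.
    pub fn overhead(&self) -> usize {
        unimplemented!("hypothetical")
    }

    /// Repack nodes in place to meet a strict RAM budget, at the cost
    /// of extra work on subsequent edits.
    pub fn shrink_to_fit(&mut self) {
        unimplemented!("hypothetical")
    }
}
```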
I think this is definitely a good idea, though I'll simplify it to just an `overhead()` method that returns the overhead in bytes and a `shrink_to_fit()` method. However, that's also orthogonal to what I'm trying to address with this issue, which is to just generally make it more memory efficient by default.
I'll open another issue for your idea. Thanks!
@hcpl Created issue #6 for `capacity()` and `shrink_to_fit()`.
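Assuming those methods land under the names above (`capacity()` reporting total allocated buffer space in bytes, and `shrink_to_fit()` repacking the rope in place), usage could look roughly like this sketch; the insertion loop and the printed numbers are purely illustrative.

```rust
use ropey::Rope;

fn main() {
    let mut rope = Rope::from_str("Hello, world!\n");

    // A burst of small, scattered insertions tends to leave leaf nodes
    // only partially filled, so allocated space outgrows the text itself.
    for i in 0..10_000 {
        let idx = i % rope.len_chars();
        rope.insert(idx, "x");
    }

    // Overhead = allocated buffer space minus the bytes of actual text.
    println!("overhead: {} bytes", rope.capacity() - rope.len_bytes());

    // When memory matters more than edit speed, repack in place.
    rope.shrink_to_fit();
    println!(
        "overhead after shrink: {} bytes",
        rope.capacity() - rope.len_bytes()
    );
}
```

Calling `shrink_to_fit()` only after a batch of edits, rather than after every mutation, keeps the repacking cost amortized.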
Yeah, what I proposed doesn't directly solve the problem outlined in this issue, but there's a chance that adding a sufficiently fine-grained memory management API might spark some helpful ideas that make solving this issue trivial.
Thanks for taking my ideas into account! And having a simple API is a good idea, at least until someone requests more.
I made some pretty decent progress on this, in the form of across-the-board memory overhead reduction. The range is now between 17% (for freshly loaded files) and 60% (for texts built from small random insertions).
The 17% number is really nice, and I have no problems with that. But while 60% is certainly better than before, I would still like to try to reduce it further if possible. I'll update the issue with the new numbers, and hopefully even more progress can be made.
It's cool to know about the recent progress you've made! I guess the 60% overhead can be decreased through some more overhead that is within the 60% bounds? Though I don't know the design of this library thoroughly, so I might be wrong. Good luck!
And Happy New Year :christmas_tree:
Thanks! Happy New Year to you, too! :-)
> I guess the 60% overhead can be decreased through some more overhead that is within the 60% bounds?
Hmm, I'm not sure I understand what you mean. Can you go into a bit more detail?
> Though I don't know the design of this library thoroughly, so I might be wrong.
I just started documenting Ropey's design in `design.md` in the root of this repo, so that might help a bit. There's still a fair amount missing from it, including, I think, some things relevant to this issue, but maybe it will help.
@hcpl The parts of the design document relevant to this issue are now done (modulo editing for clarity, etc.): https://github.com/cessen/ropey/blob/master/design/design.md
Some more progress:
Thanks to improvements in scanning performance (i.e. for character count and line break count), I was able to bump the max leaf size to 1024 bytes. This reduces the number of internal nodes of the b-tree, and likely helped a bit with fragmentation as well. For ropes freshly created from existing text, this reduced the memory overhead from ~17% to ~10%.
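For a rough sense of why this helps, here's a back-of-the-envelope node-count estimate. The branching factor and the assumption that leaves are packed nearly full are illustrative, not Ropey's actual internals.

```rust
// Rough estimate of node counts for a freshly loaded text, assuming
// nearly full leaves. `max_children` is an assumed branching factor,
// not necessarily Ropey's real value.
fn estimate_nodes(
    text_bytes: usize,
    max_leaf_bytes: usize,
    max_children: usize,
) -> (usize, usize) {
    let leaves = (text_bytes + max_leaf_bytes - 1) / max_leaf_bytes;
    let mut internal = 0;
    let mut level = leaves;
    while level > 1 {
        level = (level + max_children - 1) / max_children;
        internal += level;
    }
    (leaves, internal)
}

fn main() {
    let text_bytes = 10 * 1024 * 1024; // e.g. a 10 MB file
    for &leaf in &[512usize, 1024] {
        let (leaves, internal) = estimate_nodes(text_bytes, leaf, 16);
        println!(
            "max leaf {:4} B: ~{} leaves, ~{} internal nodes",
            leaf, leaves, internal
        );
    }
}
```

With these assumed numbers, doubling the leaf size from 512 to 1024 bytes roughly halves both the leaf count and the internal node count.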
I'm pretty happy with where Ropey's memory overhead is at this point, so I'm closing. But future improvements are still, of course, welcome and desired!
(This is a tracking issue for reducing the memory overhead of Ropey. The stats listed below are updated as improvements are made.)
The memory overhead for creating new ropes from existing text is pretty good (~10%). However, the memory overhead of building up a rope from many small insertions is worse (up to ~55% depending on insertion placement).
Improving this would be great!
The main culprit here is that since leaf nodes generally have a fixed-size text buffer, leaf nodes that aren't maximally filled contain wasted space. Keeping leaf nodes close to maximally filled should bring the memory overhead back down to close to 10%. But how to do that without negatively impacting performance too much?
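For a rough sense of the numbers above: with fixed-size leaf buffers, the per-leaf overhead is roughly the buffer size divided by the average fill, minus one (ignoring internal nodes). The buffer size and fill levels in this sketch are illustrative assumptions, not measurements of Ropey.

```rust
// Illustrative only: relates average leaf fill to memory overhead when
// every leaf allocates a fixed-size buffer. Internal-node overhead is
// ignored, and the 1024-byte buffer size is an assumption.
fn overhead_percent(leaf_buffer_bytes: f64, avg_filled_bytes: f64) -> f64 {
    (leaf_buffer_bytes / avg_filled_bytes - 1.0) * 100.0
}

fn main() {
    let buf = 1024.0;
    // Nearly full leaves (a freshly loaded file) waste little space...
    println!("~91% full -> ~{:.0}% overhead", overhead_percent(buf, 0.91 * buf));
    // ...while leaves left around two-thirds full by scattered insertions
    // waste much more.
    println!("~65% full -> ~{:.0}% overhead", overhead_percent(buf, 0.65 * buf));
}
```

Under these assumptions the two cases land near the ~10% and ~55% figures quoted above.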