frankmcsherry / blog

Some notes on things I find interesting and important.

Recycler in 'Memory Management for Big Data' #40

Open ghost opened 5 years ago

ghost commented 5 years ago

At the end of your excellent blog post on 'Memory Management for Big Data' (https://github.com/frankmcsherry/blog/blob/master/posts/2017-07-27.md), you mentioned using your recycler project.

I shared your blog with some colleagues, and their feedback on this point was:

"I also followed the link to have a look at ‘recycler’. It seems like a small Rust shortcoming that it’s worth having a library whose main purpose is to provide typed memory caching. I would think (hope?) that the Rust allocator would be clever enough (in terms of releasing/acquiring memory to/from the OS) to make this a fairly limited optimisation. I notice the ‘recycler’ code is fairly old and wonder if this is still a worthwhile procedure?"

and:

"Yeah 'recycler' looks like another memory allocator ... I wonder why that would be useful."

If you have a moment to elaborate on how the recycler helps here, I would be very grateful. Thanks!

frankmcsherry commented 5 years ago

Eh, it's a bit hard to comment without knowing the folks. My initial inclination was to go vicious, but I got that under control. ;)

Generally, the more you need tools to work around things like the allocator, the better the language. You don't need to work around the allocator in a slow interpreted language, because it isn't the bottleneck. Both jemalloc and the system allocator (what Rust uses) are decent bits of engineering, and it speaks to Rust's performance that you can still get a 2x speedup by going around them. If one's program doesn't need that because it does enough other work, no worries, but lean enough computations do benefit.
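The simplest way to "go around" the allocator is to keep an allocation alive and reuse it. The minimal sketch below is not taken from the recycler crate, and its function names are hypothetical. It compares a loop that allocates a fresh `Vec` on every iteration with one that clears and refills a single buffer. Both compute the same result. The second only asks the allocator for memory when it needs more capacity than it has already seen.

```rust
// Hypothetical illustration, not the recycler crate: reuse one buffer
// so the allocator is only involved when capacity must grow.

/// Sums the squares of 0..n for each n in `sizes`, allocating a fresh Vec each time.
fn with_fresh_allocations(sizes: &[usize]) -> u64 {
    let mut total = 0;
    for &n in sizes {
        // A new allocation (and a matching free) on every iteration.
        let buf: Vec<u64> = (0..n as u64).map(|x| x * x).collect();
        total += buf.iter().sum::<u64>();
    }
    total
}

/// Same computation, but one buffer is cleared and refilled.
/// `clear()` drops the contents but keeps the capacity, so once the
/// largest size has been seen, the loop stops allocating.
fn with_reused_buffer(sizes: &[usize]) -> (u64, usize) {
    let mut buf: Vec<u64> = Vec::new();
    let mut total = 0;
    for &n in sizes {
        buf.clear();
        buf.extend((0..n as u64).map(|x| x * x));
        total += buf.iter().sum::<u64>();
    }
    (total, buf.capacity())
}

fn main() {
    let sizes = [1000, 10, 500, 1000];
    let fresh = with_fresh_allocations(&sizes);
    let (reused, cap) = with_reused_buffer(&sizes);
    assert_eq!(fresh, reused);
    // The retained capacity covers the largest request.
    assert!(cap >= 1000);
    println!("total = {}, retained capacity = {}", reused, cap);
}
```

Whether this yields anything like the 2x mentioned above depends on the workload and the allocator. The sketch only shows where allocator calls are avoided.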

I'm not sure what the age of the code has to do with anything. I don't think it slows down each year or anything like that. It could probably be spruced up with some more types, but .. it still does today what it did years back.

Not sure if this clears anything up, or just causes trouble by sounding sassy. :)

ghost commented 5 years ago

Ha. Sorry, I probably shouldn't cut and paste people's comments without any context. This was in the context of me saying (without having looked) that I thought the recycler was 'a bit like a cursor'.

Is it right that the performance improvement with recycler comes down to the reuse of allocated memory? Is there some subtlety around the allocated memory being owned, or is it just a case of pre-allocated memory being faster than allocating fresh? (Sorry if that's a silly question; I am new to Rust.)

[edit] It is a stupid question, isn't it? It can be reused because it's owned, right? I'll get my coat.

frankmcsherry commented 5 years ago

It is just about the re-use of memory, yes. And Rust is picky about the memory being owned, and being typed. In this case it was interesting to me that you can do this sort of memory pooling based on the structure of the type (pooling allocations deeper in the type signature, rather than just String or Vec<usize>).
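The idea of pooling "deeper in the type signature" can be illustrated with a hypothetical sketch. The `Recycler` trait, `StringRecycler`, and `VecRecycler` below are invented names for illustration and are not the recycler crate's actual API. Recycling a `Vec<String>` stashes the outer vector and also hands each inner `String` to a nested pool, so both levels of allocation can be reused. Ownership is what makes this work: `recycle` takes the value by value, so nothing else can still be using its memory.

```rust
// Hypothetical sketch of type-directed recycling. These names are
// illustrative and are NOT the API of the `recycler` crate.

/// Hands out empty values of type `Item` and takes them back.
trait Recycler {
    type Item;
    /// Get an empty value, reusing a stashed allocation if one exists.
    fn take(&mut self) -> Self::Item;
    /// Give an owned value back so its allocations can be reused.
    fn recycle(&mut self, item: Self::Item);
}

/// Pools `String` allocations.
#[derive(Default)]
struct StringRecycler {
    stash: Vec<String>,
}

impl Recycler for StringRecycler {
    type Item = String;
    fn take(&mut self) -> String {
        self.stash.pop().unwrap_or_default()
    }
    fn recycle(&mut self, mut item: String) {
        item.clear(); // keep the capacity, drop the contents
        self.stash.push(item);
    }
}

/// Pools `Vec<R::Item>` allocations and passes their elements to an
/// inner recycler `R`, so allocations deeper in the type are reused too.
struct VecRecycler<R: Recycler> {
    stash: Vec<Vec<R::Item>>,
    inner: R,
}

impl<R: Recycler + Default> Default for VecRecycler<R> {
    fn default() -> Self {
        VecRecycler { stash: Vec::new(), inner: R::default() }
    }
}

impl<R: Recycler> Recycler for VecRecycler<R> {
    type Item = Vec<R::Item>;
    fn take(&mut self) -> Vec<R::Item> {
        self.stash.pop().unwrap_or_default()
    }
    fn recycle(&mut self, mut item: Vec<R::Item>) {
        // Each element goes to the inner pool; the emptied Vec keeps its capacity.
        for element in item.drain(..) {
            self.inner.recycle(element);
        }
        self.stash.push(item);
    }
}

fn main() {
    let mut pool: VecRecycler<StringRecycler> = VecRecycler::default();

    // Build a Vec<String> from pooled allocations.
    let mut words = pool.take();
    for w in ["alpha", "beta", "gamma"] {
        let mut s = pool.inner.take();
        s.push_str(w);
        words.push(s);
    }

    // Recycling the outer Vec also stashes its three Strings.
    pool.recycle(words);
    assert_eq!(pool.stash.len(), 1);
    assert_eq!(pool.inner.stash.len(), 3);

    // The next Vec comes back empty but with its old capacity.
    let reused = pool.take();
    assert!(reused.is_empty() && reused.capacity() >= 3);
}
```

Because `VecRecycler` is generic over its inner recycler, the same pattern nests: a `VecRecycler<VecRecycler<StringRecycler>>` would pool a `Vec<Vec<String>>` at all three levels.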

It is also the case that deallocating by stashing is faster than handing memory back to the allocator. You'll often find that allocating is relatively cheap (most operating systems do "demand paging", in which they only do work as you start to use the memory), but when you hand them back lots of memory they seem to want to freak out and do a bunch of work (jemalloc, at least).
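"Deallocation by stashing" can be made automatic with `Drop`. In this hypothetical sketch, `BufferPool` and `PooledBuf` are illustrative names and not from any crate. When a `PooledBuf` goes out of scope, its buffer is cleared and pushed onto a shared stash instead of being freed, so the allocator never sees that memory returned. The sketch does not measure anything. How much freeing actually costs depends on the allocator and the OS.

```rust
// Hypothetical sketch: "deallocation by stashing" via Drop.
// `BufferPool` and `PooledBuf` are illustrative names, not from any crate.
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};
use std::rc::Rc;

/// A shared stash of cleared buffers.
#[derive(Clone, Default)]
struct BufferPool {
    stash: Rc<RefCell<Vec<Vec<u8>>>>,
}

impl BufferPool {
    /// Hand out a buffer, reusing a stashed one if available.
    fn get(&self) -> PooledBuf {
        let buf = self.stash.borrow_mut().pop().unwrap_or_default();
        PooledBuf { buf, pool: self.clone() }
    }

    /// Number of buffers currently waiting in the stash.
    fn stashed(&self) -> usize {
        self.stash.borrow().len()
    }
}

/// A buffer that returns itself to the pool when dropped,
/// instead of handing its memory back to the allocator.
struct PooledBuf {
    buf: Vec<u8>,
    pool: BufferPool,
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        let mut buf = std::mem::take(&mut self.buf);
        buf.clear(); // keep capacity, discard contents
        self.pool.stash.borrow_mut().push(buf);
    }
}

impl Deref for PooledBuf {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.buf
    }
}

impl DerefMut for PooledBuf {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        &mut self.buf
    }
}

fn main() {
    let pool = BufferPool::default();
    {
        let mut a = pool.get();
        a.extend_from_slice(&[0u8; 4096]);
    } // `a` is dropped here; its allocation goes to the stash, not back to the allocator
    assert_eq!(pool.stashed(), 1);

    let b = pool.get();
    assert!(b.is_empty() && b.capacity() >= 4096);
    assert_eq!(pool.stashed(), 0);
}
```

A design note: the stash is never trimmed here, so memory held by the pool is only released when the pool itself is dropped. That trade-off is the point: the program keeps memory it expects to need again rather than returning it.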

ghost commented 5 years ago

Ok that makes lots of sense. Thank you!