dtolnay / request-for-implementation

Crates that don't exist, but should

Custom derive for deep_size_of #22

Closed matklad closed 5 years ago

matklad commented 5 years ago

It might be useful to understand how many bytes of memory a particular data structure occupies. If it doesn't have any heap-allocated storage, figuring out the answer is easy: it's just std::mem::size_of.

However, for data structures which include vectors, maps, boxes, etc., size_of accounts only for the (comparatively insignificant) stack part of the storage.

One approach to the problem is servo/heapsize, which asks the allocator for the real size of each allocated block. This is a relatively precise approach, because it accounts for the allocator's own overhead, but it requires the use of jemalloc.
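
For illustration, using heapsize looks roughly like this (from memory of the heapsize 0.4 API, so treat the details as an assumption):

```rust
use heapsize::HeapSizeOf; // requires jemalloc as the global allocator

fn main() {
    let v = vec![1u32; 1000];
    // heapsize asks the allocator how large each block actually is,
    // so the result includes the allocator's rounding, not just
    // capacity * size_of::<u32>().
    println!("heap bytes: {}", v.heap_size_of_children());
}
```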

A potentially more robust but less precise approach is to walk the data structure recursively (that is, to iterate over all the vectors, hash maps, etc.) and sum up size_of. It looks like this could be done with a proc macro that custom-derives the following trait:

```rust
trait DeepSizeOf {
    /// Returns an estimate of the total size of memory owned by the object,
    /// including heap-allocated storage.
    /// This is an estimate rather than a precise result, because it
    /// doesn't account for the allocator's overhead.
    fn deep_size_of(&self) -> usize;
}
```
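
For a concrete sense of the recursive approach, a hand-written impl for Vec<T> could look roughly like this (my sketch, not part of the proposal):

```rust
impl<T: DeepSizeOf> DeepSizeOf for Vec<T> {
    fn deep_size_of(&self) -> usize {
        // The inline part: pointer, length, capacity.
        std::mem::size_of::<Self>()
            // Allocated-but-unused capacity still occupies heap memory.
            + (self.capacity() - self.len()) * std::mem::size_of::<T>()
            // Each element reports its own size plus anything it owns.
            + self.iter().map(|e| e.deep_size_of()).sum::<usize>()
    }
}
```
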
dtolnay commented 5 years ago

Thanks! It sounds like this would be almost exactly like the heapsize / heapsize_derive crates except that their heap_size_of helper would be replaced with a plain mem::size_of that does not require calling into jemalloc.

Do you know of any other differences that would be required compared to heapsize?

matklad commented 5 years ago

There are a couple of design options indeed!

It's not clear what the best behavior is for Arc and Rc (to avoid double counting). One option is to thread through a Context argument that remembers the set of visited pointers (thus making the derived traversal a true DFS of a DAG rather than a plain tree traversal).

It might be fun to add a Serialize -> DeepSizeOf bridge: this hopefully gives at least a somewhat reasonable approximation, and there are a lot of Serialize impls out there!
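
One cheap way to prototype that bridge is to reuse an existing serializer's size accounting, e.g. bincode's serialized_size (my suggestion, not something settled in this thread):

```rust
use serde::Serialize;

/// A rough proxy for deep size: the bincode encoding length loosely
/// tracks owned data, but ignores capacity, padding, pointers, and
/// everything serde skips or flattens.
fn approx_deep_size_of<T: Serialize + ?Sized>(value: &T) -> Option<u64> {
    bincode::serialized_size(value).ok()
}

fn main() {
    let data = vec![String::from("hello"), String::from("world")];
    println!("approx bytes: {:?}", approx_deep_size_of(&data));
}
```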

dtolnay commented 5 years ago

Regarding the reference counted pointers, let's make that decision up front: for your use case, would you prefer to have Arc and Rc accounted for accurately, at the expense of whatever performance it costs to maintain the context?

matklad commented 5 years ago

I think accurate counts are much more valuable.

novacrazy commented 5 years ago

How about three methods (sketched as a trait below):

  1. stack size (basically mem::size_of)
  2. heap size for the current structure only
    • (capacity * size_of for the member type)
  3. a recursive heap size that actively scans through all members for additional heap allocations. Expensive but accurate.
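
Sketched as a trait (the method names are mine):

```rust
trait SizeOf {
    /// 1. Stack size: what mem::size_of already reports.
    fn stack_size(&self) -> usize {
        std::mem::size_of_val(self)
    }
    /// 2. Heap size of this value's own allocation only,
    ///    e.g. capacity * size_of::<T>() for a Vec<T>.
    fn shallow_heap_size(&self) -> usize;
    /// 3. Recursive heap size: also scans every member for further
    ///    allocations. Expensive but accurate.
    fn deep_heap_size(&self) -> usize;
}
```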

Unfortunately, although it’s a brilliant idea at first glance, a fake serializer wouldn’t be wholly sufficient. Flattened types, skipped values, different enum tag representations, etc., can all give incorrect results.

However, it may be close enough to still be worth investigating.

Also, for reference-counted pointers: we could divide their size by the total number of strong references for a "portion shared" size?
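
For Arc, that "portion shared" idea could be as simple as this sketch (my code, not from the thread):

```rust
use std::mem;
use std::sync::Arc;

/// Charge each strong handle an equal share of the shared allocation.
fn shared_portion_size<T>(arc: &Arc<T>) -> usize {
    mem::size_of::<T>() / Arc::strong_count(arc)
}

fn main() {
    let a = Arc::new([0u8; 100]);
    let b = Arc::clone(&a);
    // Two strong handles: each is charged half of the 100 bytes.
    assert_eq!(shared_portion_size(&a) + shared_portion_size(&b), 100);
}
```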

matklad commented 5 years ago

Hm, not sure: 3 shouldn't be significantly slower than 2 in any case, and it may be exponentially faster if there is a lot of sharing. 1 seems redundant with mem::size_of.

> Also, for reference-counted pointers: we could divide their size by the total number of strong references for a "portion shared" size?

Interesting idea! Will it always give an accurate count?

novacrazy commented 5 years ago

I think 3 being slower is fine, like a "last resort" full-scan operation to be as accurate as possible. That seems required if we can't hook into the allocator itself. 2 and 3 would be equivalent if there is no nesting, but 3 is required for nesting.

As for the accuracy of the reference counts: it's as accurate as an atomic integer load for Arc, and a plain integer load for Rc.

Perhaps whether an Arc/Rc is counted as a whole value or a partial value could be a crate feature? That way users could opt into more or less accurate representations depending on their needs. Enabling that crate feature in your end project would also apply it to your dependencies, so they don't even have to care about it.

Aeledfyr commented 5 years ago

The question about double counting Arcs/Rcs also applies to normal references, because a struct can have multiple references to the same object.

```rust
// Assuming a derive macro and trait along the lines proposed above:
#[derive(DeepSizeOf)]
struct Example<'a>(&'a u32, &'a u32);

fn main() {
    let number = &42;
    let example = Example(number, number);
    let size = example.deep_size_of();
    // What would size be?
}
```

Assuming a 64-bit target, should the size be 20 (two 8-byte references plus the u32 counted once) or 24 (the u32 counted through both references)? 20 makes the most sense, but that means you would have to track every reference, on top of the decision about tracking Arcs / Rcs.

I've been working on a basic implementation of the trait and default implementations, but I haven't implemented anything for the references.

novacrazy commented 5 years ago

I think we can only ever hope to count owned or partially owned data, so in that case only the size of the reference/pointer itself should be counted. References and pointers can point into the stack, the heap, or even more esoteric places, all of which are presumably accounted for elsewhere.

A good example of this is Cow, where the implementation would recurse only into the Owned variant.
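
Assuming the single-method trait from earlier in the thread, a Cow impl could look like this (my sketch):

```rust
use std::borrow::Cow;
use std::mem;

impl<T> DeepSizeOf for Cow<'_, T>
where
    T: ToOwned + ?Sized,
    T::Owned: DeepSizeOf,
{
    fn deep_size_of(&self) -> usize {
        match self {
            // Borrowed data is owned elsewhere: count only the enum itself.
            Cow::Borrowed(_) => mem::size_of::<Self>(),
            // Owned data is ours: recurse into it, subtracting its inline
            // part, which size_of::<Self>() already covers.
            Cow::Owned(owned) => {
                mem::size_of::<Self>() + owned.deep_size_of()
                    - mem::size_of::<T::Owned>()
            }
        }
    }
}
```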

Aeledfyr commented 5 years ago

It might be good to have two options (maybe a compile time feature), one for only counting owned data (Vecs, Boxes, etc), and the other for attempting to count partially owned data (Arcs, Rcs, references). In many cases knowing the size of things inside an Arc would still be useful, as long as you know that it might not be completely accurate.

On the implementation side, one way of handling the simpler "reference" cases would be to store a list of raw pointers to the inner data of each object, and to compare each new reference against that list to see whether it has already been counted. There may be some safety issues in obtaining the raw pointers, but this would take care of double counting in most cases (excluding self-referential structs). It would count any partially owned object exactly once, which (I think) is the intuitive behavior in most cases. A sketch of this idea follows.
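
A minimal version of that pointer-tracking idea might look like this (names and signatures are mine; a real design would need more care around fat pointers and safety):

```rust
use std::collections::HashSet;
use std::mem;

/// Remembers which allocations have already been counted, so shared
/// data (references, Rc, Arc) is charged only once.
#[derive(Default)]
struct Context {
    seen: HashSet<*const ()>,
}

impl Context {
    /// Returns true only the first time a given address is seen.
    fn first_visit<T: ?Sized>(&mut self, ptr: *const T) -> bool {
        self.seen.insert(ptr.cast::<()>())
    }
}

trait DeepSizeOf {
    fn deep_size_of_with(&self, ctx: &mut Context) -> usize;
}

impl<T: DeepSizeOf> DeepSizeOf for &T {
    fn deep_size_of_with(&self, ctx: &mut Context) -> usize {
        let referent = if ctx.first_visit(*self as *const T) {
            (**self).deep_size_of_with(ctx) // first sighting: count it
        } else {
            0 // already counted through another reference
        };
        mem::size_of::<Self>() + referent
    }
}
```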

Aeledfyr commented 5 years ago

My current work is here, but I'm not sure if it's currently good enough to publish to crates.io / add to this list. This is my first "real" crate, so if anyone has code style suggestions or other comments, please tell me.

Manishearth commented 5 years ago

Honestly you could probably make jemalloc an optional dep of heapsize and just use that. Or fork it.

An interesting addition would be something that helps construct a nested tree representation for reporting.

novacrazy commented 5 years ago

jemalloc isn't an option on some targets, most notably MSVC.

I may work on this myself soon. The idea of a fake serializer wouldn't work, but it inspired the idea of arbitrary "Counter" implementations to handle the different considerations described here, such as references. Perhaps even one that does use jemalloc on targets that support it.

Additionally, this absolutely needs a serde-like attribute system for deep-counting foreign types.
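
By analogy with serde's attributes, usage might look something like this (entirely hypothetical syntax: neither the derive nor the attribute exists yet, and rope::Rope stands in for any foreign type):

```rust
#[derive(DeepSizeOf)]
struct Document {
    title: String,
    // A foreign type with no DeepSizeOf impl: delegate to a user function.
    #[deep_size(with = "rope_size")]
    body: rope::Rope,
}

// User-supplied estimate for the foreign type (hypothetical).
fn rope_size(rope: &rope::Rope) -> usize {
    rope.len()
}
```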

Aeledfyr commented 5 years ago

Can someone read through my implementation to see if anything is obviously wrong? I've finished up the basic implementation, and as far as I can tell it works in most cases. I know that there are still some things to add, but it should work as a basic framework.

https://github.com/Aeledfyr/deepsize

Currently it special-cases each allocating type, which lets it avoid jemalloc but opens up opportunities for incorrect guesses. If anyone knows a way of accurately verifying the byte counts, that would be quite useful for testing.

Also, does the structure of the trait work? It would be good to reduce the number of methods, but you would at least need deep_size_of, deep_size_of_children, and either recurse_deep_size_of or stack_size, and I'm not sure what the best route would be.

dtolnay commented 5 years ago

Let me know if you'd like any design changes @matklad. Once you're happy and the result is published to crates.io, we can close out this issue and cross this off the list. Thanks @Aeledfyr!

Aeledfyr commented 5 years ago

Does anyone have preferences about which standard library types this should be implemented for? Also, how would I add integrations for some of the dynamic storage crates (such as slotmap) without making them hard dependencies (only providing the impl if the crate is already being used)? I'm not sure whether that can actually be done, or how it is normally handled.
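
On that integration question: Cargo can't detect that another crate "is already being used"; the usual pattern (serde and others do this) is an optional dependency plus a feature that downstream users enable explicitly. A rough sketch, with illustrative version and bounds (slotmap may require extra bounds such as Slottable):

```rust
// In Cargo.toml:
//     [dependencies]
//     slotmap = { version = "0.4", optional = true }
// The optional dependency implicitly defines a `slotmap` feature.

#[cfg(feature = "slotmap")]
impl<K: slotmap::Key, V: DeepSizeOf> DeepSizeOf for slotmap::SlotMap<K, V> {
    fn deep_size_of(&self) -> usize {
        // An estimate: charge all reserved slots plus whatever the stored
        // values own on the heap (the internal layout is private, so this
        // cannot be exact).
        std::mem::size_of::<Self>()
            + self.capacity() * std::mem::size_of::<V>()
            + self.values().map(|v| v.deep_size_of()).sum::<usize>()
    }
}
```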

dtolnay commented 5 years ago

I filed some issues on your issue tracker and @matklad will do the same if they have feedback. I think we are ready to close out this issue. Thanks and nicely done!