bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0

Compare memory usage #97

Open Keats opened 6 years ago

Keats commented 6 years ago

Is comparing the memory usage of benches something that might come in the future? In some cases I want to compare both the time and the memory usage.

bheisler commented 6 years ago

Hey, thanks for the suggestion.

I would like to add some sort of memory usage measurement in the future, yes. At this point it isn't an immediate priority, though.

I'm not sure at this point what the most useful way to measure memory usage would be. Do you have any specific use cases in mind?

Keats commented 6 years ago

> Do you have any specific use cases in mind?

In my case, I want to compare the memory usage of some bits of current master and a PR to see the improvement/regression rather than just guessing it's better.

bheisler commented 6 years ago

Right. A better question may have been how do you measure memory usage?

Aside: You might want to consider whether the heapsize crate would work for you. It will probably be a while before I add this feature to Criterion.rs, if I ever do.

Here are some notes for myself if/when I get around to this:

On Linux you can look at /proc/<pid>/status. I think there's something similar for Windows, but I don't know the details. This has the advantage that you can look at the high-water mark (VmPeak or VmHWM), but it has a limited resolution.
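As a minimal sketch of the /proc approach: on Linux the per-process high-water marks are reported as the `VmPeak` (peak virtual size) and `VmHWM` (peak resident size) fields of `/proc/<pid>/status`, in kB. The helper name `parse_status_kb` is made up for illustration and is not part of Criterion.rs.

```rust
use std::fs;

/// Extracts a kB-valued field such as `VmPeak` or `VmHWM` from the text of
/// `/proc/<pid>/status`. Returns `None` if the field is absent or malformed.
/// (Hypothetical helper, not a Criterion.rs API.)
fn parse_status_kb(status: &str, field: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        // Lines look like "VmPeak:\t   10240 kB".
        let rest = line.strip_prefix(field)?.strip_prefix(':')?;
        rest.trim().strip_suffix("kB")?.trim().parse::<u64>().ok()
    })
}

fn main() {
    // Linux-only: on other platforms this file does not exist.
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => {
            println!("VmPeak: {:?} kB", parse_status_kb(&status, "VmPeak"));
            println!("VmHWM:  {:?} kB", parse_status_kb(&status, "VmHWM"));
        }
        Err(err) => eprintln!("could not read /proc/self/status: {err}"),
    }
}
```

Because the values are whole-process peaks, a benchmark harness would most likely have to run each benchmarked function in a fresh subprocess to get a usable number.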

Another option might be to collect some statistics from jemalloc. If you run a process with MALLOC_CONF=stats_print:true, then it will dump a bunch of memory statistics to stderr when it terminates. I'm not totally sure what any of those statistics mean, precisely. For example, if it shows the memory usage at the end of the program after everything's been dropped, it might not be so useful. Also, this requires that the benchmark process is compiled with jemalloc; the system allocator doesn't do anything like this.

In either case, we could potentially run a subprocess that would execute the benchmarked function and return some kind of memory information (either VmPeak or jemalloc's stats). Any side-effects outside of the benchmarked functions would be executed multiple times.

bheisler commented 6 years ago

Having thought about this some more, I realized that I'm probably overthinking it. Peak memory usage and bytes allocated per iteration are probably sufficient for almost all use cases. The latter would probably require accessing jemalloc directly though.

JeanMertz commented 6 years ago

I've also been looking for this. Specifically, coming from the Go world, I'd like to be able to measure/see how many (heap) mallocs take place in a piece of code I'm benchmarking.

If I write some code that is in a hot-path situation, I'd like to know both the time it takes to run this piece of code and the number of times it has to write out to the heap. Often those two go hand in hand (heap allocation == higher execution time), but it could also be that I have an O(n) problem that has nothing to do with memory allocation.

I don't (necessarily) always care about the amount of memory written to the heap, but at least I'd like to know that a roundtrip took place, instead of allocating it on the stack.

bheisler commented 6 years ago

Unfortunately, I don't think Criterion.rs can measure the amount of memory written to the heap. Measuring the memory allocated is probably possible, though.

I don't know much about Go, but at least in Rust the decision to stack- or heap-allocate a data structure is entirely in the programmer's control. Smart-pointers like Box and collections like Vec always store their contents on the heap, while regular data structures do not. Libraries (especially those used in a performance-sensitive context) should document their allocation behavior.

That said, it is also useful to be able to measure the amount of memory allocated on the heap. I probably won't get around to implementing this any time soon, but pull-requests would be welcome!
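One way to check whether a piece of code makes a heap round trip is to count calls into the global allocator. The sketch below is an illustration only, not something Criterion.rs provides: `CountingAllocator` and `count_allocs` are hypothetical names, and the counter is thread-local so that allocations on other threads are not mixed in.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

thread_local! {
    // Number of allocation calls made on the current thread.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper that forwards to the system allocator and counts calls.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` avoids a panic if the thread-local is no longer accessible.
        let _ = ALLOCS.try_with(|n| n.set(n.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns how many allocations it made on this thread.
fn count_allocs<T>(f: impl FnOnce() -> T) -> usize {
    let before = ALLOCS.with(|n| n.get());
    // `black_box` keeps the optimizer from discarding the value (and its allocation).
    let value = black_box(f());
    let after = ALLOCS.with(|n| n.get());
    drop(value);
    after - before
}

fn main() {
    println!("array on the stack: {}", count_allocs(|| [0u64; 4]));
    println!("Box::new:           {}", count_allocs(|| Box::new([0u64; 4])));
    println!("vec![1, 2, 3]:      {}", count_allocs(|| vec![1u8, 2, 3]));
    // An empty Vec holds no elements yet, so it has nothing to allocate.
    println!("Vec::new:           {}", count_allocs(|| Vec::<u8>::new()));
}
```

Reallocations (for example a growing `Vec`) go through the trait's default `realloc`, which calls `alloc` again, so they are counted as well.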

zbraniecki commented 4 years ago

Could a half-step be to allow a custom "benchmark" to be plugged into criterion, much like custom "measurements" are?

Something like:

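// Hypothetical API sketch: `custom_bench` does not exist in Criterion.rs.
c.custom_bench("memory_use", |b| {
  b.iter(|| {
    // `struct` is a reserved keyword in Rust, so the binding needs another name.
    let data = Struct::new();
    // The returned value would be reported in place of a duration.
    calculate_memory_use(&data)
  });
});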

and have this value be the output comparable to duration from timing measurements?

bheisler commented 4 years ago

Unfortunately, doing that for arbitrary measurements is difficult. The existing custom-measurements support expects the measured values to be time-like (i.e. continuous or close to it, amenable to statistical analysis). Memory size is more typically constant. On the one hand, that means we don't need to do iterations or much analysis, but it does mean that more code (and the assumptions of existing code) needs to be changed.

I'm aware of this problem, but I wouldn't expect it to be solved soon, sorry.

luxalpa commented 9 months ago

I can measure my peak memory usage using the dhat crate in a test. But it would be cool to have Criterion's other features for this and also to unify the entire experience. I think simplifying this overall experience would be useful for a lot of devs who just want an answer to the question "did this change improve performance / memory usage" without needing to learn about all the details.

But maybe it is out of scope for this project and there could be a more overarching project that combines multiple of these performance/profiling things? You could have a crate for measuring CPU, one for measuring memory, one for profiling and another for all of the bells and whistles around it (basically what criterion does for the most part) and then you could have another crate that wraps all of these. Just an idea.

javierhonduco commented 4 months ago

This would be incredibly useful, especially measuring allocations as many others have mentioned (number of allocs per iter, allocated size per iter, and rate of allocations per unit of time). This works great in Go because they rolled their own allocator, but in Rust this is best done either with an allocator that provides this data, such as jemalloc(3), or with a wrapper around the system's allocator that can gather all this data and provide it to criterion.

Would having criterion provide an opt-in allocator wrapper be a reasonable way forward? I am not sure if this is 100% technically feasible, but I believe it could be. If this is possible, it would have to work with alternative system allocators, as well as with different operating systems.
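A rough sketch of what such an opt-in wrapper could look like, using only the standard `GlobalAlloc` trait so that it can wrap the system allocator or any alternative one. The names `Tracking`, `Stats`, and `snapshot` are assumptions for illustration; nothing like this exists in Criterion.rs today.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Point-in-time copy of the counters (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Stats {
    pub allocs: usize, // number of allocation calls so far
    pub bytes: usize,  // total bytes requested so far
    pub peak: usize,   // high-water mark of live bytes
}

/// Hypothetical wrapper that forwards to any inner allocator `A`.
pub struct Tracking<A> {
    inner: A,
    allocs: AtomicUsize,
    bytes: AtomicUsize,
    live: AtomicUsize,
    peak: AtomicUsize,
}

impl<A> Tracking<A> {
    pub const fn new(inner: A) -> Self {
        Tracking {
            inner,
            allocs: AtomicUsize::new(0),
            bytes: AtomicUsize::new(0),
            live: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
        }
    }

    pub fn snapshot(&self) -> Stats {
        Stats {
            allocs: self.allocs.load(Relaxed),
            bytes: self.bytes.load(Relaxed),
            peak: self.peak.load(Relaxed),
        }
    }
}

// Only `alloc` and `dealloc` are overridden; the trait's default `realloc`
// and `alloc_zeroed` call back into them, so those paths are tracked too.
unsafe impl<A: GlobalAlloc> GlobalAlloc for Tracking<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            self.allocs.fetch_add(1, Relaxed);
            self.bytes.fetch_add(layout.size(), Relaxed);
            let live = self.live.fetch_add(layout.size(), Relaxed) + layout.size();
            self.peak.fetch_max(live, Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Relaxed);
    }
}

// Opting in is a single declaration in the benchmark binary.
#[global_allocator]
static GLOBAL: Tracking<System> = Tracking::new(System);

fn main() {
    const ITERS: usize = 100;
    let before = GLOBAL.snapshot();
    for _ in 0..ITERS {
        black_box(vec![0u8; 1024]);
    }
    let after = GLOBAL.snapshot();
    println!("allocs/iter: {}", (after.allocs - before.allocs) / ITERS);
    println!("bytes/iter:  {}", (after.bytes - before.bytes) / ITERS);
    println!("peak bytes:  {}", after.peak);
}
```

The counters are process-wide, so allocations made by other threads (or by the harness itself) during the measured region would be included. A real integration would need to decide how to account for that.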