Wilfred / difftastic

a structural diff that understands syntax 🟥🟩
https://difftastic.wilfred.me.uk/
MIT License
20.59k stars 333 forks source link

Fix memory leak in neighbours array. #698

Closed sesse closed 5 months ago

sesse commented 5 months ago

Vertex is allocated on the arena, so it is never dropped; then it cannot contain a Vec allocated on the regular heap without leaking memory. Replace the Vec with a slice allocated on the arena, which seems to fix most of the leaks. (Some may remain; I haven't checked fully.) It should also be slightly more memory-efficient.

It's not clear that we actually need the RefCell instead of just putting Option directly into the structure, but I've let it stay.

This issue was probably introduced in a71d6118cf52.

sesse commented 5 months ago

It seems there is also a similar issue in Stack (it contains Rc, which allocates on the heap, so would need a drop in order to free its memory). Possibly also in the Myers diff code, according to heaptrack.

Wilfred commented 5 months ago

Thanks for the PR! Could you confirm that performance (runtime and memory usage) aren't hurt? See https://difftastic.wilfred.me.uk/profiling.html for the commands I generally use to benchmark.

sesse commented 5 months ago

Sure. I've run it a couple of times before and after these two commits, and eyeballed the numbers to verify that they're stable. Note that the really big difference won't show up for a single diff; it will show up on repeated ones (e.g. diffing a directory), since having a leak and then immediately exiting is less important. Still, it seems just doing less stuff on the heap seems to yield fairly nice improvements. I've stripped the commands to the parts that I believe are relevant; let me know if you want the entire set.

Before:

$ /usr/bin/time -v ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
        User time (seconds): 1.12
        System time (seconds): 0.23
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.36
        Maximum resident set size (kbytes): 508736

$ perf stat ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
     2 952 905 517      instructions                     #    0,93  insn per cycle      

$ perf stat ./target/release/difft sample_files/typing_old.ml sample_files/typing_new.ml
         6 261 181      instructions                     #    0,79  insn per cycle

After:

$ /usr/bin/time -v ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
        User time (seconds): 0.94
        System time (seconds): 0.13
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.08
        Maximum resident set size (kbytes): 365592

$ perf stat ./target/release/difft sample_files/slow_before.rs sample_files/slow_after.rs
     2 608 461 567      instructions                     #    0,93  insn per cycle

$ perf stat ./target/release/difft sample_files/typing_old.ml sample_files/typing_new.ml
         6 206 141      instructions                     #    0,77  insn per cycle
Wilfred commented 5 months ago

A logic improvement, and a performance win! Thanks for the PR :)

It's not clear that we actually need the RefCell instead of just putting Option directly into the structure, but I've let it stay.

I don't think it would be possible to have the neighbors field in Vertex be a plain Option, because it would force set_neighbors to take a &mut Vertex, but we don't have a unique reference to the vertex. I could be missing something.