Memory management for program nodes

dsharlet / slinky

Optimize pipelines for locality

MIT License

8 stars 2 forks source link

Memory management for program nodes #108

Open dsharlet opened 8 months ago

dsharlet commented 8 months ago

I just did an experiment: run the simplify test with reference counting disabled, so we just leak everything and don't pay the cost of ref counting (atomic adds). This ran ~20% faster than main, so there's quite a bit of time spent in object management here.

I think we should do the following:

Make all nodes "flat" objects. Currently, some of them use std::vector<T>, but these are immutable, so we could just allocate one block of memory for the whole node, and just use span<const T> to point at memory in that block instead.
Try to allocate nodes out of an arena. This would need to be part of the API (e.g. variable::make(arena, ...), so we can copy the final pipeline into a new arena without wasted memory. The whole arena could be ref counted by the nodes in it.

abadams commented 8 months ago

"Make all nodes "flat" objects. Currently, some of them use std::vector, but these are immutable, so we could just allocate one block of memory for the whole node, and just use span to point at memory in that block instead."

Huh. If that works we should be doing that in Halide too.

dsharlet commented 8 months ago

The commit I just tagged has a proof of concept of this, it implements the idea for call nodes. This 100% works, and is clean even in asan (doesn't leak memory), except it needs ASAN_OPTIONS=new_delete_type_mismatch=0 because it mismatches new and delete. I'm sure this could be fixed too, but it probably needs all of the nodes to commit to this strategy so they can all use the same destructor logic.

It's not pure win: before, we could constructor vectors, and then just move them into the nodes. That doesn't work any more. There are probably other ways to design the surrounding code to avoid that problem...

steven-johnson commented 8 months ago

Huh. If that works we should be doing that in Halide too.

Yeah, the gotcha with that is the API has no way to pass in an Arena (etc). Presumably we could work around it by adding an implicit HalideContext or whatnot as a thread-local.