NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.52k stars 1.5k forks source link

Custom allocator for Exprs #8968

Open roberth opened 1 year ago

roberth commented 1 year ago

Is your feature request related to a problem? Please describe.

For docstring support we need to access the parent of an Expr. (workarounds not feasible - @hsjobeki and I have tried) We expect the memory overhead of an extra Expr field to be on the order of 5%, so this loss would be prevented. (measurement tbd) Furthermore memory allocation overhead can be bypassed (both cpu, sync and memory)

Describe the solution you'd like

By allocating Exprs together in an area of memory, we can instead determine the root virtually for free, and find the parent by doing an O(log n) binary search.

This also speeds up parsing, because we can use an allocator that performs no synchronization. We can also take advantage of the fact that individual Exprs aren't freed, so we don't need to track object sizes, reducing memory overhead.

I couldn't find an off the shelf allocator. Possible data model:

constexpr size_t region_size;
pool<Allocator> allocators;

struct Region {
  void * start;
  void * end;
}

struct Allocator {
  void * currentRegionStart;
  void * cursor;
}

map<Region, Expr *> roots;

When starting to parse, request an allocator and use it for all Expr allocations in the parser. These are single threaded, so Allocator access will be efficient. When the next cursor position doesn't fit in the allocator's region anymore, or when parsing is done, add the used region to the roots map.

To find the root of an Expr, use its pointer to find it in the roots map. To find a parent, also perform a binary search using pos info.

Describe alternatives you've considered

Go through pos and the parse cache. This would not support

This has its own complexity and does not speed up Expr allocation, and does not reduce allocator memory overhead to a bare minimum.

Additional context

Priorities

Add :+1: to issues you find important.

inclyc commented 1 year ago

For docstring support we need to access the parent of an Expr.

I've maintained some code for traversing nix::Expr nodes, and construct parent information in a map, using a recursive visitor.

There is also a file describing the child nodes for each nix::Expr.

This class RecursiveASTVisitor is a generated CRTP class that consumes a nix AST, and call visitExprXXX for each nix node, which allow us to do static analysis on nix ASTs.

By allocating Exprs together in an area of memory, we can instead determine the root virtually for free, and find the parent by doing an O(log n) binary search.

I think it is good to allocating nix::Exprs in an area of memory, but how can you determine this size of the region? It cannot be resized & moved since the pointer will change if you do something like std::vector.

find the parent by doing an O(log n) binary search.

You don't need to do binary search, just traverse the nix AST, and record an std::map between child & parent nodes, as mentioned above.

roberth commented 1 year ago

how can you determine this size of the region?

I'd thought to use a sequence of fairly big regions, but I didn't incorporate that in the example data model. By allowing a sequence of regions to be used for a single file, you don't have to know the size in advance.

It cannot be resized

A more clever solution could "shrink" a region when parsing is done. This complicates finding the root a bit, because you can't just round down the pointer to get an identifier for the region. Does C++ have a std::map like type for assigning values to ranges of keys instead of individual keys?

You don't need to do binary search [...] std::map between child & parent nodes

I was trying to trade time for space. We wouldn't need to find the parent or root often, so by avoiding an std::map, we'd save memory. However, as long as this map isn't allocated eagerly, it seems like a good alternative.

hsjobeki commented 1 year ago

@inclyc if you'd like to implement this i would be super thankful. Just spent 6 hours finding a bug in the regex parser for doc-comments. Having access to the AST would help me a lot. Looked into the details but couldn't yet implement it. (Also im a c++ noob). Reach me on matrix if you like teaming up: @johannes.kirschbauer:scs.ems.host