Open roberth opened 1 year ago
For docstring support we need to access the parent of an Expr.
I've maintained some code for traversing nix::Expr
nodes, and construct parent information in a map, using a recursive visitor.
There is also a file describing the child nodes for each nix::Expr
.
This class RecursiveASTVisitor
is a generated CRTP class that consumes a nix AST, and call visitExprXXX
for each nix node, which allow us to do static analysis on nix ASTs.
By allocating Exprs together in an area of memory, we can instead determine the root virtually for free, and find the parent by doing an O(log n) binary search.
I think it is good to allocating nix::Expr
s in an area of memory, but how can you determine this size of the region? It cannot be resized & moved since the pointer will change if you do something like std::vector
.
find the parent by doing an O(log n) binary search.
You don't need to do binary search, just traverse the nix AST, and record an std::map
between child & parent nodes, as mentioned above.
how can you determine this size of the region?
I'd thought to use a sequence of fairly big regions, but I didn't incorporate that in the example data model. By allowing a sequence of regions to be used for a single file, you don't have to know the size in advance.
It cannot be resized
A more clever solution could "shrink" a region when parsing is done. This complicates finding the root a bit, because you can't just round down the pointer to get an identifier for the region. Does C++ have a std::map
like type for assigning values to ranges of keys instead of individual keys?
You don't need to do binary search [...]
std::map
between child & parent nodes
I was trying to trade time for space. We wouldn't need to find the parent or root often, so by avoiding an std::map
, we'd save memory. However, as long as this map isn't allocated eagerly, it seems like a good alternative.
@inclyc if you'd like to implement this i would be super thankful. Just spent 6 hours finding a bug in the regex parser for doc-comments. Having access to the AST would help me a lot. Looked into the details but couldn't yet implement it. (Also im a c++ noob). Reach me on matrix if you like teaming up: @johannes.kirschbauer:scs.ems.host
Is your feature request related to a problem? Please describe.
For docstring support we need to access the parent of an Expr. (workarounds not feasible - @hsjobeki and I have tried) We expect the memory overhead of an extra Expr field to be on the order of 5%, so this loss would be prevented. (measurement tbd) Furthermore memory allocation overhead can be bypassed (both cpu, sync and memory)
Describe the solution you'd like
By allocating Exprs together in an area of memory, we can instead determine the root virtually for free, and find the parent by doing an O(log n) binary search.
This also speeds up parsing, because we can use an allocator that performs no synchronization. We can also take advantage of the fact that individual Exprs aren't freed, so we don't need to track object sizes, reducing memory overhead.
I couldn't find an off the shelf allocator. Possible data model:
When starting to parse, request an allocator and use it for all Expr allocations in the parser. These are single threaded, so
Allocator
access will be efficient. When the next cursor position doesn't fit in the allocator's region anymore, or when parsing is done, add the used region to theroots
map.To find the root of an Expr, use its pointer to find it in the
roots
map. To find a parent, also perform a binary search usingpos
info.Describe alternatives you've considered
Go through
pos
and the parse cache. This would not supportThis has its own complexity and does not speed up Expr allocation, and does not reduce allocator memory overhead to a bare minimum.
Additional context
Priorities
Add :+1: to issues you find important.