The current path setup involves quite a lot of types, roughly one per control flow concept. If we add fuzzer actions that manipulate while, do-while, for, and other C constructs, the amount of path types is going to explode. Since all of them are basically either indexing into a particular item in an AST node or saying 'focus on this node', I wonder if they can be replaced with something simpler.
My immediate thought is that a path is a list of pairs of the form (int, int), where each first int is an index into the current AST node, and the second int is a length that, when nonzero, specifies that the path is operating on a range of items.
What maps onto which index would be node-specific, but might be something like this:
In a statement list/block, index n selects the nth statement (ignoring declarations), and ranges operate on sublists of the list;
In a statement, index 0 specifies that the path is reaching into the statement rather than operating directly on it;
In an if-statement, index 0 is the conditional, index 1 is the true block, and index 2 is the false block.
((1.0) (2.0) (1.2)) would select [foo = 10; *x = 6;] as a list
One possible disadvantage of this encoding might be that fuzzer traces and errors are harder to read out, as the trace now needs some manual mapping back to the AST. Maybe there's a nice halfway house.
The current path setup involves quite a lot of types, roughly one per control flow concept. If we add fuzzer actions that manipulate
while
,do
-while
,for
, and other C constructs, the amount of path types is going to explode. Since all of them are basically either indexing into a particular item in an AST node or saying 'focus on this node', I wonder if they can be replaced with something simpler.My immediate thought is that a path is a list of pairs of the form
(int, int)
, where each first int is an index into the current AST node, and the second int is a length that, when nonzero, specifies that the path is operating on a range of items.What maps onto which index would be node-specific, but might be something like this:
n
selects then
th statement (ignoring declarations), and ranges operate on sublists of the list;0
specifies that the path is reaching into the statement rather than operating directly on it;0
is the conditional, index1
is the true block, and index2
is the false block.Given the example program
((0.0))
would select*x = 50
((0.0) (0.0))
might select*x
((1.0) (0.0))
would select*y == 42
((1.0) (1.0) (0.0))
would selectfoo = 9
((1.0) (2.0) (1.2))
would select[foo = 10; *x = 6;]
as a listOne possible disadvantage of this encoding might be that fuzzer traces and errors are harder to read out, as the trace now needs some manual mapping back to the AST. Maybe there's a nice halfway house.