01mf02 / jaq

A jq clone focussed on correctness, speed, and simplicity
MIT License
2.61k stars 63 forks source link

Lazy path execution #150

Closed 01mf02 closed 6 months ago

01mf02 commented 6 months ago

jq evaluates filters that appear in path expressions lazily, whereas jaq previously evaluated them strictly. To see the difference, take the filter [1] | .[repeat(0)]: Running it with jaq 1.2 yields no single output, because it tries to evaluate repeat(0) fully before yielding a result, whereas running it with jq 1.6 yields an infinite sequence 1, 1, 1, ....

This behaviour is quite difficult to achieve, especially when trying to reduce clones to enable unique access. In particular, when we run [1] | .[repeat(0)], repeat(0) needs to hold on to its input, which is [1], because the interpreter is not smart enough to see that it does not use it. That means, however, that we cannot consume [1] in this operation, because repeat(0) clings to it. That means that we have to clone [1] every time. That's OK. However, now consider the case where [1] | .[0]. This is a very common case --- indexing with a single value (0). In that case, the interpreter can figure out that the filter 0 yields only a single value (0), so it could consume [1] in this case.

jaq now achieves this by doing the following analysis: For every filter appearing in a path expression, if the filter yields at most one value, then this value is calculated immediately, otherwise the calculation is delayed. That implies that if all filters in a path expression yield at most one value, then the whole path expression is calculated strictly, and the operation can thus consume the input (if nobody else clings on to it). This is realised by the new function collect_if_once.

This PR implements this behaviour for both path access and path update operations.