Closed ggevay closed 7 years ago
The easiest way that I can think of to do 3. on DSCF is, instead of maintaining just the innermost enclosing loop, to maintain the number of enclosing DefDefs that represent loops minus the number of enclosing suffix$ DefDefs. Then, if this number is not the same for a ValRef as for its corresponding ValDef, insert .cache. (I hope this works, but I'm not 100% sure.)
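To make the counting idea concrete, here is a minimal sketch on a toy tree. It is not the Emma API: `Node`, `Kind`, and `toCache` are hypothetical names, and the sketch assumes every definition is visited before its references and that suffix methods are nested inside their loop method.

```scala
object LoopDepthSketch {
  // Toy stand-in for DSCF: nested method definitions holding vals and refs.
  sealed trait Kind
  case object Loop extends Kind
  case object Suffix extends Kind
  case object Other extends Kind

  sealed trait Node
  final case class ValDef(name: String) extends Node
  final case class ValRef(name: String) extends Node
  // `kind` marks the DefDef as a loop method, a suffix method, or neither.
  final case class DefDef(kind: Kind, body: List[Node]) extends Node

  // Returns the names whose reference depth differs from their definition
  // depth, where depth = #enclosing loop DefDefs - #enclosing suffix DefDefs.
  def toCache(root: Node): Set[String] = {
    val defDepth = scala.collection.mutable.Map.empty[String, Int]
    val flagged = scala.collection.mutable.Set.empty[String]
    def go(n: Node, depth: Int): Unit = n match {
      case ValDef(x) => defDepth(x) = depth
      case ValRef(x) => if (defDepth.get(x).exists(_ != depth)) flagged += x
      case DefDef(kind, body) =>
        val d = kind match {
          case Loop   => depth + 1
          case Suffix => depth - 1
          case Other  => depth
        }
        body.foreach(go(_, d))
    }
    go(root, 0)
    flagged.toSet
  }
}
```

With a definition outside a loop, a reference inside the loop body is flagged, while a reference inside the nested suffix is not, because the suffix cancels the loop's increment.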
I'll try to reformulate 3. with the help of some imaginary API for the sake of further discussion.
[...] we create an inherited attribute that tells the innermost enclosing loop, [...]
// returns the innermost enclosing loop (if `t` is inside a loop)
def enclLoop(t: u.Tree): Option[u.TermSymbol]
[...] and then an accumulated attribute builds a map that shows for each ValDef which is the innermost loop that contains it [...].
// for a symbol representing a Scala `u.ValDef` or `u.VarDef`,
// returns the innermost enclosing loop (if `s` is defined inside a loop)
def inLoop(s: u.TermSymbol): Option[u.TermSymbol]
And then when we see a ValRef, we check whether the current innermost enclosing loop is the same as the innermost enclosing loop of the definition (which we get from the map).
I think we should match BindingRef here.
case t @ BindingRef(s)
  // `enclLoop` takes the tree, `inLoop` takes the symbol
  if inLoop(s) != enclLoop(t) && ... =>
  // ???
I'm not sure what to do in ??? other than store the information that s needs to be cached.
val x = /* DataBag term for symbol `s` */
// ...
f(x)
should be replaced with
val x = /* some DataBag term */
val y = x.cache()
// ...
f(y)
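The two-pass shape of this rewrite can be sketched on a toy statement list. This is only an illustration under assumptions: `Stmt`, `Val`, `Use`, `Loop`, `needsCache`, `rewrite`, and the `$cached` naming are hypothetical, the `inLoop` here returns a whole map rather than matching the imaginary signature above, and the elided extra condition in the guard (the `...`) is not modelled.

```scala
object CacheRewriteSketch {
  sealed trait Stmt
  final case class Val(name: String, rhs: String) extends Stmt   // val name = rhs
  final case class Use(fn: String, arg: String) extends Stmt     // fn(arg)
  final case class Loop(id: String, body: List[Stmt]) extends Stmt

  // Accumulated attribute: innermost enclosing loop of each definition.
  def inLoop(prog: List[Stmt]): Map[String, Option[String]] = {
    def go(ss: List[Stmt], encl: Option[String]): Map[String, Option[String]] =
      ss.foldLeft(Map.empty[String, Option[String]]) {
        case (m, Val(x, _))      => m + (x -> encl)
        case (m, Loop(id, body)) => m ++ go(body, Some(id))
        case (m, _)              => m
      }
    go(prog, None)
  }

  // First pass: names referenced under a different innermost loop
  // than the one enclosing their definition.
  def needsCache(prog: List[Stmt]): Set[String] = {
    val defLoop = inLoop(prog)
    def go(ss: List[Stmt], encl: Option[String]): Set[String] =
      ss.flatMap {
        case Use(_, x) if defLoop.get(x).exists(_ != encl) => Set(x)
        case Loop(id, body)                                => go(body, Some(id))
        case _                                             => Set.empty[String]
      }.toSet
    go(prog, None)
  }

  // Second pass: add `val x$cached = x.cache()` after each flagged
  // definition and redirect every use of `x` to the cached val.
  def rewrite(prog: List[Stmt]): List[Stmt] = {
    val flagged = needsCache(prog)
    def fresh(x: String): String = x + "$cached"
    def go(ss: List[Stmt]): List[Stmt] = ss.flatMap {
      case v @ Val(x, _) if flagged(x) => List(v, Val(fresh(x), x + ".cache()"))
      case Use(f, x) if flagged(x)     => List(Use(f, fresh(x)))
      case Loop(id, body)              => List(Loop(id, go(body)))
      case other                       => List(other)
    }
    go(prog)
  }
}
```

Collecting the flagged names first and rewriting afterwards matches the idea of only storing "s needs to be cached" at the reference site; a real implementation would operate on symbols rather than strings.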
Here is another suggestion. Suppose we work on DSCF. Keep track of the current owner. Then synthesize an attribute in the following way:
1. Match a ValRef with type of DataBag (we can think about parameters later, it's not much different);
2. [...] ValRef owner and is a loop method:
   - set the flag (accessedInLoop);
   - [...] the count (refCount);
   [...] Map[TermSym, (Boolean, Int)].
3. Match a DefCall of a loop and set flag for all arguments (but not count - already done);
4. [...] Let block to make sure you collected the counts;
5. Replace ValDefs that are accessedInLoop or have refCount > n with cached vals (reuse symbol here).

Pretty much done? The difference with parameters is the following:
@joroKr21 The algorithm makes sense but a few things have to be revised in order to cover some corner cases.
R1. The check in (2) will not match the following
val xs = ???
for (i <- 1 to N) {
if (i % 2 == 0) f(xs)
else g(xs)
}
R2. I don't see how the sketched transformation will identify xs to be cached in the original motivating example:
val xs = ???
if (???) f(xs)
else g(xs)
This will have refCount = 2 and will be cached. To catch such cases, conceptually we have to
- [...] TermSym per block;
- [...] refCount sums picking one element of each independent block set;
- [...] refCount > 1 exists.

Yes, indeed. The problem in R1 can be solved with this predicate (assuming we track the complete owner chain):
owners.takeWhile(_ != ref.owner).exists(isLoop)
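A small sketch of how that predicate behaves on the R1 example. The `Owner` case class and the owner names are hypothetical stand-ins for compiler symbols; the chain is assumed to be ordered innermost first.

```scala
object OwnerChainSketch {
  // Hypothetical stand-in for a symbol owner: a name plus a loop marker.
  final case class Owner(name: String, isLoop: Boolean)

  // `owners` is the owner chain of the reference site, innermost first;
  // `defOwner` is the owner of the referenced definition.
  // True if a loop lies strictly between the reference and the definition.
  def accessedInLoop(owners: List[Owner], defOwner: Owner): Boolean =
    owners.takeWhile(_ != defOwner).exists(_.isLoop)
}
```

For R1, `xs` is owned by the outer method while `f(xs)` sits in a branch nested inside the loop, so the chain between them contains the loop and the predicate holds even though the direct owner of the reference is the branch, not the loop.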
R2 poses an interesting question. If we generalize the refCount as a kind of score, how much is a ref within a branch worth? 0.5? Imagine that a ref can be nested in several layers of branching.
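One possible alternative to fractional scores, offered only as a sketch and not as what the thread settled on: count references along the worst-case execution path, summing over sequential statements and taking the maximum over the two sides of a branch. `Expr`, `Ref`, `Block`, `Branch`, and `maxRefs` are hypothetical names, and the branch condition itself is not modelled.

```scala
object BranchCountSketch {
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class Block(stmts: List[Expr]) extends Expr
  final case class Branch(thn: Expr, els: Expr) extends Expr

  // Maximum number of times `name` can be referenced on a single execution path.
  def maxRefs(e: Expr, name: String): Int = e match {
    case Ref(n)       => if (n == name) 1 else 0
    case Block(stmts) => stmts.map(maxRefs(_, name)).sum
    case Branch(t, f) => math.max(maxRefs(t, name), maxRefs(f, name))
  }
}
```

Under this reading, `if (???) f(xs) else g(xs)` yields 1 for `xs`, so it would not be cached, and arbitrarily deep nesting of branches needs no special weighting.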
Closed via #305.
We would like to insert .cache calls in three cases: [...]
I'm thinking about whether to do this on DSCF or on control-flow primitives.
[...] .cache call when the DataBag is used only in the suffix.