Open ricardoV94 opened 2 years ago
@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer
@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer
Some of the dict
-based replacement code from the non-global rewriters might come in handy; otherwise, there shouldn't be anything special about the local vs. global approaches.
Elemwise{Composite{(i0 exp((i1 i2)))}} [id A] 0 |TensorConstant{(1,) of 2.0} [id B] |TensorConstant{(1,) of 0.2} [id C] |x [id D]
We should probably look into these kinds of fusions that turn scalar constants into inputs. It's not clear to me that this is any better than—say—making i0
and i1
constants and removing those two inputs.
In other words, why don't we produce the following instead?
Elemwise{Composite{(2.0 * exp((0.2 * i0)))}} [id A] 0
|x [id B]
Also, can you clarify the kinds of graphs you think we should be producing in these multi-client cases?
In other words, why don't we produce the following instead?
I was thinking the same. I can only think of wanting to reuse the same c_code
across different Applys, but that's not as relevant for Composites
Clarifying what is the idea behind allowing Composites with multiple clients. The following example:
import aesara
import aesara.tensor as at
x = at.vector("x")
y = at.exp(x/5)
w = y + 1
z = y * 2
f1 = aesara.function([x], [z, w])
aesara.dprint(f1)
Right now produces this graph:
Elemwise{Mul}[(0, 1)] [id A] 2
|TensorConstant{(1,) of 2.0} [id B]
|Elemwise{Composite{exp((i0 * i1))}} [id C] 0
|TensorConstant{(1,) of 0.2} [id D]
|x [id E]
Elemwise{add,no_inplace} [id F] 1
|TensorConstant{(1,) of 1.0} [id G]
|Elemwise{Composite{exp((i0 * i1))}} [id C] 0
After #1242, it produces this graph:
Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.0 [id A] 0
|TensorConstant{(1,) of 0.2} [id B]
|x [id C]
|TensorConstant{(1,) of 2.0} [id D]
|TensorConstant{(1,) of 1.0} [id E]
Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.1 [id A] 0
The expected savings come from having to iterate only once over the data vector x
, vs having to iterate 3 times before, once over x
and twice over exp(i0 * i1)
.
Another example is the following graph:
x = at.dvector("x")
mu = at.dvector("mu")
logp = (- ((x - mu) **2) / 2)
grad = at.grad(logp.sum(), x)
f2 = aesara.function([mu, x], [logp, grad])
aesara.dprint(f2)
Before:
Elemwise{Composite{(i0 * sqr(i1))}}[(0, 1)] [id A] 2
|TensorConstant{(1,) of -0.5} [id B]
|Elemwise{sub,no_inplace} [id C] 0
|x [id D]
|mu [id E]
Elemwise{neg,no_inplace} [id F] 1
|Elemwise{sub,no_inplace} [id C] 0
After:
Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.1 [id A] 0
|x [id B]
|mu [id C]
|TensorConstant{(1,) of -0.5} [id D]
Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.0 [id A] 0
Again we replace 3 loops by 1.
Regarding CSE, I don't think anything is changed. The code generated by the scalar Composite seems to be clever enough and reuse repeated computations in intermediate variables. Here is the C code of the Composite scalar in the first function:
// function 1
{
npy_float64 V13_tmp1;
// i0 * i1
V13_tmp1 = V11_i * V9_i;
npy_float64 V13_tmp2;
// exp(i0 * i1)
V13_tmp2 = exp((npy_float64)V13_tmp1);
// First output
// i2 * exp(i0 * i1)
V1_i = V7_i * V13_tmp2;
// Second output
// i3 + exp(i0 * i1)
V3_i = V5_i + V13_tmp2;
}
// function 2
{
npy_float64 V11_tmp1;
// i0 - i1
V11_tmp1 = V9_i - V7_i;
// First output
// - (i0 - i1)
V1_i = -V11_tmp1;
npy_float64 V11_tmp2;
// sqr(i0 - i1)
V11_tmp2 = V11_tmp1 * V11_tmp1;
// Second output
// i2 * sqr(i0 - i1)
V3_i = V5_i * V11_tmp2;
}
You can see that it avoids recomputing the same sub-expressions in both cases. It does store more intermediate variables than needed, but hopefully the compiler will be able to get rid of those.
I think the idea of inplacing scalar constants is worth investigating, but deserves a separate issue. It was already relevant for the old Composite graphs. Edit: I see it was already separated into #1270
Do you have a comparison like https://github.com/aesara-devs/aesara/pull/1242#issuecomment-1282335537 for these new CSE-related changes to your approach?
Right now, Elemwise composite graphs will be split at nodes that have multiple clients. This can be avoided by allowing the fusion rewriter to replace multiple outputs. One example where this is very common is in value_and_grad functions.
Discussed with @aseyboldt and @aesara-devs/aesara
See rationale and proposed changes in https://github.com/aesara-devs/aesara/issues/1237#issuecomment-1283599010