Open axch opened 1 year ago
One idea for making the bijection restriction clearer to the user would be to split the `Ix a` interface into two parts: a `size` method, and a `Bijection` interface that has an instance for `Bijection a (Fin (size a))`.

Then maybe the compiler could rely on user-defined bijections to do this kind of optimization more generally. However, I guess this would require bringing back the interpreter, and also allowing dependent-enough types to allow `size a` to appear in all the types.

On second thought, my proposal mostly just reinvents the existing `Ix` class with more complicated types.
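In Python rather than Dex (and with invented names, since the actual Dex interface would differ), the proposed split might look something like this: a `size` method on its own, plus an explicit bijection with `range(size)`, i.e. the moral equivalent of `Bijection a (Fin (size a))`.

```python
# Hypothetical sketch of the split, NOT the real Dex Ix class.
# Example index set: (Fin 2, Fin 3), which has size 6.
from itertools import product

class PairIx:
    def size(self):
        return 6

    # The Bijection half: to_fin / from_fin must be mutually inverse.
    def to_fin(self, idx):    # plays the role of `ordinal`
        return idx[0] * 3 + idx[1]

    def from_fin(self, i):    # plays the role of `unsafe_from_ordinal`
        return (i // 3, i % 3)

ix = PairIx()
# The law the compiler would rely on: to_fin . from_fin == id on Fin (size a)...
assert all(ix.to_fin(ix.from_fin(i)) == i for i in range(ix.size()))
# ...and from_fin . to_fin == id on the index set itself.
assert all(ix.from_fin(ix.to_fin(p)) == p for p in product(range(2), range(3)))
```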
### Why

Notionally, an `Ix a` instance is supposed to define a bijection between `a` and the first `size a` natural numbers. In particular, on those natural numbers, `ordinal` and `unsafe_from_ordinal` are supposed to be inverses. Do we want to start using that in the compiler by replacing that composition with `id` when it is known to occur?

To wit, by the time we get to the hardware, `for` loops are counting integers, and arrays are indexed by integers. It seems nice to avoid round-tripping through the `a` value if the main thing we need is its ordinal.

This should speed up both the generated code and the compiler: the former because it avoids running useless compositions of `ordinal . unsafe_from_ordinal`, and the latter because it avoids inlining and repeatedly compiling them. It should also simplify the compiler by better controlling how far user-defined `Ix` instances can travel.

### What
In detail, Dex `for` loops are implemented thus: when the compiler sees `for idx:a. body` (where `body` has type `b`), it eventually generates imperative code that (with some abuse of notation) loops an integer counter `i` from `0` to `size a`, constructs `idx = unsafe_from_ordinal i`, and writes the result of `body` into slot `i` of the output buffer. Also, when the compiler sees array indexing (i.e., `xs[idx]`), it generates a load at offset `ordinal idx`.

Seems like maybe we should just remember that `i` is in scope, and instead of constructing `ordinal idx`, just insert a reference to `i`.

Now, we do still need to generate `idx`, because the `body` may use it for something other than indexing as well (e.g., unpacking a tuple-typed index), but in the case of a pure map, dead code elimination should remove it.

(For what it's worth, something similar happens with table literals, except that we know the body doesn't read the index at all.)
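To make the two lowerings concrete, here is a Python-flavored sketch (all names illustrative; this is not actual compiler output), using a product index set as the running example:

```python
# Sketch of lowering `for idx:a. xs[idx]` for a = (Fin 2, Fin 3).
SIZE = 6  # size a

def unsafe_from_ordinal(i):   # Nat -> a
    return (i // 3, i % 3)

def ordinal(idx):             # a -> Nat
    return idx[0] * 3 + idx[1]

def lower_for_today(xs):
    """Current lowering: indexing re-derives the offset via `ordinal idx`,
    i.e. runs the useless composition ordinal . unsafe_from_ordinal."""
    out = [None] * SIZE
    for i in range(SIZE):
        idx = unsafe_from_ordinal(i)
        out[i] = xs[ordinal(idx)]   # round-trips through the `a` value
    return out

def lower_for_proposed(xs):
    """Proposed lowering: `i` is already in scope, so indexing reuses it."""
    out = [None] * SIZE
    for i in range(SIZE):
        idx = unsafe_from_ordinal(i)  # still emitted; DCE removes it if dead
        out[i] = xs[i]                # reference `i`, skip ordinal(idx)
    return out

xs = list(range(100, 106))
assert lower_for_today(xs) == lower_for_proposed(xs) == xs
```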
### Why not

The main problem I can imagine with doing this is that it assumes that `ordinal` really is a left inverse of `unsafe_from_ordinal` in all `Ix` instances, which is not something we have machinery to check. The optimization is observable if `ordinal` has side effects (e.g., via `unsafe_io`).
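A Python sketch of the observability problem, with a hypothetical instance whose `ordinal` records its calls:

```python
# Hypothetical Ix-like instance whose ordinal has a side effect,
# as one could arrange in Dex via unsafe_io.
calls = []

def unsafe_from_ordinal(i):
    return ("wrapped", i)

def ordinal(idx):
    calls.append(idx)      # observable side effect
    return idx[1]

# Without the optimization, the composition runs and the effect fires:
assert ordinal(unsafe_from_ordinal(4)) == 4
assert calls == [("wrapped", 4)]

# With the composition rewritten to id, the value is the same but the
# effect never happens -- an observable difference:
calls.clear()
i = 4                      # the compiler just reuses the loop counter
assert i == 4 and calls == []
```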
### How
Probably the simplest way to do it would be to write a new pass that changes the types of all arrays to `Fin <something>`. Roughly:

- For each `for` binder `<orig>`, (i) replace it with a `Fin`-typed binder `<new>`, (ii) create an originally-typed binder `<recon>` by emitting a call to the `unsafe_from_ordinal` method of the original type on `<new>`, and (iii) add a mapping for `<orig>` saying "if you really want the original type, use `<recon>`, but if the underlying Nat is fine, use `<new>`".
- The `for` would also add a mapping for the array it produces saying "coerce the array `<new binder>` to `<original array type>`" (because arrays are represented as flat buffers regardless of the type of their index).
- Array indexing (`TabApp`) would insert a call to `ordinal` at the array's original type in general, but would also check whether the index has a "the underlying Nat is `<new>`" binding, and in that case just use it and skip `ordinal`. It would be nicer to recognize explicit calls to `ordinal`, but I think we currently can't, because `Simplify.applyDictMethod` already inlines them.
- Any other use of the index would fall back to the `<recon>` binder. If `<recon>` ends up being unused (as in a pure map that doesn't read the index except to index arrays with it), DCE should just clean it up.
- A consumer that can't tell whether `<new>` is valid could also fall back to just using the `<recon>` binder, or we could try to play some game where the form returns `<new>` and creates another binding that reconstructs `<recon>` again, in the interest of more `ordinal` elimination.
- We have a choice of emitting the `<recon>` binder "eagerly", as described, or "lazily", by inserting `unsafe_from_ordinal <new>` in the places where `<recon>` would otherwise be needed. That tradeoff is entirely about how many times `unsafe_from_ordinal` gets inlined during this pass.

We would presumably only want to run this pass after we were done using semantic index information for occurrence analysis and any future loop splitting or such; and it has to run before (or within) Imp, because the Imp representation no longer has interesting index sets.
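To illustrate the shape of such a pass, here is a toy Python sketch over an invented miniature IR (nothing here is real Dex compiler API; `for_nat`, `tabapp`, and the environment mapping are all made up for illustration). It replaces each `for` binder with a Nat-typed `<new>`, emits the `<recon>` binding eagerly, and rewrites indexing to reuse the Nat when the "underlying Nat" mapping applies:

```python
# Toy IR nodes: ("for", binder, body), ("tabapp", array, index_var),
# ("use", var).  env maps an original binder to its (new, recon) pair.

def rewrite(expr, env):
    kind = expr[0]
    if kind == "for":
        _, orig, body = expr
        new, recon = orig + "_nat", orig + "_recon"
        env = {**env, orig: (new, recon)}
        # Emit: for new : Fin n. let recon = unsafe_from_ordinal new in body
        return ("for_nat", new,
                ("let", recon, ("unsafe_from_ordinal", new),
                 rewrite(body, env)))
    if kind == "tabapp":
        _, arr, idx = expr
        if idx in env:                  # "the underlying Nat is <new>"
            return ("tabapp_nat", arr, env[idx][0])   # skip ordinal
        return ("tabapp_nat", arr, ("ordinal", idx))  # general case
    if kind == "use":
        _, v = expr
        return ("use", env[v][1]) if v in env else expr  # fall back to recon
    return expr

# A pure map that only uses its binder for indexing:
prog = ("for", "i", ("tabapp", "xs", "i"))
out = rewrite(prog, {})
# The round-trip is gone: the tabapp reuses the Nat binder directly,
# and i_recon is now dead, so DCE would remove it.
assert out == ("for_nat", "i_nat",
               ("let", "i_recon", ("unsafe_from_ordinal", "i_nat"),
                ("tabapp_nat", "xs", "i_nat")))
```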
Thoughts?