For now, I simply ignore the possibility of inlining a binding at some use sites but not others. It's only safe to do that in Dex for zero-work bindings, so I would want to think about it more to adapt the Secrets algorithm to our situation.
Unlike Secrets, this inliner takes pains to inline non-zero-work bindings into loops, since that's how we get loop fusion. This makes the required occurrence analysis significantly more elaborate.
For now, this inliner does not actually measure code size. It effectively assumes that anything more than a variable reference is "large", and therefore hesitates to inline it into more than one place. This misses opportunities to inline into the arms of case statements, etc.
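A minimal sketch of the size side of that decision, written in Python over a hypothetical two-node IR (`Var`, `Add`) rather than Dex's actual types. The names `count_uses` and `should_inline` are invented for illustration. The sketch deliberately ignores work preservation, so it does not model the occurrence analysis through loops described above.

```python
from dataclasses import dataclass

# Hypothetical mini-IR for illustration only; these are not Dex's real types.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object

def count_uses(name, expr):
    """Count the static occurrences of `name` in `expr`."""
    if isinstance(expr, Var):
        return 1 if expr.name == name else 0
    if isinstance(expr, Add):
        return count_uses(name, expr.lhs) + count_uses(name, expr.rhs)
    raise TypeError(f"unexpected node: {expr!r}")

def should_inline(name, rhs, body):
    """Size-only heuristic: no code size is measured.

    A bare variable reference counts as "small" and may be inlined at
    every use site. Anything else counts as "large" and is inlined only
    if `name` occurs at most once in `body`.
    """
    if isinstance(rhs, Var):
        return True
    return count_uses(name, body) <= 1
```

Under this rule a binding such as `x = a + b` used in both arms of a case expression is never inlined, even though only one arm runs, which is the missed opportunity the paragraph above mentions.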
For now, this inliner only does one local optimization during inlining, namely beta reduction of table indexing expressions (which is necessary for the inliner to be work-preserving). Per Secrets, it's probably a good idea to do more, so that fewer passes of inlining + peephole optimization suffice for convergence.
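The beta reduction in question rewrites `(for i. body).j` to `body` with `j` substituted for `i`. The following Python sketch shows the idea on a hypothetical three-node IR (`Var`, `For`, `Index`); it is not Dex's implementation. It assumes binder names are globally unique, so substitution cannot capture variables, and that index expressions are atoms, so copying them duplicates no work.

```python
from dataclasses import dataclass

# Hypothetical mini-IR for illustration only; these are not Dex's real types.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class For:          # for <binder>. <body>
    binder: str
    body: object

@dataclass(frozen=True)
class Index:        # <table>.<index>
    table: object
    index: object

def substitute(expr, name, replacement):
    """Replace free occurrences of `name`; assumes globally unique binders."""
    if isinstance(expr, Var):
        return replacement if expr.name == name else expr
    if isinstance(expr, For):
        if expr.binder == name:  # `name` is shadowed inside this loop
            return expr
        return For(expr.binder, substitute(expr.body, name, replacement))
    if isinstance(expr, Index):
        return Index(substitute(expr.table, name, replacement),
                     substitute(expr.index, name, replacement))
    raise TypeError(f"unexpected node: {expr!r}")

def beta_reduce(expr):
    """Rewrite every (for i. body).j to body[j/i], working bottom-up."""
    if isinstance(expr, For):
        return For(expr.binder, beta_reduce(expr.body))
    if isinstance(expr, Index):
        table = beta_reduce(expr.table)
        index = beta_reduce(expr.index)
        if isinstance(table, For):
            # The indexed table is a literal loop: skip building the table
            # and evaluate its body at the requested index instead.
            return beta_reduce(substitute(table.body, table.binder, index))
        return Index(table, index)
    return expr  # a variable: nothing to reduce
```

For example, `beta_reduce` turns `(for i. ys.i).k` into `ys.k`. Without this step, inlining a table into an indexing position would build the whole table at each use just to read one element, which is why the reduction is needed for work preservation.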
For now, Dex only runs the inliner once, immediately after simplification and before any other passes. We should iterate on that, measuring compile-time and run-time performance.
This inliner also leaves at least one known Dex-specific opportunity on the table, one we might call "pointless-array elimination". To wit, consider an array whose body is an atom, xs = for i. <atom>. This array is not an atom itself, and will do work if materialized at runtime, allocating and filling the array. But if it's inlined into positions where it is indexed, it becomes an atom, which we treat as zero work. Ergo, such an inlining is profitable in more situations than the current setup can identify, such as for j k. xs.k. (If the body of xs did work, inlining xs would duplicate that work. If xs wasn't indexed at all, then inlining would duplicate the work of array creation regardless of the body. And right now the inliner can't detect the situation where a binding does work if not beta-reduced but does no work if beta-reduced, partly because that situation does not arise in GHC core.)
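As a rough illustration of what detecting this case might involve, here is a Python sketch over a hypothetical mini-IR (`Var`, `Add`, `For`, `Index`). It is not part of the inliner, and the function names are invented. It treats a variable as the only kind of atom, treats `Add` as a stand-in for any expression that does work, and assumes binder names are globally unique.

```python
from dataclasses import dataclass

# Hypothetical mini-IR for illustration only; these are not Dex's real types.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:          # stand-in for any expression that does work
    lhs: object
    rhs: object

@dataclass(frozen=True)
class For:          # for <binder>. <body>
    binder: str
    body: object

@dataclass(frozen=True)
class Index:        # <table>.<index>
    table: object
    index: object

def is_pointless_array(rhs):
    """True for `for i. <atom>`, where the only atoms here are variables."""
    return isinstance(rhs, For) and isinstance(rhs.body, Var)

def every_use_indexed(name, expr):
    """True if `name` only ever appears as the table of an indexing expression."""
    if isinstance(expr, Var):
        return expr.name != name  # a bare use would materialize the array
    if isinstance(expr, Index):
        direct = isinstance(expr.table, Var) and expr.table.name == name
        table_ok = direct or every_use_indexed(name, expr.table)
        return table_ok and every_use_indexed(name, expr.index)
    if isinstance(expr, For):
        return every_use_indexed(name, expr.body)
    if isinstance(expr, Add):
        return (every_use_indexed(name, expr.lhs)
                and every_use_indexed(name, expr.rhs))
    raise TypeError(f"unexpected node: {expr!r}")

def inlining_is_free(name, rhs, body):
    """After beta reduction, each inlined use is an atom, so no work is added."""
    return is_pointless_array(rhs) and every_use_indexed(name, body)
```

With `xs = for i. a`, the check accepts `for j. for k. xs.k` and rejects both a bare use of `xs` and a binding whose loop body does work, matching the three cases in the parenthetical above.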
The inliner follows the architecture of the GHC inliner, as described in "Secrets of the Glasgow Haskell Compiler Inliner", with the notable differences listed above.