Closed EinarSnorrason closed 5 months ago
Nice work!
As for removing unnecessary stloc/ldloc pairs, while it is a bigger task, it can be split into smaller, easier to implement parts.
I would first start with removing "dead" stlocs. Basically, you would first need to check if a local variable is never read or its address is never taken, and then replace all stloc
s to this variable with pops - don't remove that tree entirely, since the non-root parts still can have side effects.
You should do that in Method::opt
, and use the cil iterators to make things easier.
After that, you can call Method::realloc_locals
to remove all unused local variables.
This optimization will not do much on its own, since such dead stlocs are very rare.
The next step will have a much greater impact.
ldloc
s with the original tree, where possibleThis optimization should be done on a per-block basis (at least for now), to reduce complexity.
Iterate through all CILTree
s of a basic block. If the current tree has a root of CILRoot::STLoc
, check if the nodes calculating the value of that local have no side effects (by calling CILNode::has_side_effects
). If it has no side effects, check if it does not dereference any pointers. If all of those conditions are true, then it is a candidate for a move.
When we "move" a node, we simply replace any future occurrences of ldloc.n
, so that this:
ldloc.0
ldloc.1
add
stloc.2
ldloc.2
ldloc.2
add
stloc.3
ldloc.2
ldloc.4
mul
stloc.3
becomes:
ldloc.0
ldloc.1
add
stloc.2
ldloc.0
ldloc.1
add
ldloc.0
ldloc.1
add
add
stloc.3
ldloc.0
ldloc.1
add
ldloc.4
mul
stloc.3
We can propagate this replacement until we find a CILTree that prevents it.
A tree prevents a move if it may change anything the nodes we are moving depend on.
So when we are moving those nodes:
ldloc.0
ldloc.1
add
We can move across this tree:
ldc.i4.0
stloc.2
But not this tree
ldc.i4.0
stloc.1 // Writes to 1, and the nodes we are moving depend on `1`
or this tree:
ldloc.3
ldloc.4
stind.i8 // Writes to memory, so could write to local `0` or `1`, so a move is impossible
or this tree:
ldloc.5
call int32 SomeMethod(int32) // We don't know what it does, so it could change local `0` or `1`, so we can't move
pop
For now, we can assume such a replacement can only move across:
n
, that is not referenced by the nodes we are moving,
an whose value calculation has no side effects(CILNode::has_side_effects
returns false).CILNode::has_side_effects
returns false).This list can be expanded in the future, but this alone should allow for a lot of optimizations.
As a final clean-up step, iterate through all CILTrees, and if it is a CILRoot::Pop, whose value calculation has no side effetcs, remove that tree.
When those optimizations will be applied to this CIL:
ldloc.0
ldloc.1
clt
stloc.2
ldloc.2
brtrue bb_4_0
First, the stloc.2
will be "moved":
ldloc.0
ldloc.1
clt
stloc.2
ldloc.0
ldloc.1
clt
brtrue bb_4_0
Then, the stloc.2
will become dead, and get replaced with a pop
:
ldloc.0
ldloc.1
clt
pop
ldloc.0
ldloc.1
clt
brtrue bb_4_0
Then, the pop will be removed(since it has no side effects).
ldloc.0
ldloc.1
clt
brtrue bb_4_0
Then, clt
and brtrue
will get simplifed:
ldloc.0
ldloc.1
blt bb_4_0
leaving us with an optimal sequence of CIL instructions.
Thanks for the reply @FractalFir! It's a really cool project. I'll take a look at the stloc optimization when I have time. I have a couple of questions for it if you have the time 😄
Sorry for taking a bit to respond, I was busy.
The phrase "dead stloc" was a bit misleading, sorry. For now, you can assume that a stloc is dead when and only when this local variable is never read/its address is not taken. So, you could do something like this:
// Find all locals, which are not read, and their addres is not taken.
let is_loc_dead = vec![true;method.locals.len()];
for node in method.iter_cil().nodes(){
match node{
// If we read or take the address of a local, it is not dead.
CILNode::LdLoc(loc) | CILNode::LdLocA(loc)=>is_loc_dead[loc as usize] = false;
_=>(),
}
}
// For each local *n* which is "dead", replace `STLoc(n,val)` with `Pop(val)`
for (n,is_dead) in is_loc_dead.iter().zip(){
if is_dead{
// Replace all writes to this local with Pops, since the value calcualtion might have side-effects
todo!()
}
}
// Call `realloc_locals` to remove those local variables
method.realloc_locals();
load_count * number_of_nodes < 10
. That should keep the node count reasonable. Since all calls are assumed to have side effects, there would danger of ever duplicating more than 10 nodes, which I feel is reasonable. Besides that, due to the way MIR works, most node trees are relatively flat.
Also, removing a stloc
/ldloc
pair can remove the "hidden" casts in the case of U8, U16, I8, I16. So, when removing a stloc
ldloc
pair that uses a local of that type, an explicit cast needs to be added.
Implemented several branching instructions, added some optimizations to the BTrue and BFalse roots, and added some unit tests for these optimizations.
Unfortunately, when testing this I discovered that while the optimizations do work, they don't actually appear in the generated IL. This is because the IL is not doing this:
Instead it's doing this:
So none of the optimizations I've written actually do anything at the moment 😅
Changing the optimization function to remove the unnecessary pushes and pops from the stack is a bit more invasive, so I might work on that in another PR.
Let me know if there's anything I should change or work on!
Resolves #44
Resolves #43