Open erkyrath opened 3 years ago
Also note that the Z-code compiler uses a @jz
opcode unconditionally for these cases. Could be a @jump
instead.
Also from that forum thread:
Better optimized for-loops would be nice, using Z-code
@inc_chk
or@dec_chk
where possible to save one instruction.
General note:
My usual rule is to not change code generation without a good reason. There's some value in being able to rebuild an old game file from old source exactly. There's a lot of value in being able to rebuild an old game file at all. (We really don't want changes that make game files larger -- that might bump a game over a Z-code size limit.)
Optimizations like this count as a good reason. (They make game files shorter!) However, we might want an optimization setting, whose values are "better code" or "compile like it's 2010", defaulting to "better code". On the flip side, this switch might make the compiler source pretty murky! We'll have to see what the implementation looks like.
A branch gets emitted if there's nothing inside:
if (something) {}
something
could have side effects, so it still needs to be emitted, but the branch doesn't.
Another example I just noticed:
print (string) foo, "^";
This generates @print_paddr "^"
for the newline, even though @new_line
is available. In Glulx, it uses @streamstr
instead of @streamchar
.
while (true)
loops (possibly all loops) with an unconditional return at their end still generate a final jump instruction.
I7 makes a whole lot of dead code. Here's another example:
[ R_1078 ;
if (debug_rules) DB_Rule(R_1078, 1078);
! [2: fill status bar with table of fancy status]
(PHR_1076_r126 (T6_fancy_status));
! [3: rule succeeds]
RulebookSucceeds(); rtrue;
RulebookSucceeds(); rtrue;
rfalse;
];
What I7 code generated that?
From Counterfeit Monkey:
Rule for constructing the status line:
fill status bar with Table of Fancy Status;
rule succeeds
I suspect it happens any time the rulebook has a default outcome of success and the rule manually succeeds as well.
Probably, yeah. Thanks.
This code:
if(_w == 'n//') jump match_n;
if(_w == 's//') jump match_s;
if(_w == 'w//') jump match_w;
if(_w == 'e//') jump match_e;
gets compiled to this:
1576: c1 8f 03 11 e4 45 JE L02,"n" [FALSE] 157f
157c: 8c 01 97 JUMP 1714
157f: c1 8f 03 12 f5 45 JE L02,"s" [FALSE] 1588
1585: 8c 01 94 JUMP 171a
1588: c1 8f 03 14 8b 45 JE L02,"w" [FALSE] 1591
158e: 8c 01 97 JUMP 1726
1591: c1 8f 03 10 a2 45 JE L02,"e" [FALSE] 159a
1597: 8c 01 88 JUMP 1720
When an if-clause only holds a jump, it would be smart to perform that jump immediatiely in the compare+jump instruction.
I took a stab at the execution_never_reaches_here
optimization, but it's trickier than it looks.
Say you have lines like
if (true) return;
if (expr) {...} else {...};
You'd expect the entire second line to be optimized out. But the else
is a label, and labels are assumed to be reachable. So code generation starts up again.
We're going to need a way to "this statement is unreachable to begin with, so all labels within it are phantoms." But what if there's an explicit label inside the else
clause? Then we really do want code generation to restart, and maybe some of those phantom labels are needed after all.
For a concrete example: there are two places in I6 lib 6/11 where this kind of optimization kicks in.
[ FullScoreSub i;
ScoreSub();
if (score == 0 || TASKS_PROVIDED == 1) rfalse;
!...
];
if (k >= ValueOrRun(player, capacity)) {
if (SACK_OBJECT ~= 0) {
if (parent(SACK_OBJECT) ~= player)
return L__M(##Take, 12);
!...
TASKS_PROVIDED and SACK_OBJECT may be known at compile time, so we want to skip the entire following block. But with the change I've got, only a few opcodes get skipped before we hit a label.
Okay, I have working implementation. This turned out to be a difficult problem!
Discussion: https://intfiction.org/t/i6-compiler-code-generation-improvements/54739 Branch: https://github.com/erkyrath/Inform6/tree/strip-dead-branch
I'd like more testing from more people before I open a PR.
This change does the following:
if (0)
, if (1)
)if
and switch
statements where every branch returns@jz constant
opcodes where the constant is nonzero@jz 0
to an unconditional @jump
@jump
to the very next instructionIt does not try to:
if (test) jump...
if (test()) {}
@inc_chk
, @dec_chk
You can see a diff of the inform -a
output when compiling Advent.inf here: https://gist.github.com/erkyrath/a2ab185e4f90096374cf5396332cf0fe
The majority of the changes are dropping unused labels; this makes no difference to the generated code.
In a few places (see lines 128, 223, 572, 623, 634, 812, 833, 874) we drop an unused label after a jump/return and a following instruction, because that instruction can never be reached.
On lines 520 and 532, we drop a @jz 1
branch, since that is impossible.
The two big chunks of code which get eliminated are associated with TASKS_PROVIDED and SACK_OBJECT, as noted above.
PR: https://github.com/DavidKinder/Inform6/pull/164 . This is largely the same as what I had last month, but I've improved the "statement never reached" warning logic.
@erkyrath Can this issue be closed, or is there still more to be done?
This issue turned into a grab-bag. Ideas still open:
print "^";
generates @print_paddr "^"
for the newline, even though @new_line
is available. In Glulx, it uses @streamstr
instead of @streamchar
.
if (cond) {};
A branch gets emitted when there's nothing inside. The condition expression could have side effects, so it still needs to be emitted, but the branch doesn't.
'if (cond) jump label;` generates two branches when it could have only one.
Up to you whether those should be split out into separate issues. The first one is easy. The other two are hard and may never get done.
I don't think there's much to be gained in splitting the issues out, happy to leave this open while there are still further optimizations possible.
Adding this ...
Looking at output generated with -a
there's a lot of lines like this (not always get_prop
or get_prop_addr
, but for other opcodes too):
3800 +03265 <*> get_prop thing short_50 (each_turn) -> TEMP1
3800 +03269 push TEMP1
and
3479 +02ec9 <*> get_prop_addr o1 short_1 -> TEMP1
3479 +02ecd store p1 TEMP1
Couldn't this be simplified to (a tleast for zcode):
3800 +03265 <*> get_prop thing short_50 (each_turn) -> sp
and
3479 +02ec9 <*> get_prop_addr o1 short_1 -> p1
respectivly, and save 4 bytes per occurance?
I think this would boil down to adding a stack-pointer case to all the places in expressc.c that look like
# assemble_code_writing_to(..., temp_var1);
if (!void_flag) write_result_z(Result, temp_var1);
(Pretty sure none of these cases go on to access temp_var1 after that. It's always used as a strict temporary.)
Z-code could optimize x=x+1
to x++
(an @inc
opcode).
Expressions like (x+0)
, (x-0)
, (x*1)
, (x/1)
could be simplified at constant-folding time.
(They wouldn't be common in hand-written code, but I7-generated code does all sorts of awkward stuff. Or you might write (x*DICT_CHAR_SIZE)
, where DICT_CHAR_SIZE
is a constant that could be 1, depending on compiler settings.)
print (char) ch
generates roughly this in Glulx non-strict mode:
if (c>=0 && c<256)
@streamchar c;
else
@streamunichar c;
I was trying to protect very old interpreters that lacked the @streamunichar
opcode, but I think it's safe to say we're past that now.
We can just generate @streamunichar ch;
, unless ch
is a compile-time constant known to be in the 0-255 range, in which case @streamchar ch;
is better.
I do not plan to change the strict-mode RT__ChPrintC() routine, which does extra error checking.
Probably not trivial, but...
Code like below (in the CheapScenery
object in PunyInform)
default:
#ifdef SceneryReply;
if(SceneryReply ofclass string)
print_ret (string) SceneryReply;
i = location.&cheap_scenery;
w1 = self.number;
if(SceneryReply(i-->w1, i-->(w1 + 1)))
rtrue;
#endif;
"No need to concern yourself with that.";
The SceneryReply
can both be a string or a routine, which makes the compiler to compile this unreachable code print_ret (string) SceneryReply;
in the case it is a routine. When TXD tries to decompile this it throw a message to stderr
, Warning: printing of nonexistent string
. The decompiled snippet looks like below:
Routine 43b4, 2 locals
43b5: 41 f9 01 43 JE Ge9,#01 [FALSE] 43ba
43b9: b1 RFALSE
43ba: e0 07 42 6c 10 bf 04 00 CALL_VS 109b0 (#10bf,#04) -> -(SP)
43c2: a0 00 c7 JZ (SP)+ [TRUE] 43ca
43c5: 8d 10 bf PRINT_PADDR 42fc
43c8: bb NEW_LINE
43c9: b0 RTRUE
43ca: e0 23 41 f1 10 00 4a 01 CALL_VS 107c4 (G00,#004a) -> L00
43d2: 51 fb 08 ff GET_PROP Geb,#08 -> Gef
43d6: 2d 02 ff STORE L01,Gef
43d9: 54 02 01 00 ADD L01,#01 -> -(SP)
43dd: 6f 01 00 00 LOADW L00,(SP)+ -> -(SP)
43e1: 6f 01 02 00 LOADW L00,L01 -> -(SP)
43e5: e0 2b 10 bf 00 00 00 CALL_VS 42fc ((SP)+,(SP)+) -> -(SP)
43ec: a0 00 41 JZ (SP)+ [FALSE] RTRUE
43ef: b3 ... PRINT_RET "No need to concern yourself
with that."
The compiler knows at compile time if SceneryReply
is a routine or a string and could exclude the unreachable part.
That boils down to "constant-fold the ofclass
operator", which seems like a reasonable idea.
Revisting this issue I have created a branch (https://github.com/heasm66/Inform6/tree/code_generating_optimizations) that changes the code generated for get_prop
and get_prop_addr
so that it writes the result directly to the stack_pointer
instead of using TEMP1
for intermediate storage.
Change patterns like:
3800 +03265 <*> get_prop thing short_50 (each_turn) -> TEMP1
3800 +03269 push TEMP1
-->
3800 +03265 <*> get_prop thing short_50 (each_turn) -> sp
and
3479 +02ec9 <*> get_prop_addr o1 short_1 -> TEMP1
3479 +02ecd store p1 TEMP1
-->
3479 +02ec9 <*> get_prop_addr o1 short_1 -> p1
This was pretty common patterns and quite easy to identify. It saves ~250 bytes on Advent and ~400 bytes on Curses.
Example code in expressc.c
case PROP_ADD_OP:
{ assembly_operand AO = ET[below].value;
check_warn_symbol_type(&ET[below].value, OBJECT_T, CLASS_T, "\".&\" expression");
check_warn_symbol_type(&ET[ET[below].right].value, PROPERTY_T, INDIVIDUAL_PROPERTY_T, "\".&\" expression");
if (runtime_error_checking_switch && (!veneer_mode))
AO = check_nonzero_at_runtime(AO, -1, PROP_ADD_RTE);
//assemblez_2_to(get_prop_addr_zc, AO,
// ET[ET[below].right].value, temp_var1);
//if (!void_flag) write_result_z(Result, temp_var1);
if ((!void_flag) && (Result.value == 0)) { // can be optimized and written directly to stack_pointer
assemblez_2_to(get_prop_addr_zc, AO,
ET[ET[below].right].value, Result);
}
else {
assemblez_2_to(get_prop_addr_zc, AO,
ET[ET[below].right].value, temp_var1);
if (!void_flag) write_result_z(Result, temp_var1);
}
}
break;
.
.
.
case PROPERTY_OP:
{
check_warn_symbol_type(&ET[below].value, OBJECT_T, CLASS_T, "\".\" expression");
check_warn_symbol_type(&ET[ET[below].right].value, PROPERTY_T, INDIVIDUAL_PROPERTY_T, "\".\" expression");
//if (runtime_error_checking_switch && (!veneer_mode))
// assemblez_3_to(call_vs_zc, veneer_routine(RT__ChPR_VR),
// ET[below].value, ET[ET[below].right].value, temp_var1);
//else
//assemblez_2_to(get_prop_zc, ET[below].value,
// ET[ET[below].right].value, temp_var1);
//if (!void_flag) write_result_z(Result, temp_var1);
if (runtime_error_checking_switch && (!veneer_mode)) {
assemblez_3_to(call_vs_zc, veneer_routine(RT__ChPR_VR),
ET[below].value, ET[ET[below].right].value, temp_var1);
if (!void_flag) write_result_z(Result, temp_var1);
}
else {
if ((!void_flag) && (Result.value == 0)) { // can be optimized and written directly to stack_pointer
assemblez_2_to(get_prop_zc, ET[below].value,
ET[ET[below].right].value, Result);
}
else {
assemblez_2_to(get_prop_zc, ET[below].value,
ET[ET[below].right].value, temp_var1);
if (!void_flag) write_result_z(Result, temp_var1);
}
}
}
break;
Thanks for looking at this.
As usual, any change that affects code generation (in a non-opt-in way) is a very large testing burden. (We have to test game file behavior rather than just checking that the game file is identical.) So I'd rather accumulate a bunch of these optimizations and put them in a future I6 release so we can test them all at once.
Before we get there, I'd also like to add some game-file-behavior tests in the inform6-testing suite! It currently only has "game file is identical" tests. So that needs some attention. It may be a few weeks before I have a chance to look at it.
I understand what you are saying...
How about introducing an option ($OPTIMIZE_CODE or something) that would allow optimizations to be introduced on an "use at your own risk" basis and improvements can be introduced in smaller chunks and over a longer period?
Doing it this way would lessen the burden to testing that no old gamecode will get broken. Newly constructed games that is in development will probable be more extensively tested and developer would be aware that it is a more of an "experimental" option.
We still need some testing, of course. I ran through Curses with a walkthrough (without issues) but I don't know how to automate this process.
We could do that, but it doesn't decrease the testing burden because then we have to test with the option both on and off. :)
We could also maintain an "optimization" branch in the repo for people who want to live on the bleeding edge.
In the same branch as above I've added suggestion-code for these simplifications:
x=x+1 ==> x++
x=1+x ==> x++
x=x+0 ==> skip
x=0+x ==> skip
x=x-1 ==> x--
x=x-0 ==> skip
x=x*1 ==> skip
x=1*x ==> skip
x=x/1 ==> skip
I've also tested a playthrough of Curses without issues.
Current code optimization in branch:
get_prop, get_prop_addr directly to sp without using intermediate temp-var
Curses -408 bytes
Advent -244 bytes
Simplify x=x+1 to x++ and other
Curses -52 bytes
Advent -40 bytes
Maintaing an "optimization" branch that the "in" people can run through extensive testing could work.
Adding this, may not be trivial to change.
switch(expr) {...}
. If expr
:
TEMP1
TEMP1
before statements are executed.Example 1:
[ ClearScreen window;
switch (window) {
WIN_ALL: @erase_window -1;
WIN_STATUS: @erase_window 1;
WIN_MAIN: @erase_window 0;
}
];
Compiles to:
5495 +049bc [ ClearScreen window
5496 +049bd <*> store TEMP1 window
5497 +049c0 je TEMP1 short_0 (WIN_ALL) to L1 if FALSE
5497 +049c5 <*> erase_window long_65535
5498 +049c9 jump L0
5498 +049cc .L1
5498 +049cc je TEMP1 short_1 (WIN_STATUS) to L2 if FALSE
5498 +049d1 <*> erase_window short_1
5499 +049d4 jump L0
5499 +049d7 .L2
5499 +049d7 je TEMP1 short_2 (WIN_MAIN) to L3 if FALSE
5499 +049dc <*> erase_window short_0
5500 +049df .L3
5500 +049df .L0
5501 +049df <*> rtrue
Example 2:
[ PrintOrRunVar var flag;
switch (metaclass(var)) {
Object:
print (name) var;
String:
print (string) var;
if (flag == 0) new_line;
Routine:
return var();
default:
print (char) '(', var, (char) ')';
}
rtrue;
];
Compiles to:
5032 +045f4 [ PrintOrRunVar var flag
5033 +045f5 <*> call_vs long_25 (veneer routine: Meta__class) var -> sp
5033 +045fb pull TEMP1
5034 +045fe je TEMP1 short_2 (Object) to L1 if FALSE
5035 +04603 <*> call_2n long_6 (veneer routine: PrintShortName) var
5036 +04608 jump L0
5036 +0460b .L1
5036 +0460b je TEMP1 short_4 (String) to L2 if FALSE
5037 +04610 <*> print_paddr var
5038 +04612 <*> jz flag to L3 if FALSE
5038 +04616 <*> new_line
5038 +04617 .L3
5039 +04617 jump L0
5039 +0461a .L2
5039 +0461a je TEMP1 short_3 (Routine) to L4 if FALSE
5040 +0461f <*> call_1s var -> sp
5040 +04622 ret_popped
5041 +04623 .L4
5042 +04623 <*> print_char short_40
5042 +04626 print_num var
5042 +04629 print_char short_41
5043 +0462c .L0
5044 +0462c <*> rtrue
It seems that either the expr
could be stored directly in TEMP1
or the checks for jumps could be done against the original variable.
The generating code is in states.c
:
case SWITCH_CODE:
match_open_bracket();
AO = code_generate(parse_expression(QUANTITY_CONTEXT),
QUANTITY_CONTEXT, -1);
match_close_bracket();
INITAOTV(&AO2, VARIABLE_OT, 255);
assemblez_store(AO2, AO);
parse_code_block(ln = next_label++, continue_label, 1);
assemble_forward_label_no(ln);
return;
Either code_generate
could take a parameter where to store the result or parse_code_block
could take a parameter on what to switch.
The special switch-statements that switches on verbs don't use the TEMP1
variable:
[ LanguageLM n x1;
Answer,Ask:
"There is no reply.";
Attack: "Violence isn't the answer to this one.";
Blow: "You can't usefully blow ", (thatorthose) x1, ".";
Burn: "This dangerous act would achieve little.";
Buy: "Nothing is on sale.";
...
Compiles to:
414 +0044c [ LanguageLM n x1
415 +0044d je sw__var short_1 (action) short_2 (action) to L0 if FALSE
416 +00454 <*> print_ret "There is no reply."
418 +0045f .L0
418 +0045f je sw__var short_3 (action) to L1 if FALSE
418 +00464 <*> print_ret "Violence isn't the answer to this one"
419 +0047b .L1
419 +0047b je sw__var short_4 (action) to L2 if FALSE
419 +00480 <*> print "You can't usefully blow "
419 +0048f call_2n long_221 (routine: ThatorThose) x1
419 +00494 print_ret "."
420 +00497 .L2
420 +00497 je sw__var short_5 (action) to L3 if FALSE
420 +0049c <*> print_ret "This dangerous act would achieve little."
421 +004b5 .L3
421 +004b5 je sw__var short_6 (action) to L4 if FALSE
421 +004ba <*> print_ret "Nothing is on sale."
422 +004c5 .L4
...
I've made progress on your suggestions.
The inform6-testing repo now has support for running regtest scripts on the compiled game files. I'm using a complete Advent.inf run as the big test, but I also check Library of Horror, and all the unit tests that print "All passed", and a few other cases. See directory: https://github.com/erkyrath/Inform6-Testing/tree/master/reg
I have a branch with your optimizations (edited for code style): https://github.com/erkyrath/Inform6/tree/optimization
This passes all the regtest scripts. I don't want to push it in just yet though. There are more cases which could benefit from similar code. For example, @get_prop
storing directly to a variable.
I might also want to port some of these fixes to the Glulx side.
Feel free to give the optimization
branch a ride around the paddock in your Z-code work and see how it works out.
Sorry for the delay... (Feels like I mostly struggled with cherry-picking your commits into my own fork, guess I'm not fluent yet in git/github parlor.)
I have tested the optimization
branch with a couple of files and found no issues.
These patterns are fairly common (I guess these are the ones you mean):
1 +2169c get_prop o1 short_3 -> TEMP1
1 +216a0 store m TEMP1
...
3516 +033f1 <*> get_prop_addr o1 short_1 -> TEMP1
3516 +033f5 store p1 TEMP1
There're also a couple of places where a result is stored on the stack then immediately pulled and tested on. These are maybe harder to identify when they probably a spread out over multiple calls to generate_code_from
.
3780 +036f4 <*> get_parent actor -> sp
3780 +036f7 pull TEMP1
...
6140 +04aa9 <*> call_vs long_25 (veneer routine: Meta__class) a -> sp
6140 +04aaf pull TEMP1
These patterns are fairly common (I guess these are the ones you mean)
Yeah, those. This change handles those cases neatly.
I couldn't find any other cases subject to this sort of pinhole optimization.
Whoops, found a bunch more! y = x+0;
and y = x*1;
and similar expressions can be compiled down into @store
opcodes.
And yes, such lines turn up in I7-generated code all the time. Even the I6 parser has the line
artval=(o.&articles)-->(acode+short_name_case*LanguageCases);
...and LanguageCases
is 1 in English.
Took the latest changes out for a testrun. It saved an additonal 132 bytes
on Curses (down to 241.720 bytes
for those that's counting...) and run through the walkthroughs without issues.
Nice!
A routine like this:
[ LanguageDirection d;
switch (d) {
n_to: print "north";
s_to: print "south";
e_to: print "east";
w_to: print "west";
ne_to: print "northeast";
nw_to: print "northwest";
se_to: print "southeast";
sw_to: print "southwest";
u_to: print "up";
d_to: print "down";
in_to: print "in";
out_to: print "out";
default: return RunTimeError(9,d);
}
];
Compiles to:
201 +00068 [ LanguageDirection d
202 +00069 <*> store TEMP1 d
203 +0006c je TEMP1 short_13 (n_to) to L1 if FALSE
203 +00071 <*> print ""
204 +00074 jump L0
204 +00077 .L1
204 +00077 je TEMP1 short_14 (s_to) to L2 if FALSE
204 +0007c <*> print ""
205 +0007f jump L0
205 +00082 .L2
205 +00082 je TEMP1 short_15 (e_to) to L3 if FALSE
205 +00087 <*> print "e"
206 +0008a jump L0
206 +0008d .L3
206 +0008d je TEMP1 short_16 (w_to) to L4 if FALSE
206 +00092 <*> print "w"
207 +00095 jump L0
207 +00098 .L4
207 +00098 je TEMP1 short_17 (ne_to) to L5 if FALSE
207 +0009d <*> print "e"
208 +000a2 jump L0
208 +000a5 .L5
208 +000a5 je TEMP1 short_18 (nw_to) to L6 if FALSE
208 +000aa <*> print "w"
209 +000af jump L0
209 +000b2 .L6
209 +000b2 je TEMP1 short_19 (se_to) to L7 if FALSE
209 +000b7 <*> print "e"
210 +000bc jump L0
210 +000bf .L7
210 +000bf je TEMP1 short_20 (sw_to) to L8 if FALSE
210 +000c4 <*> print "w"
211 +000c9 jump L0
211 +000cc .L8
211 +000cc je TEMP1 short_21 (u_to) to L9 if FALSE
211 +000d1 <*> print "up"
212 +000d4 jump L0
212 +000d7 .L9
212 +000d7 je TEMP1 short_22 (d_to) to L10 if FALSE
212 +000dc <*> print "down"
213 +000e1 jump L0
213 +000e4 .L10
213 +000e4 je TEMP1 short_23 (in_to) to L11 if FALSE
213 +000e9 <*> print "in"
214 +000ec jump L0
214 +000ef .L11
214 +000ef je TEMP1 short_24 (out_to) to L12 if FALSE
214 +000f4 <*> print "out"
215 +000f7 jump L0
215 +000fa .L12
215 +000fa <*> call_vs long_714 (ref to symbol value: RunTimeError) short_9 d -> sp
215 +00101 ret_popped
216 +00102 .L0
217 +00102 <*> rtrue
The instruction label L0
is only a @rtrue
so it would be more effective to return immediately without the @jump
first. The logic is the same if the instruction at the label is @rfalse
or @ret_popped
too.
I have a first code suggestion that does this in the branch https://github.com/heasm66/Inform6/tree/optimization_replacing_jump
(currently one commit ahead of https://github.com/erkyrath/Inform6/tree/optimization
) that maybe need a bit of refactoring. It saved about 300 bytes on my test project.
Nifty. I'll take a look.
Imported and tidied up: https://github.com/erkyrath/Inform6/compare/8916ebb..ae74890
I note that this doesn't eliminate the destination @rtrue
, even if every jump to it gets replaced. Figuring out if we should do that is a bunch of extra work, which I think is not worthwhile.
Well if a more thorough DCE ever gets implemented (as requested above https://github.com/DavidKinder/Inform6/issues/87#issuecomment-862348221) it would presumably handle that too.
Yeah, I thought about the possibility that the destination label most likely gets orphaned but couldn't find any easy solution. As I understand it the label currently have a flag if it's used or not (true/false). Maybe this could be a counter instead that keeps track on how many times it's used. If you remove a reference to a label you decrease the counter and when it reaches 0 it's unused.
I'm gonna make a stab at a jump immediately following a branch (above) because the fix probably also will be in the transfer_code
function. I thought this was a pretty uncommon code pattern but my test project contained 66 occurrences, mostly in veneer and library code.
A suggestion for optimization when a jump immediately follows a branch is in this commit: https://github.com/heasm66/Inform6/commit/893c5314f298a2e3f1d57fa1ace7e9a3925a9204
Some current benchmarks (including this change):
Advent (Standard Library):
Inform 6.42 GV2: 118.180 bytes
Inform 6.43 GV2: 117.400 bytes (-780 bytes in z-code optimization)
Curses (Standard library):
Inform 6.42 GV2: 242.964 bytes
Inform 6.43 GV2: 241.924 bytes (-1.040 bytes in z-code optimization)
Dorm Game (PunyInform Library):
Inform 6.42 GV2: 117.078 bytes
Inform 6.43 GV2: 116.712 bytes (-366 bytes in z-code optimization)
Hibernated 1 PunyInform Library):
Inform 6.42 GV2: 87.940 bytes
Inform 6.43 GV2: 87.684 bytes (-256 bytes in z-code optimization)
Well if a more thorough DCE ever gets implemented
There is solid dead-code elimination now. But it happens during parse_routine(), whereas this change is in assemble_routine_end(), which is a later phase!
Maybe this whole question needs to be addressed earlier, at parse_routine() time. But that's tricky because when you're originally generating the (forward) jump, you don't know the destination opcode yet.
I've sifted through the thread and see these remaing issues:
@inc_chk
and @dec_chk
in for-loops (mentioned here and here).@print_paddr "^"
with @newline
(mentioned here and here).if (test()) {}
(mentioned here and here).ofclass
operator (mentioned here and here).TEMP1
with switch(expr) {...}
(mentioned here).Any other?
We already catch print "^";
although the mechanism could be more general. (See TODO comment on compile_string().)
(As suggested in this thread: https://intfiction.org/t/next-steps-for-inform-6-compiler/51008)
For lines like
...the compiler generates an unconditional jump, but then generates the return opcode anyway.
Ideally, the author would use
#ifdef
instead ofif
for this sort of code. However, I7 generates trivial conditions sometimes. E.g....generates the stanza:
The compiler successfully collapses
(~~(((true) && (true))))
to(false)
but still generates thev = GPR_FAIL;
assignment.This can almost be addressed with the
execution_never_reaches_here
flag, except that that flag is currently used to generate the "statement can never be reached" warning, which is not exactly the same thing. (Seeassembleg_1_branch()
, which turns off that flag to avoid generating a warning on "if (false)..." because the author probably did it on purpose.) (I7 didn't do it on purpose, but I7 suppresses all warnings anyhow.)I suspect the fix here is for
execution_never_reaches_here
to have three states: false, true, and "true but the author did it on purpose". If non-false, we can skip generating code entirely in assembleg_instruction() / assemblez_instruction().(Is it safe to just bail out of those functions? What about inline strings in Z-code? Sequence points? Symbols used only in dead code? So many corner cases to check!)
Note that the compiler assumes that every jump label is live code, even if no jump statement goes there. (See
assemble_label_no()
.) We should probably keep this. It allows the author to avoid dead-branch optimization on a case-by-case basis:(This might be needed for occult code generation techniques like inter-function jumps. Which we don't support, but maybe someone is trying to do it anyway.)