Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0
70 stars 16 forks source link

Add "short-index" version of often occurring ops #425

Open lizmat opened 6 months ago

lizmat commented 6 months ago

As a follow up on https://github.com/Raku/problem-solving/issues/424:

In the core c setting, the 10 most often occurring opcodes with a string index are:

$ bceval c '.map({.name if .operands.first("str")}).Bag.sort(-*.value).head(10)' 
dispatch_o => 108459
const_s => 61445
getattr_o => 16352
dispatch_v => 15188
bindattr_o => 11969
dispatch_i => 7467
getlex_no => 2769
getattr_i => 1657
bindattr_i => 863
param_on_o => 788

It looks to me that we could get a 2 * (108459 + 61445)) / 11983692 = 2.8% reduction in frame's opcode size (and thus improve its chances of getting inlined) by creating 2 alternate versions of opcodes: dispatch_o and const_s: dispatch_o_16 and const_s_16. The only difference with their counterparts would be reserving 2 bytes for the string index, rather than 4.

In interp.c, the additions would be (I think, please correct me if I'm wrong, as my C is rusty):

            OP(const_s_16):
                GET_REG(cur_op, 0).s = MVM_cu_string(tc, cu, GET_UI16(cur_op, 2).u32);
                cur_op += 4;
                goto NEXT;
            OP(dispatch_o_16): {
                MVMDispInlineCacheEntry **ice_ptr = MVM_disp_inline_cache_get(
                        cur_op, bytecode_start, tc->cur_frame);
                MVMDispInlineCacheEntry *ice = *ice_ptr;
                MVMString *id = MVM_cu_string(tc, cu, GET_UI16(cur_op, 2).u32);
                MVMCallsite *callsite = cu->body.callsites[GET_UI16(cur_op, 4)];
                MVMuint16 *args = (MVMuint16 *)(cur_op + 6);
                MVMuint32 bytecode_offset = (cur_op - bytecode_start) - 2;
                tc->cur_frame->return_value = &GET_REG(cur_op, 0);
                tc->cur_frame->return_type  = MVM_RETURN_OBJ;
                cur_op += 6 + 2 * callsite->flag_count;
                tc->cur_frame->return_address = cur_op;
                ice->run_dispatch(tc, ice_ptr, ice, id, callsite, args, tc->cur_frame->work,
                        tc->cur_frame->static_info, bytecode_offset);
                goto NEXT;
            }           

I guess the oplist parser would need to support a new kind of string index reference, something like "str16".

And of course the necessary changes in the MAST writing.

lizmat commented 6 months ago

If this works out, then maybe all 29 ops that have a string index, should get this treatment:

$ bceval c '.map({.name if .operands.first("str")}).unique.elems'                   
29

Which in the core setting would mean a saving of (2 * 228568) / 11983692 = 3.8% saving.

% bceval c '.grep(*.operands.first("str")).elems' 
228568