Closed: lizmat closed this issue 6 months ago
General information about C setting:
$ bcinfo c
File: /Users/liz/Github/rakudo/blib/CORE.c.setting.moarvm (28946040 bytes)
Opcodes: 7934130 bytes
String Heap: 38100 different strings, 543292 bytes
Frames: 24949 frames, 11983692 bytes
276382 local variables
552403 lexical variables
6518 handlers
537714 static lexical values
31048 local debug names
Number of opcodes in the C setting that have a string index in them:
$ bceval c '.grep(*.operands.first("str")).elems'
228568
So this would mean (2 * 228568) / 7934130
= 5.8% saving in opcode size, obviously some frames more than others.
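A quick arithmetic check of that figure (plain Python, numbers taken from the bceval output above):

```python
# Each string-index operand shrinks from 4 to 2 bytes.
ops_with_str_operand = 228568    # opcodes with a "str" operand (bceval above)
opcode_bytes = 7934130           # total opcode bytes in the C setting

saving = 2 * ops_with_str_operand
print(f"{saving} bytes, {saving / opcode_bytes:.1%}")   # 457136 bytes, 5.8%
```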
Each frame has 2 string indices: one for the cuid, and one for any name associated with the frame. At 2 bytes saved per index, that would mean 2 * 2 * 24949 = 99796 bytes.
Each lexical variable has a string index, so 2 * 552403 = 1104806 bytes.
Each local debug name has a string index, so 2 * 31048 = 62096 bytes.
Total savings in frames: 99796 + 1104806 + 62096 = 1266698 bytes, so 1266698 / 11983692
= 10.6% saving.
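The same arithmetic for the frame data, sketched in Python (note this counts both string indices of every frame, cuid and name, at 2 bytes saved per index):

```python
frames, lexicals, debug_names = 24949, 552403, 31048
frame_bytes = 11983692            # total frame bytes in the C setting

# Number of u32 string indices that would become u16.
indices = 2 * frames + lexicals + debug_names   # cuid + name per frame
saving = 2 * indices                            # 2 bytes saved per index
print(f"{saving} bytes, {saving / frame_bytes:.1%}")   # 1266698 bytes, 10.6%
```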
10 most common opcodes:
% bceval c '.map(*.name).Bag.sort(-*.value).head(10)'
set => 121552
dispatch_o => 108459
decont => 97639
wval => 79694
const_s => 61445
getcode => 60327
getlex => 32942
assertparamcheck => 25811
wval_wide => 25769
checkarity => 24949
10 most occurring consecutive opcode pairs:
$ bceval c '.rotor(2 => -1).map(*>>.name.join(",")).Bag.sort(-*.value).head(10)'
dispatch_o,set => 35882
set,decont => 35252
dispatch_o,dispatch_o => 34717
decont,const_s => 34636
const_s,dispatch_o => 22742
decont,set => 22351
getcode,push_o => 20096
push_o,getcode => 20095
param_rp_o,dispatch_o => 19685
set,wval => 19567
Yeah, I know this is showing off, but may be an inspiration for future opts.
Your test code is rather taxing on the compiler as that chained concatenation will lead to an extremely deep AST. Even the length of the line may be a problem by itself.
This generates a module that has > 106496 compile-time known strings:
rakudo -e 'for "a".."z" -> $prefix {say "say <" ~ (0 .. 4095).map({"$prefix$_"}).list.Str ~ ">;\n"}' > lib/Large.rakumod
Precompiling this large module takes about a second or two, consumes some 2 GB of memory, and the result works just fine.
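As a quick check of that string count (plain Python): 26 prefixes times 4096 numbered suffixes:

```python
prefixes = 26        # "a" .. "z"
suffixes = 4096      # 0 .. 4095 per prefix
print(prefixes * suffixes)   # 106496
```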
This also shows how you would end up with such a large number of strings: it's most likely in generated code. 64K isn't all that many. Think of some Unicode related module that just contains all characters as individual strings somewhere. RakuAST makes no difference here btw: these strings would still be literals that get stored in the string heap. Early on there were attempts to use precomp files as the storage format for installed modules' metadata. This would still be a reasonable idea even today. So it may well be that someone will try that with different data.
Considering how often we have run into such arbitrary limits with the JVM I'd say let's not inflict this kind of pain onto others.
@niner That's a good point. My proposal would actually break my uniname-words module:
uniname-words $ bcinfo .precomp/DF77E765162CF1798065DEE15C0FD8CA729B512F/14/143DC7A07AAE61D6F77EB4832D9CDA89D8CC5CA4
File: /Users/liz/Github/RAKU/uniname-words/.precomp/DF77E765162CF1798065DEE15C0FD8CA729B512F/14/143DC7A07AAE61D6F77EB4832D9CDA89D8CC5CA4 (15577504 bytes)
Opcodes: 16010 bytes
String Heap: 268132 different strings, 4666300 bytes
Frames: 59 frames, 49332 bytes
554 local variables
2500 lexical variables
22 handlers
2453 static lexical values
27 local debug names
268132 different strings.
So I guess the question in the title can be answered with a resounding YES
Currently, MoarVM uses a 32-bit unsigned int as an index into the string heap.
The largest bytecode file that we know of is the bytecode file for the core C setting, which clocks in at 38K+ strings, of which 21K+ are used for cuids (but that's another matter). It appears unlikely to me that we will ever need more than 64K unique interned strings (determined at compile time!) in a compunit.
So what would happen if we reduced the string heap indices to a 16-bit unsigned int?
Con
Max 64K of unique strings in a compilation unit
For reference: a simple program with
say "a" ~ "b" ~ "c" ~ "d" ... ~ 'crxq'
produces this output on my M1mini. Trying to precompile that in a module doesn't work: I killed the process after it had been running for 15 minutes and eating 8 GB of RAM. Which I think shows another deficiency, but that'll be for another day.
So I'd say it will be unlikely we will see that in actual use today. In any case, this does not prevent you from having more than 64K unique strings in a program: the limitation is per compunit!
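Concretely, here is a sketch of what the narrowing means on the wire (Python, with hypothetical function names; this is not MoarVM's actual serialization code): each string heap reference shrinks from a 4-byte to a 2-byte little-endian unsigned integer, and writers must enforce the 65535 cap:

```python
import struct

def pack_str_index_u32(i: int) -> bytes:
    """Current layout: 4-byte little-endian string heap index."""
    return struct.pack("<I", i)

def pack_str_index_u16(i: int) -> bytes:
    """Proposed layout: 2-byte index, capped at 65535 strings."""
    if i > 0xFFFF:
        raise ValueError("string heap index exceeds 16-bit limit")
    return struct.pack("<H", i)

# The C setting's 38100 strings still fit comfortably in 16 bits.
print(len(pack_str_index_u32(38100)), len(pack_str_index_u16(38100)))   # 4 2
```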
Pro
Overall bytecode file size reduced by 10+%
Calculated from the core "c" setting using the tools provided by MoarVM::Bytecode, as shown in the responses above.
Frame sizes reduced
Depending on the number of string indices used in a frame. Remember that each dispatch_x and const_s op has one string index in it. The core C setting has 108447 dispatch_o and 61392 const_s opcodes in it. Given that the core C setting currently uses 11983692 bytes for frames, this in itself would mean a saving of 2 * (108447 + 61392) = 339678 bytes, or 339678 / 11983692 = 2.8%. Smaller frame sizes mean more opportunities for inlining, and thus faster execution!
FWIW, there are other indices in the bytecode that currently use a 32-bit unsigned index, that could probably be reduced to 16-bit unsigned. But that's for another day.