Open chippydip opened 8 years ago
PR submitted for the simple changes: https://github.com/KSP-KOS/KOS/pull/1733
Experimental opcodes for the curious: https://github.com/chippydip/KOS/commit/7310edfe1dc2b3145968aaad994ea655cd712898
The debug section takes a ton of space. It's like bundling a .pdb and functionally isn't required (this is no different than minimizing a text file to be all on one line).
I consider it required. I don't want to have to field the complaints from users who find that runtime errors no longer tell them where in the source script the error happened. This is also why when I run a minimizer, I don't eliminate the line endings. I still want to preserve line count for the sake of error messages, and that information is totally worth an extra \n
per line.
If you remove the debug info, I'd insist that it is only removed by use of a command option, rather than being removed unilaterally for everybody always. I could possibly still generate a stack trace with the error, like C# does, but it wouldn't narrow you down to the exact line - just the function.
I'll look at the rest of this in detail after the weekend. My time is sort of committed until then. It looks interesting. And yes, using immediate jumps for the relative branches does shave a lot of arguments out, and the fact that the arg marker didn't compact down to 1 is a big find. I think the last time I looked at it in detail, the arg marker was just a string, not a separate type of its own. That change came later and must have caused the duplication of the arg marker.
Another minimizing option is to create a few "hardcoded" arg pointers for args we know always tend to exist in every program no matter what - like we know every program is going to have a zero, a one, and an argbottom mark operand in it - so why store them at all? Why not just have magic codes like "when you see an address of -1, that means the arg is 0, when you see an address of -2, that means the arg is really the argbottom mark...." etc.
Take your time, like I said I just started playing with this last week and figured it might be interesting to other folks. (And, man, did this post get a lot longer than I anticipated)
Debug info is definitely worth having access to. Toward the end I suggest storing it as a separate .kdb file or having a config option to turn it off. Another idea would be to continue to include it but not count the debug info against the disk space limit.
Duplicate arg markers was clearly just an oversight. PseudoNull
has the Equals/GetHashCode overrides needed to prevent that so I just applied the same fix to the arg marker type. It doesn't save a ton of space relative to some of the other things I looked at, but it's the simplest change of the bunch.
I thought about adding hard coded arguments, but one of the biggest gains in file size comes when you can keep numArgIndexBytes
at 1 instead of 2, which means keeping the argument table smaller is better (combined with switching to ordinal instead of byte offsets).
Instead, I went with variations of some existing opcodes that use immediate mode arguments (encoded directly in the byte stream) rather than referencing the arguments table at all. Small values will fit in a single immediate byte anyway, and larger values are unlikely to be used more than once. This does require new opcode variants and a new MLField parameter to flag how the value is to be encoded, but currently only 54 out of a possible 256 opcodes are in use so adding a few more didn't seem like a big deal.
Once I got to the end of the work outlined above and had dropped the arg table size down from 147 to 54 I realized that nearly everything that remained was now a string (and mostly only things that appeared in the program text). So now I'm playing with the idea of using immediate-mode arguments for all non-string values. I'm also going a bit crazy and thinking about what things would look like with a CISC rather than RISC instruction set. Consider the following sample function:
PUSHSCOPE 1 0
PUSH "f"
SWAP
STORELOCAL
ARGBOTTOM
PUSH magic
PUSH
PUSH "$f"
CALL "" "logfile()"
POP
PUSH magic
PUSH "$f"
PUSH null
CALL "" "delete_deprecated()"
POP
POPSCOPE 1
PUSH 0
RETURN 0
18 ops, 34 bytes
That function is really only calling two other functions, the rest of the code is just shuffling things around on the stack. Assuming per-instruction overhead dominates execution time (think book-keeping code vs cpu.PushStack(Argument)
to execute each of the PUSHs), then an optimal instruction set would probably try to minimize the number of opcodes with a minimal amount of extra complexity. Such an instruction set might look something like this:
PUSHSCOPE 1 0
STORELOCAL "f"
ARGBOTTOM
CALLVOID "logfile()" "" "$f"
CALLVOID "delete_deprecated()" "$f" null
RETURNVOID 1
6 ops, 18 bytes
That's a big reduction in both opcodes and .ksm file size. The main ideas here are to have instructions that cover the common cases more efficiently (CALLVOID pops the return value for you at the end and RETURNVOID pushes a 0 before returning) and to use the stack only when needed. A huge number of opcodes are saved by including arguments as part of the call.
For the actual encoding I'm thinking about include a 0 terminated list of nibbles (half bytes, padded to a byte boundary as needed) to specify the encoding for each argument, followed by the encoded data. It's not shown in this example, but arguments could still be popped (or peeked) from the stack if needed, it just wouldn't be required.
Most of the additional work here would be in (de)serialization and even then it wouldn't be that much more work. OpcodeCall.Execute
would likely just push all the arguments before running it's existing code, but it would also be possible to pass them around as a separate list if that made sense.
An optimizing compiler could do even better:
CALLVOID logfile() "" [0]
CALLVOID delete_deprecated() @ null
ARGBOTTOM
RETURNVOID 0
4 ops, 11 bytes
This is realizing that nothing references this scope as it's parent so the scope can be elided and the local variable (actually parameter) can just be managed directly on the stack. This also demonstrates the ideas of indexed peeking (the "[0]" arg reference) and popping ("@"). This is just under 1/3 of the original code and 2/9ths the number of opcodes.
So, yeah...I'm mostly playing with this for fun and personal use, but if any of this looks like stuff that's you'd like to incorporate in the official release at some future date let me know!
(I also put together a spreadsheet documenting the current instruction set: https://docs.google.com/spreadsheets/d/12Xo3-4Ur-jkc3pqgRTD5jUFXmL3k0mjS97Mp53eNU7g )
I consider [the debugging section] required. I don't want to have to field the complaints from users who find that runtime errors no longer tell them where in the source script the error happened.
Other than some issues like "What happens if you compile a script on a probe with no RT connection?" (unlikely, but possible if the compilation itself is scripted), is there any reason the debugging section can't be a separate file on the archive somewhere that kOS references to get line numbers? (Or potentially even original source code)
There is no guarantee that either file on the archive would be in sync with the ksm file you are running. So the attempt to read the source code could be gibberish, pointing to the wrong line or even implying that kOS is executing the code out of order.
To separate out the debug info you'd need to include a source hash code as part of the .ksm file and then write the debug info to something like 0:/debug/<hash_code>.kdb
.
I know that this issue has been out there for a while, and that you have a pending pull request implementing a few of these changes. I just wanted to start up the conversation again because I came across an easy improvement while working on #1857. Since kOS ships with a C# implementation of gzip, we can have the ksm file store itself using that compression. In my initial testing, a 14,972B ksm file (11,967B ks source) was reduced to 8,426B by enabling the compression. I think that we will still want to implement some of your changes, specifically the ones in your PR look particularly easy to take advantage of. I'm going to edit the documentation for CompiledObject
while I'm making the modification to support gzip. @chippydip if you're still around and interested in working on this optimization, I'd be interested in looking at trying to work on your PR for one of the next couple of releases.
Compression sounds like a great idea. I actually considered trying something like that in KOS code after the newer file APIs were added but decided to look into opcode optimization instead. Anything that helps reduce compiled size to help incentivize storing binary instead of source versions is a good change in my book.
I'd be happy to answer any questions and/or make fixes to that PR, but I'm pretty busy with work right now so probably won't have time for too much beyond that.
Sorry for the long post, I wanted to brain dump this stuff in case anyone else is interested.
1712 got me thinking about .ksm files in general and why they are so gosh-darn big. In my experience they are always larger than a minified text version, and often bigger than the text version with whitespace and comments. This has always seemed ridiculous to me so I decided to start investigating a couple days ago and I now have numbers and suggestions!
All my testing so far as been done with boot.ks which I've heavily hand-optimized in the past. This may not be totally representative of scripts in general, but being able to compete with a hand-optimized and minified version seems like as good a place as any to start:
I started by writing a disassembler for ksm files so I could see what was going on under the hood. The output below is from the summary section of this tool (I also dump the full symbol table and instruction listings).
Starting Point
First Pass
The debug section takes a ton of space. It's like bundling a .pdb and functionally isn't required (this is no different than minimizing a text file to be all on one line). I also noticed that there were 33 separate copies of the "magic" stack marking value in the arguments table. Apparently
KOSArgMarkerType
doesn't have an Equals/GetHashCode override so a new value is added for pretty much every function call in the code.Optimizer
Looking at the contents of the arguments section, ~34 of the values were compiler-generated labels (mostly due to LABELRESET instructions where function entry points had been labeled with something other than the normal @NNNN labels). I went through and changed all the labels back to default in-order values which got rid of all but the first LABELRESET instruction. I also updated all jump instructions to use relative offsets rather than labels.
This was about the best I could do with an external tool. I'm toying with the idea of extending this to be an optimizing compiler with at least a list of peephole optimizations which should shave off a bit more space (and a few IPU when running the script). I hand-optimized the assembly for one of the functions and was able get a ~20% reduction in both instructions and bytes, but most of those gains were from eliminating the scope push/pop/local and managing the single parameter just on the stack with DUP and SWAP as needed. I suspect that's probably very near the upper bound of what's even theoretically possible with an advanced optimizer.
Serialization changes
Much easier than instruction gymnastics was taking a look at tweaks to the kOS code. The biggest issue at this point was that I only had 98 values in the arguments section (down from 147 at the start), but references to these arg were all 2 bytes long since the section itself was 673 bytes long. Switching to ordinal indexes rather than byte-offsets was a simple change that saves a bunch of space. (I also fixed the stack marker properly).
New Opcodes
At this point most of the low-hanging fruit had been addressed so gains slowed down. Still, there are plenty of common patterns that can benefit from smarter encoding in the instruction stream. One of the most obvious at this point was duplicate string values for essentially the same variable (" calls don't really need a dummy marker value if they had their own dedicated instruction, so I added CallIndirect and CallDirect. I also stripped all decorators off the identifiers for call operations since they are re-added if missing by the Execute method anyway.
x
vs$x
vs$x*). I added
PushVarand
PushFunc` opcodes to generate the proper version of a given identifier without having to pollute the arguments table. Adding a PushStackMark variant also seemed reasonable as that saves a byte per function call. Speaking of function calls, as far as I can tell the first MLField of the Call instruction isn't actually used anymore. Furthermore, the "Immediate Values
The next big thing cluttering the arguments table was constant values for various instructions. Most of these values are only used once and some are duplicated in the table since scope ids are shorts while branch offsets and return depth are integers which prevents them from de-duping. Branch offsets are typically small signed values so I added BranchIfFalseImm, BranchIfTrueImm, and JumpImm to use a signed-byte after the opcode to store the destination offset when possible. Scope IDs and return depth are usually small non-negative numbers so I did something similar with PushScopeImm, PopScopeImm, and ReturnImm using an unsigned byte in these cases.
Immediate Labels
At this point there were still a few labels left in PUSHDELEGATERELOCATELATER instructions. Really these aren't much different from relative jump offsets, but since they cross code block CodePart boundaries I decided to use an absolute instruction offset in the ksm file for the immediate version (using a UInt16). I also baked the WithClosure flag into the instructions themselves. Finally, I changed the starting "previousLabel" value to "@0000" to make the first "LABELRESET" opcode unnecessary in most cases.
(These were the only instructions that I didn't fully implement since they require translation. The rest have proper Execute methods and should in theory just work, though I was focused on what's possible in a .ksm file and didn't try to run any of the modified versions)
Ideas