Change CONFIG:IPU system to something more fluid and accurate?

In some of my recent testing of the new CONFIG Applauncher Panel I just discovered that if my program does this:

set x to 0.
until false {
  set x to x+1.
  if mod(x,500) = 0 {
    print "reached multiple of 500: " + x.
  }.
}.

(The print statement is there to prove the rate of execution is changing when I change CONFIG:IPU. If I make it higher, the messages print out faster).

Then I can turn up the CONFIG:IPU to a huge number like 5000 and my animation frame rate is STILL just fine! And my computer is sucky for frame rate.

BUT, If I change it from printing on multiples of 500 to printing on multiples of 5, then suddenly I do notice a framerate clunkiness appear at high CONFIG:IPU numbers. The act of printing to the terminal is waaay slower than the act of going x = x + 1. Which makes sense.

So it's clear that different opcodes have wildly varying execution times (which should be intuitive, really, some of them are calling suffixes which are really methods which are walking the parts tree to find an answer, etc, and they still count as just one opcode).

Now, often real operating systems will 'sleep' a process when it tries to do I/O, putting it to the back of the execution queue until the I/O is ready, aborting it early, letting the purely mathy-thinky- processes have a chance, because of precisely this sort of problem.

It occurs to me that kOS would run a LOT more smoothly (and scripts could run quicker) if we instituted a stopwatch-based update interval instead of a number-of-instructions one. If you run a time-consuming opcode instruction, you'll get far less number of instructions that update than if you run nothing but, say, pure arithmetic that update and you could get a really big number of instructions in the same time. Instead of trying to profile each opcode to learn its timing, we'd just start a stopwatch in the CPU.Update() and check it after running each opcode. After the timeslice is expired, then that's the last instruction we'll get this Update and we stop there for next time.

But one problem this could cause is that it would then mean different computers could run more or less code than each other, which can impact the sharing of scripts.

Therefore while this would be trivially easy to implement what I just described, it has other adjunct changes needed along with it that will take a little bit more work:

It would need a way to allow triggers to last longer than 1 update, so we stop having this "entire trigger must finish in one update" rule - that would be the biggest barrier to script portability, as once we are basing the limit on execution time, then different people get different limits on their computer. To keep scripts portable, the trigger body that only took 1 update on your computer has to not crash when it needs 2 or 3 updates on someone else's.

And that also would mean revisiting the earlier idea of a SNAPSHOT or FROZEN block. They would be needed to ensure blocks that the script writer has determined DO need to execute within one single atomic section (so they get a sample of readings from the same moment in the universe) will do so regardless of what it does to the framerate. The atomic sections are for places where causing a jittery framerate by executing too much in one update is far less disastrous outcome than allowing the section to span multiple updates would be.

Another "real" life example. I've wrote two small scripts that calculate deltaV and Isp for current stage at current altitude (at current body). Here's the thing, I'd think, this is very basic stuff, one might need to get to these before doing anything.

Now, I was going to put these in trigger bodies, obviously, but these alone exceed about 450 instructions. I've no idea how. I wrote them in less characters than 450. Did not used recursion or any other than a for loop iterating a list which should have only 4 elements (engines), beside that, there's only math in them.

I don't think this has anything to do with the hardware or operating system I'm running KSP on. Does it? What are these mainlines, after all? I'd bet these are not mnemonic codes. You aren't writing an assembler here, are you? Are these mainlines take roughly equal amount of time to execute? If not they should be filtered or grouped and multiplied by different factors.

Well, until this gets figured out, let's just hope we can land with triggers having less then 2000 instructions. This has a very '60s feel to it. Much more so than limiting storage capacity. You could add this as a feature, just for the sake of good development practice.

Please give me being nozy, and not very well informed.

@sulyi

You aren't writing an assembler here, are you?

We sort of are: one of the most important parts of kOS is a custom virtual machine (i.e. processor emulator) in which all programs run. When you run something it compiles to the bytecode first and then runs. Without this we'd have to parse every line every time we execute it (I believe kOS did it in it's early versions) and parsing is very slow. Note: our bytecode does not depend on the OS you are using.

A conversation from our team's Slack: (relevant to this issue, not to the virtual machine discussion)

dunbaratu [8:23 PM] I'm waffling on whether or not it's a good idea to change how IPU works to be based on a stopwatch instead.

erendrake [8:24 PM] @dunbaratu: i am too. i was on board until we started to get more serious about IPU based power usage

hvacengi [8:24 PM] Oh, I meant more for our debugging purposes. I agree that the stopwatch idea for execution might be a problem.

erendrake [8:25 PM] the idea that you would get more instructions and therefore more power usage on a faster machine seems like a mistake

dunbaratu [8:25 PM] Pro: smoother execution - some instructions take much less time to execute than others, so executing exactly the same number of instructions each time doesn't mean it's the same amount of actual time - with fixed N instructions, some parts of the code will lag the game more than others will.

Con: Inconsistent script behavior on different computers would be a hindrance to sharing scripts. Someone could easily write a script that works fine on their faster computer but fails on a slower machine when someone else tries it, because they didn't know how sensitive their code was to being able to finish a loop in one update, for example. (edited)

erendrake [8:27 PM] and until we can run triggers in more than one update that would be a total barrier to sharing

dunbaratu [8:28 PM] i always viewed being able to split a trigger across more than one update as being a prerequisite to the stopwatch idea.

erendrake [8:28 PM] yeah

dunbaratu [8:29 PM] But it just really bugs me that people are paying exactly the same IPU cost for all suffix method calls, for example, whether they result in one line of C# code or hundreds.

hvacengi [8:29 PM] @erendrake: we still would have that potential for faster machines consuming more power. If you have people like me who max out ipu. Which I want to revisit the max value at some point too, but that's why I was wondering if we had those benchmarking tools. I've since found them (and modified them on my local version) so I'm hoping to be able to get the average update time and max update time to compare them to the time delta.

hvacengi [8:30 PM] @dunbaratu: agreed. I assumed that was part of the reason you liked the idea of a "standard library" for common functions, since those would hit the ks instructions.

dunbaratu [8:34 PM] In order for more to be in a standard library, it would first need classes and not just functions.

But it seems like the only real way to make weighted instruction duration while at the same time keeping things deterministically the same across many machines is to benchmark ALL the routines ourselves and assign them a cost number. That sounds like way more work than I want to do.

dunbaratu [8:36 PM] It's not so much the burden of doing it once that I find distasteful - its the idea of then having to keep it updated properly with every new suffix, every new built-in function, every refactor, etc.

dunbaratu [8:36 PM] Whatever the system is, it must be something that can automate that and "learn" the execution cost dynamically on the fly- not something we have to pre-configure.

dunbaratu [8:37 PM] And I can't think of a way to do that without using real-time clock checks, which have all the cons mentioned above.

hvacengi [8:38 PM] Maybe a floating average based on stopwatches? Make it so you limit the time explicitly on the first run, and see how many instructions get through, then adjust the timing to maintain an average at the ipu limit?

hvacengi [8:38 PM] Waits would really mess that up though.

dunbaratu [8:43 PM] I wonder if there's a way to ask the stopwatch how many of Microsoft's .NET CIL opcodes the current thread has executed rather than how much wall clock time - so that for example a 4ghz machine typically returns numbers twice as big as a 2ghz machine for the same time frame. That way we could count how many of THOSE are passing, and get a uniform result regardless of machine speed. (you are allowed to execute this many CIL opcodes per fixedupdate, rather than this much wall clock time.) That way the slow computer just simulates the universe slower rather than changing the amount of code the in-universe CPU can execute per in-universe second, and the faster computer just has more open idle time for other mods to fill.

hvacengi [8:46 PM] In theory, yeah that should be available. I'd expect it to be in the Diagnostics namespace (which I'm having trouble locating when I walk the available references in Visual Studio's reference manager)

dunbaratu [8:57 PM] Diagnostics is where the stopwatch class is, so I have used it before -but only a bit.

It's still not relevant until farther in the future, though, because being able to span triggers across multiple fixedupdates is first.

hvacengi [8:58 PM] Yep.

hvacengi [8:59 PM] I'm not finding anything helpful in there just from glancing anyways.

dunbaratu [9:00 PM] I already know exactly how I'd do it. It's actually not that hard - I just need the ability to track where in Cpu.FixedUpdate() the run got to when the IPU's ran out, and if it was in the midst of the triggers, just remember to continue next time from where they left off.

The ugly part is that some of those triggers are used by flybywire, and I don't know how to deal with the case where the trigger that updates a flybywire setting gets only halfway done executing.

dunbaratu [9:01 PM] It's almost as if those would need some kind of "dirty" flag to remember if they were in the middle of executing. If they are, then skip the part that updates the KSP flight settings from them - catch it next time.

hvacengi [9:05 PM] I finally found the spot to get the CIL offset information. But it looks expensive. You need to pull a stack trace and then grab the current stack frame which would give you the offset among other items: https://msdn.microsoft.com/en-us/library/system.diagnostics.stackframe_members(v=vs.90).aspx

hvacengi [9:07 PM] But it might not work all that well for ksp in release, apparently the value is approximate when debugging is disabled.

dunbaratu [9:19 PM] ugh - well that doesn't sound viable. Not only is KSP not going to be in debug mode, but also it cannot be using an expensive operation because it will be checking it after every opcode execution.

hvacengi [9:20 PM] Yeah, pretty painful even if you choose to round it out by only checking every 10 opcodes.

hvacengi [9:21 PM] We could do automated bench tests on anything that is entirely within kOS.Safe, but nothing tied to KSP/Unity would work that way.

To sum up:

The problem:

In the current implementation different opcodes take the same im-game "time" to execute which is bad because some operations are actually faster than other.

Solutions proposed so far:

Limit the update time instead of the number of instructions
- pros:
  1. Smoofer execution
- cons:
  1. On faster computers will execute more instructions per update which can cause inconsistent program behaviour on different PC.
  2. If we are going to make power consumption depend on IPU than it will make players with faster computers pay more electric charge.
Benchmark all functions and opcodes ourselves and assign an execution cost
- pros:
  1. Smooth execution
  2. Consistent in-game execution speed between different PC
- cons:
  1. There is no easy way to automete benchmarking of functions that are not in kOS.Safe.
  2. [not mentioned in the dialog] benchmarking will only be correct for functions with O(1) time complexity i.e. constant execution time. But some functions' (e.g. list:CONTAINS(item)) execution time vary.
Count the number of .NET opcodes executed
- pros:
  1. It would be nice but...
- cons:
  1. .NET doesn't offer a fast way to get this value
Floating average based on stopwatches (@hvacengi's proposal)
- Sorry. I didn't get the point. What exactly are you suggesting to do?

I have an idea how to solve this problem:

Create a dummy function that does nothing but takes some time to execute e.g. an empty for loop.
Invoke and time this function every N updates to get the idea of how fast the player's PC is.
Make the update to last K * dummy_execution_time P.S. Instead of making a dummy function we can time something useful but then changes that make this function faster/slower would affect the execution speed.

@abenkovskii you phrased my suggestion well, I'd be happy to explain my thoughts.

At start up, kOS would operate much the same way it does now. It would run through a fixed number of instructions, and time the operation.
As future updates continue, we calculate the average time it takes to run instructions (long and short duration alike).
Using this average, the IPU is adjusted to maintain a relatively consistent target update time
The running average continues to be updated while the program runs, and the IPU continues to be adjusted.

We already use stop watches to monitor performance if the stat config flag is true, so this would just require modifying that level of code to some degree.

There are a few spots where we would need take particular care. Obviously, compiling a program should not be counted as one instruction in the average. There may be other infrequent but expensive operations that we would want to exclude from the timing.

It does not however solve the issue of scripts running faster on some computers. But in theory, we already have that problem. I run my IPU maxed out with no adverse effects on game play, which is 10 times the default setting.

@hvacengi Now I'm 100% sure that it's not the same idea as I suggested. Why do you think it's better than just limiting the update time directly?

I just opened issue #1385 which moves us in a slightly different direction for the time being. Since there has been no movement on this issue since May, I think I'm going to close this issue for now. We can reopen it if/when we want to revisit this bench marking idea.

This issue is quite different from #1385. This is all about weighting instructions so they don't all cost exactly 1. #1385 is about making it possible for different parts to have a different value for what that limit of that cost per update is. They're completely orthogonal issues. This is about variety per instruction. #1385 is about the variety per part.

KSP-KOS / KOS