Should report meaningful line number to the user on runtime C# exceptions

KSP-KOS / KOS

Fully programmable autopilot mod for KSP. Originally By Nivekk

Other

697 stars 230 forks source link

Should report meaningful line number to the user on runtime C# exceptions #186

Closed Dunbaratu closed 10 years ago

Dunbaratu commented 10 years ago

When a runtime C# exception is generated because of an inability to cast from one type to another, the message the user gets is useless to figure out where the problem is. They have to trace it down by spamming print statements throughout to find it. They see nothing about what the variables in question are, or the script line number is.

The problem is that the entire system the C# exception handler uses to report the problem reports it in terms of the C# code, not the kos code. So the stack trace isn't helpful at all and we can't expose that to the user.

So how should we report it?

I have a suggestion but I don't know if it's practical.

The suggestion is this:

When the script is being compiled, create a lookup table that records which block of kRISC instruction pointers correspond to which line numbers of the script. It might look something like this:

script line	kRISC start IP	kRISC stop IP
1	0	5
2	6	15
5	16	18
6	19	23

etc...

Store that table with the compiled program.

Then whenever an exception is thrown, as part of printing the error message to the screen, you can check that table and use it to map from the current instruction pointer to the script line it came from, and print a message with filename and line number for the user to see.

teleksterling commented 10 years ago

Would this method still work with sections of code that get called on events/updates? i.e. ON, LOCK(?) , ...

Dunbaratu commented 10 years ago

I originally thought of describing the table as just containing two columns: script line and START IP, with the idea that the range is implied as "everything up to the start of the next START IP on the next row". But then I thought about the possibility of there existing kRISC instructions NOT directly derived from any particular script line, and if they exist they might lay outside the start/stop ranges of any row of the table, as a way of indicating this so the error message doesn't come out claiming a problem in a line that doesn't have the problem. If there are kRISC instructions that can't be associated with any line of code, that's how they'd be represented. I don't think ON or LOCK would be that way, however, because they create functions, essentially, that can be marked as coming from that line of code. (So for example, if the flight system encounters a runtime error while evaluating the expression the user wrote in LOCK STEERING, the user could get the error message pointing to the line of code that the lock expression was compiled from.

It does occur to me that for loops you could have one line of script code evaluate into two disjoint sections of kRISC code - the stuff executed at the top of the loop and the stuff executed at the bottom of the loop. To deal with this case properly would mean the table's main key would have to be NOT the script line number, but the kRISC start/stop blocks. (that way more than one block can be part of the same line, but more than one line can't be part of the same block.)

Thinking of it like a database, kRisc start IP and kRisc stop IP would be a combined primary key for the row, with script line number being the data stored at that row. (obviously it would be stored using one of the c# collections, not a database, but the database terminology is useful to describe what I meant.)

Dunbaratu commented 10 years ago

As I started looking into this today, I've come to the conclusion that probably the easiest way to implement this is to just simply add a new field to the Opcode class - a simple integer for the script line number the Opcode came from.

That's because the line numbers won't map to sequential contiguous blocks of Opcodes in the kRISC program given that order of operations will shuffle things around a bit. For example:

set x to    // line 1
1           //line 2
+           //line 3
4           //line 4
*           //line 5
2.          //line 6

The opcodes are going to come out in this order: line 1 (push $x) line 4 (push 4) line 6 (push 2) line 5 (mult) line 2 (push 1) line 3 (add) line 1 (store)

(note how the opcode-to-line number mapping is neither contiguous nor sequential, making my earlier design idea a bad one. The only way it works with this re-arrangement of order that the rules of expression evaluation demand is to store a line number per each Opcode, instead of trying to map each line number to a contiguous sequential range of opcode instructions)

I had tried avoiding this with my earlier design because I was operating under the assumption that Opcodes were very small classes with very few data fields in them - just an ID for type of operation, and a field for the operand. Upon starting to implement this, I realized they actually already have quite a few data fields, so that adding one more int to each one won't really be a big deal.

Dunbaratu commented 10 years ago

@marianoapp, I'd like to ask you a question about how you implemented the kRSIC virtual machine opcodes.

Is it true that Opcode.InstructionId is a number that corresponds to a single compiled program file? This seems to be the case from how it's implemented but I'd like some assurance that this is its intended meaning before I make use of it.

In other words, If programX runs programY, which then runs programZ, then there is one shared ProgramContext for all three of these, concatenated together, in which:

All the Opcodes that were generated from programX's locks, triggers, and mainline code all have InstructionId=1.
All the lock, trigger, and mainline Opcodes that were generated from ProgramY have InstructionId=2.
All all the lock, trigger, and mainline Opcodes that were generated from ProgramZ get InstructionId=3.

This seems to be the case, but I was looking for an assurance that this is what was meant to be the case and that it was the meaning of InstructionId (it's hard to guess the difference between intentional functionality versus accidental functionality when reading code that doesn't have a lot of comments.)

If this is intentional and correct, then this greatly simplifies my task, because it means I can generate a simple lookup table from those InstructionId's to the programs' filenames. Then when a runtime error happens I can look at the Opcode's InstructionId, look up which filename that instructionID came from, and use that to print to the user an accurate report of which program file the offending opcode was inside of. (That combined with adding the sourceLine to Opcode means I can make runtime error reports that specify exactly the filename and line number of the error.)

(Edit: I am also considering just entirely replacing Opcode.InstructionId with a string instead of an int, and just storing the filename directly in it. This doesn't waste as much space as it might seem since each string would really just be a reference pointing to one single instance of the filename string, which I would force to happen with String,Intern().)

Dunbaratu commented 10 years ago

I went ahead and started implementing it as described above. It's almost done:

Here's a screenshot of the way it formats runtime errors:

screenshot73

And here's an example of a verbose dump of part of the program code, with the new data I'm tracking being shown (it may look wasteful to store the filename many times, but remember that I'm using the String.Intern call to make it so it's only storing references to the same string over and over):

Code Fragment
File                 Line:Col IP   opcode operand
----                 ----:--- ---- --------------------- 
1/testcode2             3:1   0054 return 
1/testcode2             3:1   0055 push $program-testcode3* 
1/testcode2             3:1   0056 push 0 
1/testcode2             3:1   0057 store 
1/testcode2             3:1   0058 call 45 
1/testcode2             5:5   0059 push $a 
1/testcode2             5:10  0060 push $target 
1/testcode2             5:17  0061 push distace 
1/testcode2             5:17  0062 getmember 
1/testcode2             5:27  0063 push $b 
1/testcode2             5:25  0064 add 
1/testcode2             5:29  0065 push 1 
1/testcode2             5:28  0066 add 
1/testcode2             5:5   0067 store 
                        0:0   0068 return 
1/testcode3             3:1   0069 push $v2 
1/testcode3             3:1   0070 return 
1/testcode3             3:12  0071 call $v1* 
1/testcode3             3:15  0072 push 2 
1/testcode3             3:14  0073 mult 
1/testcode3             3:14  0074 return 
1/testcode3             3:1   0075 push $v2* 
1/testcode3             3:1   0076 push 69 
1/testcode3             3:1   0077 store 
1/testcode3             3:1   0078 push $v2* 
1/testcode3             3:1   0079 push 71 
1/testcode3             3:1   0080 store 
1/testcode3             5:5   0081 push $b 
1/testcode3             5:10  0082 call $v2* 
1/testcode3             5:22  0083 push 2 
1/testcode3             5:22  0084 call arccos() <<--INSTRUCTION POINTER--
1/testcode3             5:13  0085 mult 
1/testcode3             5:5   0086 store 
                        0:0   0087 return

The only thing still left to do before I commit the changes is see if I can have it dump out a trace of the stack of subroutine jumps (runs) rather than just the current offending line (i.e. this was called from testcode2, which was called from testcode).

Dunbaratu commented 10 years ago

Here's a test example using my new pull request: screenshot76

These were the 3 small programs that called each other for the test:

Program file: testcode

set x to 1.
set y to 2.

print "hello".
run testcode2(x,y).
print "hola".

print a.

set z to y / 0.
print target:mass.

Program file: testcode2

// This is testcode 2.
declare parameter p1,p2.
run testcode3(5).
set a to p1 + p2 + b.

Program file: testcode3

// this is testcode3
declare parameter p1.

lock v2 to p1/0.

set b to v2. // This is the point where I expect an error-
            // when the reference to v2 causes the lock expression
            // to get called, and the lock expression has a divide by zero.

teleksterling commented 10 years ago

Looks fantastic! The rest of my comment was going to be about the order of the 'called from' stack unwinding. But I see that you've updated the screenshot with the reverse order (compared to the version I had in my email notification. I really like this. It will make debugging kOS much more friendly.

Oh, one more comment - I don't know if anyone else felt the same way, but I got confused at first with the format 1/testcode - 1 of course being the storage ID number. I think just because it naturally doesn't look like a folder or drive ID. though I'm not sure I would propose anything else. Maybe /1/testcode consistent with Unix format? or Storage1/testcode? Just throwing it out there.

Dunbaratu commented 10 years ago

I was trying to imagine what character would clearly never be used in a real filename (for the sake of archive files which map to actual real on-disk filenames), so I could use it as a separator, and it occurred to me that I should just use the directory separator itself - the '/'. Now, I know that that's a Unixy way to think and maybe the Windows way is to use "\", but with the advent of so much internet URL stuff having made its way into Windows, I figure Windows probably won't use "/" for a filename either.

Dunbaratu commented 10 years ago

The next edge case I want to work on dealing with with this is the case where the caller of the offending code isn't actually kRISC compiled code at all, but rather is some built-in part of kOS that called the function from Csharp code. For example, when a LOCK expression was called by the fly by wire updater. The fly by wire updater isn't kerboscript code, and isn't represented in the program's kRISC anywhere. Therefore currently it wouldn't show up in the call stack.

In other words, what I'd like to see is something like this:

LOCK THROTTLE TO arccos(2). // it's an error to take the arccos of 2 - there is no such thing.
wait 0.01.
print "hello".

generate an error like this:

Tried to push NAN into the stack
At 1/myprog line 1.
LOCK THROTTLE TO arccos(2). // it's an error to take the arccos of 2 - there is no such thing.
                       ^
Called from internal-flight-control-updater

Getting that context (called from internal-flight-control-updater) may mean some weird trickery, since it won't be on the call stack as a "real" context record, since it's information that exists entirely outside of the kRISC program.

Dunbaratu commented 10 years ago

My latest additional commit added the ability to trace runtime errors from the interpreter and report them as well (i.e. go back in history and find the LOCK statement that had an error that just got triggered when its value was being used in a later statement).

Here's a screenshot showing that:

screenshot80

I also changed the formatting of reporting the volume and filename to make it easier to understand (now it's "filename on volume" instead of "volume/filename".)

Here's a screenshot showing that:

screenshot81

I still haven't quite worked out how to get the report to tell me "Called from Binding Preupdate" or something like that for the flybywire stuff. That's a bit messier because that 'return' back to the Update() code isn't represented in kRISC code. and isn't on the CPU call stack.

Dunbaratu commented 10 years ago

Woo hoo... Now the lastest commit I added to the pull request also reports when the code in question was called by kOS's automatic updater (the cooked flight controls that re-evaluate their locks every Update).

It wasn't as messy as I thought because it turns out there is kRISC code being used for these - the compiler inserts a TRIGGER for each of them, and the opcodes of those triggers were places I could mark the code as having come from somewhere other than the user's source file.

Here's a screenshot of this feature in action, both from a program, and from the interactive prompt:

screenshot82

I now consider the pull request ready for other people to play with and test. I've added all the features I wanted to add, and they all seem to work when used with primitive little tests. Now to beat on it with bigger tests.