Discussion ideas: Functions and variable scope

Dunbaratu commented 10 years ago

This is a sort of issue we've been leaving aside to "get to later" for a long time but soon it will matter so I think I'll start proposing some ideas and ask for discussion.

Current situation: While there are 'secret' functions implemented in kOS script (the LOCK expression is a function behind the scenes), there's no way for a script to create its own homebrewed function call. To make functions you have to make a program script that acts as a pseudo-function, and do everything with global variables. The only thing approaching a local variable that exists now is the parameters you pass into programs.

What would be preferred: You can declare a function, and it can have local variables in it, and it can have a return value. Try not to break backward compatibility too much (this is the tricky part, as all existing code will assume variables are global).

(Note, once you can make functions we may even talk about user-made classes with MEMBER functions (methods), but first off I'm just talking here about vanilla global functions and that's it.)

Here's what I'm tossing around in my head:

Remembering that programs in kerboscript are compiled BY running them, and thus behave much like a just-in-time script compiler in a language like Perl or Python, the declaration of a function will in fact be a command that, until the program is RUN, doesn't exist. i.e. if you have myfunc() declared inside the program myprog.txt, then you cannot call myfunc() until after you've run myprog. Thus if you want to create a library of functions to call, you can just make a myprog.txt that contains nothing more than function declarations (no mainline global code) and run it up at the top of your program to get those functions loaded up.

The syntax of a function should be script-like and as similar as possible to what's there now, something like this:

FUNCTION myfunc 
{
}.

In keeping with current kerboscript practice, the arguments would be late-binding and thus the function has no prototype parameters (thus you can't have a myfunc(string) and a myfunc(int) be two different functions. I don't want to deal with tracking function lookups by prototype).

Instead, just like with a program, you declare arguments inside the body of the braces, like so:

FUNCTION myfunc
{
    DECLARE PARAMETER x.
    DECLARE PARAMETER y.
    DECLARE PARAMETER z.
    // or all in one line as DECLARE PARAMETER x,y,z.

    SET dist to sqrt( x^2+y^2+z^2 ).
    RETURN dist.
}.

// An example of calling it, either it would be called like this:
SET THEDIST TO myfunc(10,20,-12).
// Or more explicitly like this:
SET THEDIST TO CALL myfunc(10,20,-12).

The compiler, when seeing this, compiles the stuff inside the braces and puts it inside a subroutine call, using the standard OpcodeCall and OpcodeReturn. Look to the code produced by LOCK as a guide. It leaves the list of opcodes there at the top of the program to remain after the program is run, just like it does for the rest of the program (so it doesn't have to compile the program again on a subsequent run, it always leaves the compiled version in memory afterward).

Local variable scope?

For functions to be truly useful they need local variable scope. But kOS doesn't require you to declare variables, and you can just implicitly declare them the first time you use them with SET.

So how to make local scope for variables without altering this?

I propose this: All code that exists OUTSIDE functions still operates like it always did before - variables created there are global. But when a variable is created inside a function, either explicitly with DECLARE or implicitly with SET, it gets local scope instead.

Scope hiding? : What if there's a SET X TO BLARG done globally, and then a function is called that does SET X TO FOO locally? Is the function trying to change the caller's global X, or is it trying to make its own local X? I'd say that the function is assumed to be trying to use the local X unless it explicitly says otherwise (we make a keyword like GLOBAL: and so they have to say GLOBAL:X to get the non-local X). Remember that backward compatibility isn't an issue here because there won't BE any functions until we implement this, so any code inside functions can be forced to operate under newer, better rules.

Nested scopes inside braces? : When you have scope nesting in a sophisticated modern language, it lets you do this:

while something
{
  int x;
  if something
  {
    int x; // a different x
    while something
    {
      int x; // yet another different x.
    }
  }
}

Those are 3 different nesting levels of X. But if we tried that in kOS script it would get messy because of the IMPLICIT declarations of X that occur when you try to SET it. If we tried implementing this in kOS we'd get only 1 X if we did this:

while something
{
  set x to 0.
  if something
  {
    set x to 1.
    while something
    {
      set x to 2.
    }
  }
}

but then get 3 different x's, like in modern systems, if we did this:

while something
{
  declare x.
  set x to 0.
  if something
  {
    declare x.
    set x to 1.
    while something
    {
      declare x.
      set x to 2.
    }
  }
}

I'm not sure this is wrong. It's just weird, and flies in the face of good strict structure practices that say people will make big mistakes if you allow implicit globals (implicit variable declarations are one of the reasons Perl is a mess to read, and why they instituted the use strict directive if you wanted to throw away the old way and be forced to declare everything properly.)

But what about pass-by-value versus pass-by-reference?

This is actually a bit uglier than you might think. The objects that correspond to the variables in kOS are actually C# objects and thus they follow the often-used OOP pattern of passing primitives by value, but passing complex objects by reference. Unless we implement a copy constructor for all of the kOS objects, we can't really enforce that all parameters are pass by value. Not really. This is a case where the underlying behavior of the implementation language (C#) gets exposed to the hosted language (kerboscript) and I'm not entirely sure how to best avoid that, or even whether it's desirable to. But having to tell users "if you pass a SUFFIXED type the function is operating on the same copy as the caller, but if you pass a number type then it's not" is a bit icky.
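
A small Python analogy may make the distinction concrete. Python is not C#, but it shows the same split between numbers that behave as if copied and complex objects that are shared with the caller. The Vessel class and both function names are made up for illustration and are not kOS types.

```python
class Vessel:
    """Hypothetical stand-in for a suffixed kOS structure."""

    def __init__(self, name):
        self.name = name


def rename(v):
    # v refers to the caller's object, so the change is visible outside.
    v.name = "renamed"


def bump(n):
    # Rebinding n only affects the local name; the caller's number is unchanged.
    n = n + 1
    return n


ship = Vessel("original")
count = 5
rename(ship)
bumped = bump(count)
```

After this runs, the caller's ship has been modified by the function while the caller's count has not, which is exactly the inconsistency that would have to be explained to users.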

Implementation

The ProgramContext here: https://github.com/KSP-KOS/KOS/blob/develop/src/Execution/ProgramContext.cs would be expanded to include a Dictionary a bit like the global one, that maps program names to the objects.

ProgramContexts exist on the call stack at runtime. The idea would be that you push a new one whenever you start a new function body, and pop it when you return.

But UNLIKE now, there would be a little difference: the Dictionary would map a variable name not directly to its object, but to a STACK of its objects. By default you're always getting or setting the object on top of the stack of objects with that variable name. But when you enter the braced-scope in which a new local variable name masks the existing object, you'd push the new value on top of that stack, be operating on it instead, and then when you exit the braced-scope where the masking occurred, you'd pop it back to the value it had before you began.
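
To make the stack-per-name idea concrete, here is a minimal Python sketch of it. This is only a model of the proposal, not the actual kOS code: the class name ScopedVariables and its method names are invented for illustration.

```python
class ScopedVariables:
    """Maps each variable name to a stack of values (hypothetical model)."""

    def __init__(self):
        self._stacks = {}

    def declare(self, name, value=None):
        # Entering a scope that masks `name`: push a new value on top.
        self._stacks.setdefault(name, []).append(value)

    def set(self, name, value):
        # Implicit declaration on first SET, otherwise overwrite the top value.
        stack = self._stacks.setdefault(name, [])
        if stack:
            stack[-1] = value
        else:
            stack.append(value)

    def get(self, name):
        # Always read the value on top of the stack for that name.
        return self._stacks[name][-1]

    def leave_scope(self, name):
        # Exiting the scope where the masking occurred: restore the old value.
        stack = self._stacks[name]
        stack.pop()
        if not stack:
            del self._stacks[name]


scope = ScopedVariables()
scope.set("x", 0)        # outer x, created implicitly
scope.declare("x")       # inner braces declare a new x that masks it
scope.set("x", 1)
inner = scope.get("x")
scope.leave_scope("x")   # closing brace of the inner scope
outer = scope.get("x")
```

Here inner sees the masking value and outer sees the original one again once the inner scope is popped.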

The keyword GLOBAL: would instruct the system to jump down to the global ProgramContext and use its dictionary instead. However it would usually not be needed. It's only for when you mask a global name with a local name and want the global name instead. When you haven't done that, then you will get the globals automatically by just the fact that we'll have the variable lookup algorithm try the local scope dictionary first, then fall back on trying the global one. (It's not going to try visiting all the ProgramContexts on the stack before that, though, because if global calls A calls B calls C, then it's appropriate for C to see the globals, but not appropriate for C to see B's variables or A's variables.)
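
The lookup order described above could be sketched roughly like this in Python. The Context class, the lookup function, and the force_global flag (standing in for the GLOBAL: keyword) are all assumptions made for the example, not existing kOS names.

```python
class Context:
    """Hypothetical stand-in for a ProgramContext's variable dictionary."""

    def __init__(self):
        self.variables = {}


def lookup(call_stack, name, force_global=False):
    """Resolve `name` using the top context first, then the global one.

    call_stack[0] is the global context and call_stack[-1] is the running
    function. Contexts in between (the callers) are deliberately never used.
    """
    global_ctx, local_ctx = call_stack[0], call_stack[-1]
    if not force_global and name in local_ctx.variables:
        return local_ctx.variables[name]
    if name in global_ctx.variables:
        return global_ctx.variables[name]
    raise NameError(f"undefined variable: {name}")


# global calls A, A calls B, B calls C
g, a, b, c = Context(), Context(), Context(), Context()
g.variables["x"] = "global x"
g.variables["y"] = "global y"
b.variables["secret"] = "B's local"
c.variables["x"] = "C's x"
stack = [g, a, b, c]
```

With this stack, C resolves x to its own local value, gets the global x only when it asks for it explicitly, falls back to the global y automatically, and cannot see B's secret at all.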

Note there are two levels here, quite on purpose. The ProgramContexts on the stack are for the function's name scope. The stack of values for each variable in that ProgramContext would be for the runtime manipulation of variables that mask variables. They are quite different operations because one is a lexical scope where the higher calling level is unknown (the same function could be called from two places and you do NOT want functionA to see the variables of functionB that called it), while the other is a truly stacked runtime operation (the scope for each squiggle brace).

madlemur commented 10 years ago

This seems to imply recursion. If so, I would think you may want to consider a memory abstraction (stack limit, or some such), so that a runaway loop merely crashes kOS, not KSP. Or have I just not scanned the code carefully enough?

Dunbaratu commented 10 years ago

That would be correct - it would need some sort of limit to prevent infinite recursion. Obviously everything already is a memory abstraction. We're not peeking and poking bytes into ram in a C# program. But you're right that the abstraction model would need some sort of sane upper limit that it would detect and stop when the stack got too large.
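
As a rough Python sketch of what such a limit might look like (the class names, the error type, and the default limit of 100 are arbitrary assumptions, not anything kOS defines):

```python
class KosStackOverflow(Exception):
    """Hypothetical error raised inside the script VM, not in the host game."""


class CallStack:
    def __init__(self, max_depth=100):  # the limit value is an arbitrary choice
        self.max_depth = max_depth
        self.frames = []

    def push(self, frame):
        # Refuse to grow past the limit so runaway recursion fails cleanly.
        if len(self.frames) >= self.max_depth:
            raise KosStackOverflow(f"call depth exceeded {self.max_depth}")
        self.frames.append(frame)

    def pop(self):
        return self.frames.pop()


def runaway(stack, depth=0):
    # Models a script function that calls itself with no exit condition.
    stack.push({"depth": depth})
    runaway(stack, depth + 1)


stack = CallStack(max_depth=50)
try:
    runaway(stack)
    overflowed = False
except KosStackOverflow:
    overflowed = True
```

The point is only that the check happens in the script's own call stack, so the script is stopped with a script-level error long before the host process is in trouble.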

cherrydev commented 9 years ago

I think it might be better to simply adopt the scoping rules that many common scripting languages use: If you want a local variable, you have to EXPLICITLY declare it, and then use a different declaration syntax compared to declaring a global variable. Use a LOCAL keyword instead of DECLARE, for example. Locally declared variables would then always shadow global variables, or local variables declared in a higher lexical scope. I don't like the idea of the behaviour of DECLARE changing depending on the context, since it could make code more difficult to read and therefore more prone to errors.

cherrydev commented 9 years ago

Any comment?

KSP-KOS / KOS

Discussion ideas: Functions and variable scope #225
