NeilFraser / JS-Interpreter

A sandboxed JavaScript interpreter in JavaScript.
Apache License 2.0

Call pseudo function from native function #199

Open Webifi opened 3 years ago

Webifi commented 3 years ago

Edit: There is now a pull request for this (#201).

Trying to figure out how to implement calling an interpreted function from a native function, and be able to pass arguments to that function without having to essentially serialize the argument values and pass them back through .appendCode.

After reading previous issues, #130, #153, etc., it appears that what I'd like to do is non-trivial, but I'd still like to give it a shot. Problem is, I'm not sure where to start.

An old pull request, #102, seems close to what I'd like, but unfortunately the version of interpreter.js at that time is so different from the current one that I can't even use the PR as a guide.

Any pointers on how to build a valid Interpreter.State from a Function object passed to a native function, and pass some arguments to that Interpreter.State? Or, if that's the wrong approach, what's the correct one?

Webifi commented 3 years ago

I've modified interpreter.js to add methods for directly calling pseudo functions from native code.

Here's the pull request for it: #201

Examples:

Synchronous callback from native function:

function nativeFunction (func) {
    return interpreter.callFunction(func, this, 25).then(function(v) {
        console.log('got pseudo result', v)
        return v + 5
    })
}

// Called via interpreted code
var test = nativeFunction(function(v){return v + 2})
// Should produce 32

Synchronous callback from native AsyncFunction:

function nativeAsyncFunction(func, callback) {
    callback(interpreter.callFunction(func, this, 25).then(function(v, callback2) {
        console.log('got pseudo result', v)
        callback2(v + 5)
    }))
}

// Called via interpreted code
var test = nativeAsyncFunction(function(v){return v + 2})
// Should produce 32

Queued via queueFunction: (func is called last.)

function nativeFunction(func) {
  interpreter.queueFunction(func, this, pseudoarg1, pseudoarg2);
}

(See: #201 for more)

Please let me know if there's a better way to do this.

Webifi commented 3 years ago

Here's a full example of how to implement native timers (setTimeout, etc.) using PR #201

const interpreter = new Interpreter("", (interpreter, globalObject) => {
  const timeouts = {};
  let timeoutCounter = 0;

  const intervals = {};
  let intervalCounter = 0;

  const frames = {};
  let frameCounter = 0;

  interpreter.setProperty(
    globalObject,
    "setTimeout",
    interpreter.createNativeFunction(function (fn, time) {
      const tid = ++timeoutCounter;
      const _this = this;
      timeouts[tid] = setTimeout(function () {
        if (timeouts[tid]) {
          delete timeouts[tid];
          interpreter.queueFunction(fn, _this);
          interpreter.run(); // Keep running
        }
      }, time);
      return tid;
    })
  );

  interpreter.setProperty(
    globalObject,
    "clearTimeout",
    interpreter.createNativeFunction((tid) => {
      clearTimeout(timeouts[tid]);
      delete timeouts[tid];
    })
  );

  interpreter.setProperty(
    globalObject,
    "setInterval",
    interpreter.createNativeFunction(function (fn, time) {
      const tid = ++intervalCounter;
      const _this = this;
      intervals[tid] = setInterval(function () {
        interpreter.queueFunction(fn, _this);
        interpreter.run(); // Keep running
      }, time);
      return tid;
    })
  );

  interpreter.setProperty(
    globalObject,
    "clearInterval",
    interpreter.createNativeFunction((tid) => {
      clearInterval(intervals[tid]);
      delete intervals[tid];
    })
  );

  interpreter.setProperty(
    globalObject,
    "requestAnimationFrame",
    interpreter.createNativeFunction(function (fn) {
      const tid = ++frameCounter;
      const _this = this;
      // requestAnimationFrame takes only the callback; there is no delay argument.
      frames[tid] = requestAnimationFrame(function () {
        if (frames[tid]) {
          delete frames[tid];
          interpreter.queueFunction(fn, _this);
          interpreter.run(); // Keep running
        }
      });
      return tid;
    })
  );

  interpreter.setProperty(
    globalObject,
    "cancelAnimationFrame",
    interpreter.createNativeFunction((tid) => {
      cancelAnimationFrame(frames[tid]);
      delete frames[tid];
    })
  );
});

interpreter.appendCode(`
  var interval = setInterval(function() {
    console.log('Yay! Intervals!');
  }, 1000);
  setTimeout(function() {
    console.log('Yay! Timeouts!');
    clearInterval(interval);
  }, 5000);
`);
interpreter.run();

cpcallen commented 3 years ago

Hi Webifi,

Great that you've been so busy making improvements to one area of JavaScript Interpreter that could really use some attention!

I'm going to try to address your work as follows: first I'll make some comments on this bug about the overall issues and possible approaches to solving them, then I'll review the actual code you've submitted in PR #201, initially looking at a high level and, if/when the overall approach is good, also nit-picking anything that would need to be fixed before accepting it. (No guarantees about the latter, unfortunately: although Neil and I work quite closely on other projects, this one is entirely his.)

More to follow.

cpcallen commented 3 years ago

Some background, much of which you are evidently already aware of, but which I want to lay out just so we are on the same page (and for the benefit of anyone else reading this bug).

Callbacks in JavaScript

Ignoring completely for the moment JS Interpreter and its slightly quirky terminology, but instead just looking at the JavaScript language itself, there are two quite distinct kinds of callbacks:

Callbacks in JS Interpreter

There are three sorts of callbacks to discuss: the two above, and, separately, the special AsyncFunction callbacks.
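Going by the subsection headings that follow, the two language-level kinds are synchronous and asynchronous callbacks. As a minimal plain-JavaScript illustration (not taken from the thread, and not involving JS Interpreter at all), the difference is whether the callback runs before or after the function that received it returns:

```javascript
// Plain JavaScript; the `log` array exists only to make the ordering visible.
const log = [];

// Synchronous callback: the comparator runs (possibly several times) before
// sort() returns, and sort() uses the values it returns.
const sorted = [3, 1, 2].sort(function (a, b) {
  log.push('compare');
  return a - b;
});
log.push('sort returned');

// Asynchronous callback: setTimeout() returns immediately. The callback runs
// later, once the current code has finished, and its return value is discarded.
setTimeout(function () {
  log.push('timeout fired');
}, 0);
log.push('setTimeout returned');
```

At the point where the last line has run, `log` contains the `compare` entries followed by `sort returned` and `setTimeout returned`; `timeout fired` is appended only on a later turn of the event loop.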

Synchronous callbacks

As presently implemented, JS Interpreter makes it easy to write synchronous callbacks from interpreted code to interpreted code, or from interpreted code to native code: you don't need to do anything special, you just call the function in the usual way. See the implementation of Array.prototype.sort, which is a simple interpreted polyfill.

As you know, it does not provide any (straightforward) way for native code to call interpreted code. Unfortunately there is no trivial solution here; because of the step-by-step nature of the interpreter, any native function calling interpreted code (i.e., a pseudofunction) will need to:

This is all possible to do at the moment on a one-off basis, but the hackery involved is pretty awful so it would be great to close this bug by providing some nice, straight-forward (and well-documented) mechanisms for doing all that—which is exactly what you appear to be trying to do in #201.

Without yet having looked at your code in that PR, my general suggestion would be to take an approach similar to the one I have taken in the Code City interpreter (which is based on JS interpreter), specifically:

Then you can write native functions that call interpreted code and though they will end up being hard-to-read state machines, they will be able to successfully call interpreted code (and—faint praise!—be no more incomprehensible than the step functions are!)
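To make the "state machine" idea concrete, here is a self-contained toy model. It is only a sketch: `CALL_AGAIN`, `interpretedFrame`, `nativeFrame` and `run` are hypothetical names invented for this illustration, not JS Interpreter or Code City APIs, and an "interpreted function" is modelled as a list of tiny steps so that it cannot be completed inside a single native call.

```javascript
// Toy model only: none of these names are real JS Interpreter APIs.
const CALL_AGAIN = Symbol('CALL_AGAIN');

// Stand-in for an interpreted function call: a list of small steps, each
// mapping the running value to the next one.
function interpretedFrame(steps, arg) {
  return { kind: 'interpreted', steps: steps, pc: 0, acc: arg };
}

// A native function call. `info` is scratch space that survives re-entry;
// `value` receives the result of any frame this one pushed.
function nativeFrame(fn, args) {
  return { kind: 'native', fn: fn, args: args, info: {}, value: undefined };
}

// Repeatedly step whatever frame is on top of the stack.
function run(stack) {
  let finalValue;
  while (stack.length) {
    const frame = stack[stack.length - 1];
    let result;
    if (frame.kind === 'interpreted') {
      if (frame.pc < frame.steps.length) {
        frame.acc = frame.steps[frame.pc++](frame.acc);
        continue;                         // Not finished yet.
      }
      result = frame.acc;
    } else {
      result = frame.fn(stack, frame, frame.args);
      if (result === CALL_AGAIN) continue; // A callee was pushed; revisit later.
    }
    stack.pop();
    if (stack.length) stack[stack.length - 1].value = result;
    else finalValue = result;
  }
  return finalValue;
}

// Native function as a two-state machine: call func(25), then add 5.
function nativeAddFive(stack, frame, args) {
  if (!frame.info.called) {               // First visit: schedule the call.
    frame.info.called = true;
    stack.push(interpretedFrame(args[0], 25));
    return CALL_AGAIN;
  }
  return frame.value + 5;                 // Second visit: use the callee's result.
}

const addTwo = [(v) => v + 1, (v) => v + 1]; // "interpreted" v + 2, in two steps
const result = run([nativeFrame(nativeAddFive, [addTwo])]);
```

Here `result` is 32, matching the `v + 2` then `+ 5` example earlier in the thread. The native function never calls `run` itself; it only pushes a frame and asks to be visited again, which is what keeps the runner non-reentrant.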

Asynchronous callbacks

JS Interpreter also provides no mechanism for doing asynchronous callbacks, but fortunately this is much easier: you just need:

Native functions that create async callbacks will remain nice and straight-forward: they don't need to be written as state machines, because they will return long before the callback is ever run. (Of course they also don't get to find out the return value of the callback, but such is the nature of life in a universe with unidirectional time.)
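A toy sketch of that shape, under the assumption that queued callbacks simply wait until the current program has nothing left to run. `ToyInterpreter` and `queueCallback` are made-up names for illustration and do not correspond to any existing JS Interpreter method:

```javascript
// Toy model only; these names are illustrative, not JS Interpreter APIs.
class ToyInterpreter {
  constructor() {
    this.stack = [];  // Work belonging to the current program; always runs first.
    this.queue = [];  // Callbacks waiting for the stack to become empty.
    this.log = [];
  }

  // What a native function would call to schedule func for later.
  queueCallback(func, ...args) {
    this.queue.push({ func: func, args: args });
  }

  run() {
    for (;;) {
      while (this.stack.length) this.stack.pop()();
      if (!this.queue.length) return;
      const next = this.queue.shift();
      // The callback's return value is discarded: nobody is waiting for it.
      this.stack.push(() => { next.func(...next.args); });
    }
  }
}

const toy = new ToyInterpreter();

// An ordinary, straight-line native function: it schedules the callback and
// returns long before the callback runs.
function nativeLater(func) {
  toy.queueCallback(func, 'hello');
  toy.log.push('nativeLater returned');
}

toy.stack.push(() => {
  nativeLater((msg) => toy.log.push('callback got ' + msg));
  toy.log.push('program finished');
});
toy.run();
```

After `toy.run()`, `toy.log` is `['nativeLater returned', 'program finished', 'callback got hello']`: the native function and the rest of the program complete before the queued callback is given a turn.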

PR #102 is a pretty good example of one way this might be done, although there are a few issues that need to be thought through:

AsyncFunction callbacks

I mention these here mainly to make sure everyone is clear that these are quite distinct from the above. These "AF callbacks" are callbacks only from the point of view of the embedder (i.e., the person writing the program that calls new Interpreter): from the point of view of the interpreted program they do not exist at all—they are merely an implementation detail of certain built-in functions that would traditionally have a non-blocking async API but which can, thanks to this bit of magic, instead have a blocking synchronous API.

One important point about these, however, is that there should only ever be at most one outstanding AF callback pending. This is because, when an AsyncFunction is running no other interpreted JavaScript code should run, so there is nowhere that another AsyncFunction could be called from. This is just like regular non-async NativeFunctions.

(One small caveat: there's no problem in principle with an AsyncFunction making a synchronous callback to interpreted code (or scheduling an ordinary async callback) before running whatever native async function it wraps, or after the underlying native async function has called its callback but before the AF callback is invoked to terminate the AsyncFunction, although there is a problem that any such synchronous callbacks wouldn't actually run because the interpreter is paused. This issue can probably be ignored, because there are vanishingly few cases where someone authoring a native function would want to both call interpreted code synchronously and wrap an underlying async function in a way that makes it look synchronous to the interpreted code, and the changes outlined above to implement synchronous callbacks should make it easy enough to do either one without the special machinery provided by createAsyncFunction; see for example Code City's xhr function which wraps node.js's natively-async http.get API in a way that makes it look synchronous.)

cpcallen commented 3 years ago

@Webifi: to try to answer some of your specific questions:

Any pointers on how to build a valid Interpreter.State from a Function object passed to a native function, and pass some arguments to that Interpreter.State?

You'll need to look at the (horribly, horribly long and hairy) stepCallExpression method to verify the details, but in outline:

For a synchronous call this will then be pushed on to the top of the stateStack, or for an async callback it will be (somehow, see previous comment for lack of detail) attached to the bottom of the stateStack to be run once the current Program node is complete.

Or, if that's the wrong approach, what's the correct one?

That is absolutely the right approach, but of course only part of it.

Please let me know if there's a better way to do this.

Looking at your examples only (not yet the implementation):

Immediate via callFunction: (Unable to return results from async functions.)

function nativeFunction(pseudoFunc) {
  const pseudoResultValue = interpreter.callFunction(pseudoFunc, this, pseudoarg1, pseudoarg2);
  if (interpreter.isPaused()) {
    console.warn("Encountered async function.  Won't have results until async function completes.");
  }
  console.log('Got result:', pseudoResultValue);
  return pseudoResultValue;
}

Since the interpreter is (intentionally) not reentrant, .callFunction() should not be calling .step() or .run(), so the results of calling pseudoFunc will never be available yet, even if it is not an AsyncFunction.

This nativeFunction will need to be written as a state machine; see for example the implementation of Array.prototype.toString in Code City, which I will simplify here for readability:

  function toString(thread, state, thisVal, args) {
    if (!state.info_.funcState) {  // First visit: call .join().
      state.info_.funcState = true;
      var obj = thisVal;
      var func = obj.get('join');
      // Begin bit that should be encapsulated by your callFunction method.
      var newState = Interpreter.State.newForCall(func, thisVal, []);
      thread.stateStack_.push(newState);
      // End bit that should be encapsulated.
      return Interpreter.FunctionResult.CallAgain;
    } else {  // Second visit: return value returned by .join().
      return state.value;
    }
  }

This long-winded thing could have been polyfilled as function toString() { return this.join.call(this); };

(Actually, I've just realised that, because Array.prototype.toString does nothing after calling this.join except return the result of the join call, I can take advantage of a neat feature I'd forgotten about to make it simpler in this particular case.)

Asynchronous via callAsyncFunction: (Supplied callback is called when pseudoFunc has completed.)

function nativeAsyncFunction(pseudoFunc, callback) {
  interpreter.callAsyncFunction (
    fn => {callback(fn())}, 
    pseudoFunc,
    // ...
  );
}

I must admit that I am not entirely sure what the motivating use-case for this would be, but if this was a common pattern I note that, in the state-machine example above, the call to fn() could have been inserted just before return state.value.

Queued via appendFunction: (pseudoFunc is called on next interpreter step.)

function nativeFunction(pseudoFunc) {
  // ... omitted
  if (interpreter.runUntil(currentStateIndex )) {
    // ... omitted
  }
}

Not sure what the intention here is, but calling Interpreter.prototype.run (or .step) from within a native function is never legal.

Queued via queueFunction: (pseudoFunc is called last.)

function nativeFunction(pseudoFunc) {
  interpreter.queueFunction(pseudoFunc, this, pseudoarg1, pseudoarg2);
}

This looks like a straightforward asynchronous callback. Looks good to me, with two minor nit-picks:

Webifi commented 3 years ago

@Webifi: to try to answer some of your specific questions:

Thanks for taking the time to look things over.

Since the interpreter is (intentionally) not reentrant, .callFunction() should not be calling .step() or .run(), so the results of calling pseudoFunc will never be available yet, even if it is not an AsyncFunction.

There's the rub. Not being reentrant is what makes it difficult to get a value back out of the interpreter. I made it basically reentrant by recording the current state index, adding an additional call state, then stepping until we're back to the recorded index. Looks like I'll need to approach it a different way.

I must admit that I am not entirely sure what the motivating use-case for this would be, but if this was a common pattern I note that, in the state-machine example above, the call to fn() could have been inserted just before return state.value.

That wouldn't return the value of the pseudo function (which could itself end up using a native async function) for use in the native async function. My example didn't make that use case very clear.

This nativeFunction will need to be written as a state machine;

[Edit] See below...

Webifi commented 3 years ago

Okay, to no longer be reentrant, I've modified the Synchronous callbacks to use something more analogous to a Promise.

Examples of use:

Synchronous callback from native function:

function nativeFunction (func) {
    return interpreter.callFunction(func, this, 25).then(function(v) {
        console.log('got pseudo result', v)
        return v + 5
    })
}

// Called via interpreted code
var test = nativeFunction(function(v){return v + 2})
// Should produce 32

Synchronous callback from native AsyncFunction: (That makes my brain hurt.)

function nativeAsyncFunction(func, callback) {
    callback(interpreter.callFunction(func, this, 25).then(function(v, callback2) {
        console.log('got pseudo result', v)
        callback2(v + 5)
    }))
}

// Called via interpreted code
var test = nativeAsyncFunction(function(v){return v + 2})
// Should produce 32

In both cases above, the .then(...) callback value handler is optional. If omitted, the value of the called interpreted function will be returned.

Additional pseudo functions can be called by simply returning another Callback in the .then() handler via return interpreter.callFunction(...)

For example:

function nativeAsyncFunction(func, func2, callback) {
    callback(interpreter.callFunction(func, this).then(function(v, callback) {
        callback(interpreter.callFunction(func2, this).then(function(v2, callback) {
                callback("I'm done with " + v + " and " +  v2)
            }))
    }))
}

or:

function nativeFunction(func, func2) {
    return interpreter.callFunction(func, this).then(function(v) {
        return interpreter.callFunction(func2, this).then(function(v2) {
                return "I'm done with " + v + " and " +  v2
            })
    })
}
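One way to picture how such chaining can work without re-entering the interpreter is the toy below. It is not the implementation from PR #201: `PendingCall`, `toyCallFunction` and `resolveNativeResult` are hypothetical names, and the callee is invoked directly where a real interpreter would push a state and step through it.

```javascript
// Toy model only; the real code is in PR #201 and is not reproduced here.
class PendingCall {
  constructor(func, args) {
    this.func = func;
    this.args = args;
    this.handler = null;
  }
  // Register a handler for the callee's result; chainable.
  then(handler) {
    this.handler = handler;
    return this;
  }
}

// Describes a call to be made later; it does not perform the call.
function toyCallFunction(func, ...args) {
  return new PendingCall(func, args);
}

// Stand-in for how the interpreter might treat a native function's return
// value: while it is a PendingCall, run the callee, hand its result to the
// handler (if any), and treat whatever the handler returns the same way.
function resolveNativeResult(value) {
  while (value instanceof PendingCall) {
    const result = value.func(...value.args);
    value = value.handler ? value.handler(result) : result;
  }
  return value;
}

function nativeFunction(func, func2) {
  return toyCallFunction(func).then(function (v) {
    return toyCallFunction(func2).then(function (v2) {
      return "I'm done with " + v + ' and ' + v2;
    });
  });
}

const out = resolveNativeResult(nativeFunction(() => 'a', () => 'b'));
```

Here `out` is `"I'm done with a and b"`. The native function itself returns immediately with a description of the call, so nothing inside it ever needs to step the interpreter.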

In addition, I added the ability to easily throw exceptions from AsyncFunctions, so this could close #178 & #189 if accepted:

Throwing exception in native AsyncFunction:

function nativeAsyncFunction(val, callback) {
        if (val < 2) return callback(interpreter.createThrowable(
            interpreter.RANGE_ERROR,
           'Value must be at least 2'
        ))
        callback(val + 2)
}

And as a side effect, added an additional way to throw exception from native functions:

Throwing exception in native function: (Alternate to interpreter.throwException(...))

function nativeFunction(val) {
        if (val < 2) return interpreter.createThrowable(
            interpreter.RANGE_ERROR,
           'Value must be at least 2'
        )
        return val + 2
}

And added catch to function calls:

Catching exceptions in pseudo functions calls from native:

function nativeFunction(func1, func2) {
   // Will be called later
   interpreter.queueFunction(func2, this).then(val => {
    console.log('func2 returned:', val);
  }).catch(e => {
    console.log('Got an error in func2:', interpreter.getProperty(e, 'message'), e);
  });
  // Will be called on next step
  return interpreter.callFunction(func1, this).then(val => {
    console.log('func1 returned:', val);
  }).catch(e => {
    console.log('Got an error in func1:', interpreter.getProperty(e, 'message'), e);
  });
}

(See detailed examples in #201 thread)

Names for the methods may need to change. For example, "callFunction" probably should be something like "createCallback", and "queueFunction" could be changed to "appendCall". Then there's the arguments for the interpreted callbacks. I currently just use variable arguments, but perhaps others prefer an array?

Webifi commented 3 years ago

@cpcallen

  • It shouldn't matter whether pseudoFunc is an interpreted or native function, so I'd probably just call this argument func.

cache invalidation and naming things...

I've used "pseudo function" to refer to both interpreted functions and native functions wrapped in a FUNCTION_PROTO. I'm uncertain what's the correct, terse, way to refer to them. For attribute names, perhaps just 'func', as you recommend, is best, since I usually use 'fn' for native JavaScript functions, but when referring to sandboxed functions in code comments? Maybe "sandboxed function"?

cpcallen commented 3 years ago

cache invalidation and naming things...

Ugh yes, so true.

I've used "pseudo function" to refer to both interpreted functions and native functions wrapped in a FUNCTION_PROTO. I'm uncertain what's the correct, terse, way to refer to them.

Yeah, actually upon reflection that is a pretty reasonable thing to do.

The obvious question to ask is: what does the existing codebase do? I actually don't remember.

(Or more accurately: whatever memory I do have is doubtless corrupted by all the refactoring I've done on my derived codebase…)

Webifi commented 3 years ago

The obvious question to ask is: what does the existing codebase do? I actually don't remember.

interpreter.js uses "interpreted function" for purely interpreted functions, "native function" for native functions wrapped in a FUNCTION_PROTO, and "native asynchronous function" for native functions wrapped in a FUNCTION_PROTO that will pause the interpreter instance until completion. That makes perfect sense from inside the interpreter sandbox, where there needs to be a distinction between them. But for methods that expose the functions to callbacks from outside the sandbox, I guess it's probably best to just call them all something like "sandboxed function". I've been using "pseudo function", but perhaps that's too confusing?

NeilFraser commented 3 years ago

This looks really interesting. I'm currently occupied in a conversion project, but will tackle this as soon as that's complete.

cpcallen commented 3 years ago

I've been using "pseudo function", but perhaps that's too confusing?

Absent any clear precedent I think that is absolutely fine.

Webifi commented 3 years ago

@NeilFraser Just a ping to see if you've had a chance to look this over.

Webifi commented 2 years ago

@NeilFraser Looks like I'll need to resolve some conflicts...

Any hope of this, or something like it, being merged? Any do's and don'ts I should keep in mind while refactoring my pull request / resolving conflicts that could make it more likely to be accepted?

toyknight commented 2 years ago

I came up with a workaround for my use case, where I only need to call functions declared in the pseudo code while passing in objects as arguments which can actually control my code. Not sure if it is a desired approach, but I will post it here.

Code to create a sandbox that we could invoke functions on:

const createSandbox = (src) => {
  // Create interpreter from user code
  const interpreter = new Interpreter(src);
  // Retrieve global object from interpreter
  const globalScope = interpreter.getGlobalScope().object;
  // Define sandbox state that will be shared between pseudo code and the sandbox
  const state = {
    running: true,
    fInvoked: false,
    fInvokeTarget: "",
    fInvokeArgs: [],
  };
  // Wrap the sandbox state (puts it in closure) since
  // interpreter.nativeToPseudo kinda copies the object values.
  const stateWrapped = {
    getInvokeTarget: () => {
      return state.fInvokeTarget;
    },
    getInvokeArgs: () => {
      return state.fInvokeArgs;
    },
    isRunning: () => {
      return state.running;
    },
    setInvoked: () => {
      state.fInvoked = true;
      // Reset to the same empty values the state was created with.
      state.fInvokeTarget = "";
      state.fInvokeArgs = [];
    }
  };
  // Define function that executes pseudo code, which is private
  const execute = () => {
    let steps = 0;
    // Either the code executes to the end or a function gets invoked, break.
    while (interpreter.step() && !state.fInvoked) {
      steps++;
      if (steps > 1000 /* some arbitrary limit you set */ ) {
        throw Error("Are you trying to infinite loop?");
      }
    }
  }
  // Execute user code first (there's nothing in the global scope yet)
  execute();
  // Append our sandbox code now.
  interpreter.appendCode(`
        // Self invoking function here, not accessible from user code :)
        (function () {
        // Get the sandbox state from the global scope.
            var state = window.state;

            // Now we have the state, delete it from the global scope 
            // so the user code won't be able to get it and cause chaos :)
            delete window.state;

            // A regular loop to invoke functions
            while (state.isRunning()) {
                // Get the function we would like to invoke.
                var f = window[state.getInvokeTarget()];

                // Get the args as well
                var args = state.getInvokeArgs();

                // Call the function if user code defined it.
                if (typeof f === "function") {
                    f.apply(this, args);
                    // Probably could also put the return value into the sandbox state
                    // so the 'invoke' API function could return it.
                }

                // Set fInvoked to true so the execution would stop.
                state.setInvoked();
            }

            // Remove the sandbox state reference when we are done.
            state = undefined;
        })();
  `);
  // Bind the sandbox state that the sandbox code will use.
  interpreter.setProperty(globalScope, "state", interpreter.nativeToPseudo(stateWrapped));
  // Execute sandbox code
  execute();
  // Now we are in the function invoking loop, return the API object.
  return {
    // API to bind objects to pseudo code's global scope
    bind: (name, obj) => {
      interpreter.setProperty(globalScope, name, interpreter.nativeToPseudo(obj));
    },
    // API to invoke a function defined by the pseudo code
    invoke: (functionName, ...args) => {
      state.fInvoked = false;
      state.fInvokeTarget = functionName;
      state.fInvokeArgs = args;
      execute();
    },
    // API to dispose the sandbox
    dispose: () => {
      state.running = false;
      state.fInvokeTarget = "";
      execute();
      interpreter.setProperty(globalScope, "state", undefined);
    }
  }
}

Example of using the sandbox:

// Suppose we would like to let user write code to control a robot
const userCode = `
function onSomethingHappened(robot) {
  robot.doSomething();
  console.log("Function 'onSomethingHappened' called.");
}
`;
// Define robot object
const robot  = {
  doSomething: () => {
    console.log("Robot did something.");
  }
}
// Wrap robot object so it goes into the wrapper object's closure which makes it 
// impossible for the user code to obtain (hide private APIs and states)
const robotWrapper = {
  doSomething: () => {
    robot.doSomething();
  }
}

// Profit.
const sandbox = createSandbox(userCode);
sandbox.bind("console", console);
sandbox.invoke("onSomethingHappened", robotWrapper);
sandbox.dispose();
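The control flow of this workaround (a dispatch loop parked inside the sandbox, resumed once per `invoke`) can be seen in isolation in the toy below, where a generator stands in for the stepped interpreter. This is only an illustration under that assumption: `createToySandbox` and its fields are made-up names, and it also stores the return value in the shared state, as the comment in the sandbox code above suggests might be done.

```javascript
// Toy model: a generator plays the role of the stepped interpreter.
function createToySandbox(userFunctions) {
  const state = { running: true, target: '', args: [], result: undefined };

  // Plays the role of the appended sandbox code: serve one request per
  // iteration, then hand control back to the host.
  function* dispatchLoop() {
    while (state.running) {
      const f = userFunctions[state.target];
      if (typeof f === 'function') {
        state.result = f.apply(null, state.args);
      }
      yield; // Like setInvoked(): pause until the host resumes the loop.
    }
  }

  const loop = dispatchLoop();
  loop.next(); // Run up to the first pause; no target is set yet.

  return {
    // Ask the parked loop to call one user-defined function.
    invoke(name, ...args) {
      state.target = name;
      state.args = args;
      state.result = undefined;
      loop.next();
      return state.result;
    },
    // Let the loop observe running === false and finish.
    dispose() {
      state.running = false;
      state.target = '';
      loop.next();
    },
  };
}

const toySandbox = createToySandbox({
  double: (n) => n * 2,
});
const doubled = toySandbox.invoke('double', 21);
const missing = toySandbox.invoke('noSuchFunction');
toySandbox.dispose();
```

Here `doubled` is 42 and `missing` is `undefined`. In the real workaround the pause is achieved by stopping calls to `interpreter.step()` once `fInvoked` is set, rather than by `yield`.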