gelisam / klister

an implementation of stuck macros
BSD 3-Clause "New" or "Revised" License
130 stars 11 forks source link

The REPL is mostly useless #108

Open david-christiansen opened 4 years ago

david-christiansen commented 4 years ago

Right now, expressions entered in the REPL don't have an appropriate lexical context associated with them. This means that there's no way to associate names at the REPL to actual defined names.

We should fix this. There's a few design possibilities:

  1. The REPL is always in #lang kernel, but it can import and export things from modules. In this model, the REPL is basically an anonymous module. In this case, REPL commands should be declarations. This way is pretty easy to set up in the code - we just need to maintain a set of scopes to enrich each expression with prior to expanding it.

  2. The REPL is started in a particular context. For instance, one could say "give me a REPL for foo.kl line 30 column 20". This would probably have to find the next module-context position to avoid complications with local bindings (or we could implement computation with open terms, but that seems a bit out-of-scope for an ML). To do this, we'd have to instrument the expander to load that module, but watch for the set of scopes to use at that site. Then, we'd stick those scopes on each REPL command, and treat REPL forms as being examples.

  3. We wing it, and invent a language that's almost-but-not-quite Klister with REPL semantics. This is the approach taken in GHC (where ghci allows things that aren't really Haskell expressions, and has a collection of meta-commands as well) and in Racket (where the top level is hopelessly confusing with respect to modules, bindings, and effects if you scratch it a bit).

I like number 2, but REPL-in-context is also one of my hobby horses. How do you all feel about it?

david-christiansen commented 4 years ago

I suppose we could also just delete the REPL...

gelisam commented 1 year ago

All three options have issues :(

(1) would not be very useful, because the kernel language is rarely used except for implementing the prelude.kl language.

One difficulty with (2) is that we now have %#module, so even a position between two top-level definitions would still point in the middle of an arbitrary macro invocation, so it doesn't sound straightforward to find the set of define calls which are considered to precede that position. For example, a custom language could define a %#module which reverses the order of the definitions (so that the code starts with main and ends with the implementation details), or which moves all the datatype definitions to the top of the file, etc.

(3) seems difficult. In ghci, definitions add to the scope, IO expressions are executed, and expressions are printed. This is relatively easy in Haskell, because after parsing the line we know whether it looks like a definition or an expression. But in Klister, we need to decide in advance whether the current Problem is a declaration or an expression, as that affects what macros expand to. Perhaps we need to add (the Problem repl)?

gelisam commented 1 year ago

And probably a corresponding #%repl primitive?

The only macro which solves (the Problem module) is #%module, which automatically gets added around the entire module. A language may define its own myLanguage.#%module which expands to a call to kernel.#%module, thus allowing the language to tweak the overall structure of a module, e.g. by reversing the declaration order.

Similarly, the only macro which solves (the Problem repl) would be #%repl, which would automatically get added around each line the user types at the repl. A language would be able to define its own myLanguage.#%repl which expands to a call to kernel.#%repl, thus allowing the language to tweak what is allowed at the repl, e.g. a language which adds typeclasses could add an (:info MyDataType) command which lists all the instances which mention MyDataType.

gelisam commented 1 year ago

For (2), how about a (enter-repl) macro? klister repl my-module.kl:80:12 would insert an (enter-repl) at that position, or the user could manually add that macro to their own code. The myLanguage.#%module macro would manipulate the (enter-repl) in the same way in manipulates the rest of the code it wraps, until we end up with a kernel.#%module call in which the (enter-repl) call clearly indicates the point at which we want to start the repl-in-context.

When (enter-repl) is expanded in (the Problem declaration), a repl would start as a side-effect. Inside the repl the user would have access to every identifier which is in scope at that point in the program.

When (enter-repl) is expanded in (the Problem (expression type)), it would output a magic (enter-repl-at-runtime) expression which, when executed, would start a repl as a side effect. Inside the repl, the user would have access to every identifier which is in scope at that point in the program. Waiting until the expression is executed means both that we don't need to worry about open terms and that the user will have access to the local variables, similar to the case in which the repl has stopped at a breakpoint in ghci. When (exit-repl expr) is ran in the repl, expr would be spliced back into the program at the place the (enter-repl-at-runtime) expression was executed. The repl could then be entered again at a later time.

doyougnu commented 1 year ago

@gelisam I have some time again to contribute (literally while waiting for paint or plaster to dry :P). Where would be the best place to start on this ticket? Perhaps the enter-repl macro?

I think (2) is the best goal to shoot for out of our options for the reasons stated above

gelisam commented 1 year ago

I have some time again to contribute

Great, me too! I have been making improvements to the examples. I pushed directly to master for those changes, because I consider them to be projects written in Klister which I happen to store in the examples folder because that folder holds all the Klister code which was ever written, not changes to Klister or to its standard library. Let me know if you'd prefer if I submit a PR next time

Perhaps the enter-repl macro?

In a previous meeting, @david-christiansen pointed out a big flaw in my enter-repl proposal: it is only when that macro is expanded that the macro has access to the expansion environment, which contains human-friendly names. The generated (enter-repl-at-runtime) core expression would only get evaluated at runtime, and at that point all the identifiers in the core environment are machine-friendly Uniques.

My next idea is: perhaps (enter-repl) could capture the expansion environment and generate (enter-repl-at-runtime captured-env)? This way, it should be possible to start a repl in which the identifiers in that captured environment are already bound.

The first step towards that, in my opinion, would be to review the part of the commentary which explains the difference between the expansion-time variables and the core variables, and to find the corresponding code in the Klister implementation. Then, as practice, and to make sure that the expansion environment is indeed accessible to primitive macros, I would define a primitive enter-repl macro which will eventually enter the repl, but for now only prints the variables of the expansion environment as a side-effect when it gets expanded, and then returns (unit). For example,

#lang "prelude.kl"

(example
  (let [x 42]
    (enter-repl)))

would print every identifier from the Prelude, plus x. Plus a bunch of identifiers which are no longer in scope but could still theoretically be referenced if a macro managed to construct an identifier with the right scope set.

The next step would be to make that list shorter by filtering it down to the identifiers which are visible from the (enter-repl) call site. This requires getting at least a bit familiar with the ScopeSet system, which is one of the most brain-melting parts of Klister. See the first paragraph of HACKING.rst for more references on that topic. The important part is that the expander has a current level and a current set of scopes, and those can be used to restrict the way-too-long list of identifiers to those which can actually be used from the (enter-repl) call site, namely the still-quite-long list of every identifier from the Prelude, plus x.

Next, in order to take a Syntax from the repl and expand and evaluate it in the captured environment, we need to expand that Syntax with that same current level and current set of scopes. So I guess (enter-repl-at-runtime captured-env level scope-set) also needs to capture those. Next, I would add a core constructor for enter-repl-at-runtime, and I would make (enter-repl) return that instead of returning (unit). Then I would make (enter-repl-at-runtime captured-env level scope-set) print the long list of identifiers and return (unit).

Next, I would try to figure out how to run a nested task processor while in the middle of evaluating a single task. Double-check that the task processor is re-entrant or something. Hardcode a particular expression, e.g. (+ x 1), add the scope set to it, and make sure that

  1. + and x both resolve to the right thing
  2. you can run all the tasks which that expression spawns, and then print the result, without having to yield control back to the surrounding task processor
  3. even for a more complex example whose tasks get stuck until some of the tasks from the surrounding task processor are complete

Then I would run a repl which takes user-written expressions instead of (+ x 1), printing each until the user indicates that they want to quit the repl somehow. Then I would detect the special (exit-repl expr) command, and instead of printing the result, I would exit the repl and return the result of evaluating that expr instead of returning (unit).

david-christiansen commented 1 year ago

Here's a slight modification that might be easier to work with.

What if there were a declaration form in modules called a REPL anchor, inspired by Racket's namespace anchors? Then, the elaboration process saves whatever's needed at that point, and you don't have to worry about someone trying to open a REPL under a binder (which would be super cool, but probably makes more sense for a proof assistant).

The command line could take the file and the exported anchor name. Thoughts?

gelisam commented 1 year ago

Nice! And maybe %#module should create and export one with a default name at the very end, which the repl would load if no anchor name is given at the command-line, for the common case in which the user wants all the identifiers to be visible.

In fact, is it worth supporting anything other than this common use case? If we can't look inside binders, then the one thing which these anchors add over the common case is the ability to only bring half of the top-level definitions in scope, which doesn't seem as useful as bringing all of it in scope.

gelisam commented 1 year ago

I now realize that my proposal for the common case doesn't work, because it would happen too late, when the kernel's %#module finishes executing. This is too late, since we need to capture the environment at expansion time.

david-christiansen commented 1 year ago

It wouldn't be so hard to a macro that expands something like:

(my-module d ...) ==> (#%module d ... (define-repl-anchor DEFAULT) (provide DEFAULT))

There's still a need for the anchor, though, even if not under binders - first off, shadowing can take things out of scope that you might want to experiment with. And, more importantly, it's useful to get a REPL at a particular phase.

Adding support for REPLs under binders wouldn't be THAT hard to do either - basically, the E in REPL becomes a partial evaluator. The hard thing to do is termination, but a conservative approximation should be fine if we're not computing normal forms - all function calls applied to unknown arguments could get residualized, even if the function is known, and I don't think we have other sources of nontermination than functions.

gelisam commented 1 year ago

Adding support for REPLs under binders wouldn't be THAT hard to do either

Using (enter-repl) or (create-repl-anchor)? If the latter, the code which creates the anchor would need to return the anchor to the top-level, so that it can be exported, like this?

(define two-and-anchor
  (let [one 1]
    (let [anchor (create-repl-anchor)]
      (let [two (+ one one)]
        (pair two anchor)))))
(define two
  (fst two-and-anchor))
(define anchor
  (and two-and-anchor))

(export two anchor)
gelisam commented 1 year ago

The hard thing to do is termination

Why do we need to worry about termination? Are you worried about an infinite loop before (enter-repl)? While evaluating an expression at the repl? During type-checking?

david-christiansen commented 1 year ago

I was thinking that REPLS under binders resulted from source locations rather than edits to the source, but reasonable people can disagree about this.

As far as termination goes, it's because the usual techniques for partial evaluation will explore both sides of a conditional. General recursive functions with recursive calls guarded by conditionals become infinite loops unless something is done to break the loop.

doyougnu commented 1 year ago

So it seems like anchors are a good path forward with the caveat that we do not allow them under binding forms (which is a invariant that racket has too). But i'm missing the intuition behind them? From the racket docs it looks like the anchor can only be used to create an empty namespace or create a namespace out of a portion of a module. But then why are anchors needed if namespaces can be created by modules with module->namespace. For our purposes, what does create-repl-anchor get us that a (enter-repl) macro call doesn't if the call can instruct the elaboration process captures whatever is needed at that point (no idea if manipulating the elaboration process like that is possible in a macro though).

gelisam commented 1 year ago

As far as termination goes, it's because the usual techniques for partial evaluation will explore both sides of a conditional. General recursive functions with recursive calls guarded by conditionals become infinite loops unless something is done to break the loop.

Ah! Given the program

{- line 1 -} (define foo 'bar)
{- line 2 -} (define f
{- line 3 -}   (lambda (x)
{- line 4 -}     (let [y 42]
{- line 5 -}       (let [z (if (= x 0) 0 1)]
{- line 6 -}         (enter-repl)))))
{- line 7 -} (example
{- line 8 -}   (f 2))

I was thinking that the repl would bind foo to 'bar, x to 2, y to 42, and z to 1. And if there was no example, the repl would not start at all. And if there was a second example with (f 3), the repl would first open with x bound to 2, and then after exiting the repl with (exit-with 23), the (example (f 2)) would print 23, and then the repl would start again, with x bound to 3 this time. Like a breakpoint.

I think what you have in mind instead is that regardless of the number of examples, the repl always starts exactly once, with foo bound to 'bar, y is bound to 42, and x and z are either unbound or bound to some set of possible values like {Integer} or {0,1}?

doyougnu commented 1 year ago

Ah but f is not generally recursive right? So there would be no infinite loop here during partial evaluation.

david-christiansen commented 1 year ago

The worry here is that someone writes a program like this:

(defun f (x)
  (if (< x 1) 1 (* x (f (- x 1)))))
(defun ff (x)
  (enter-repl!)
  x) 

with defun being a macro for defining recursive multi-arity functions. This is with enter-repl! being an elaboration-time effect, not a run-time effect - as a run-time effect, it's fine. Both are probably useful - I'd been having all these discussions on the assumption that REPLs were elab-time, but it sounds like this was not a shared idea.

Then, at the REPL, they have the following things in scope:

f : (-> Int Int)
ff : (-> Int Int)
x : Int

A session might look like:

> x
x : Int
> f
#<FUNCTION>
> (ff x)
x

So far so good. But what happens when they call (f x)? Doing the beta reductions on the unknown x and residualizing the ifs leads to an infinite term, like (if (< x 1) 1 (* x (if (< (- x 1) 1) 1 (if ......

That's the issue I was gesturing at. One way to solve this is to never reduce function applications when arguments are unknown - not even the function position. There'd be some usability challenges to solve to avoid leaking internal names from macro expansion in the output, but it should be doable. Another solution would be to have a timeout on residualization, or to just stop when something branches on a residualized term. A fancier solution is to do a termination analysis, and only get stuck in the presence of non-size-change-terminating stuff.

Or just ban REPLs in open contexts :-)

david-christiansen commented 1 year ago

FWIW this paper looks like a nice way to get this working: https://arxiv.org/pdf/1808.02101.pdf It's a run-time version of termination analysis. In that case, instead of failing with blame when a program is about to go infinite, it could instead just be quoted/residualized aggressively.

doyougnu commented 1 year ago

So it seems like anchors are a good path forward with the caveat that we do not allow them under binding forms (which is a invariant that racket has too). But i'm missing the intuition behind them? From the racket docs it looks like the anchor can only be used to create an empty namespace or create a namespace out of a portion of a module. But then why are anchors needed if namespaces can be created by modules with module->namespace. For our purposes, what does create-repl-anchor get us that a (enter-repl) macro call doesn't, if the call can instruct the elaboration process captures whatever is needed at that point (no idea if manipulating the elaboration process like that is possible in a macro though).

david-christiansen commented 1 year ago

Notes from the meeting:

It seems that we really want to build a debugger a la Common Lisp. Here's how we do it for expressions:

First, we add a new core form (break! E) with the following typing rule:

Gamma |- e : t
------------------------------
Gamma |- (break! e) : t

The putative small-step operational semantics say (break! e) ---> e. But when the evaluator sees break!, it goes into debug mode which gives a prompt like this:

(define n 5)
(example
  ((lambda (x)
     (break! (+ x 1)) 2))
Debugging from break! at SRCLOC
n : Int := 5
x : Int := 2
> 

At this debug prompt, users can evaluate expressions in the local environment of the break! form. Syntax objects entered at the prompt are decorated with the scope sets of the break! form itself, and the expansion environment at its elaboration is used to resolve identifiers. It could also have some nice features like evaluating expressions at an earlier phase in the same syntactic context, though these wouldn't have local contexts due to that phase having already run. Exiting the debug prompt could be done by providing a value to return:

> :exit 22

or using the one in the program:

> :exit

Putting this directly in the evaluator seems to be a problem - it would tie the whole implementation into knots. It seems better to equip the evaluator with the primitives needed to have some arbitrary debugger use it. In this case, it seems that there are two techniques:

  1. The evaluation monad can be extended with some version of ContT, so that the evaluator for break! can capture the evaluator's continuation (naturally delimited by the invocation of the evaluator), return control to the surrounding system, and then allow it to be reinvoked later.
  2. The evaluator can be rewritten as an abstract machine (e.g. a CEK machine) that represents the evaluator's continuation explicitly, which is a bigger change but would allow the debug shell to also inspect the current continuation (i.e. get a stack trace). Real debuggers are all basically CEK machines - the current program point is C, the locals is E, and the stack is K - so this seems like a good idea.

Presumably, elaboration of break! would save the current expansion environment and the scope sets in a table indexed by source locations, and then the resulting core term could simply save the source location of the break.


We also talked about how it'd be nice to have a debug prompt in the macro monad. Here, mbreak! would be a macro action (so we have mbreak! : (Macro Unit)) that caused a debug task to be pushed on the task queue. Running the debug task then gives a prompt, and exiting the prompt reinvokes the original task where it left off. Here, we can use the same delimited control representation for the Klister macro monad that we already use to do stuck macros, so the implementation shouldn't require major changes.

Commands at the macro debug prompt could be:

Of course, it would also make sense to run macro monad actions from the prompt. pure gives a classic REPL semantics, while the other actions provide a nice way to query things.

Design questions:

This seems to be a question of whether the debugger prompt delimits macro monad control operators. I think that the second (where it indeed is a delimiter) seems to be a more reasonable semantics, because the first can be made a command in the second.

Demo:

> (>>= get-my-type (lambda (t) (type-case t ...)))
type-case at DEBUGGER:1:18 paused because type is not yet solved
Use :later to allow the rest of the system to find the type, or :abort to return to debugger, or :exit T to solve the variable with type T and continue
> > :abort
> 

The macro monad interpreter also has access to the local context, so we can do all the usual stuff at this debug REPL.