david135 / mosh-scheme

Automatically exported from code.google.com/p/mosh-scheme
Other
0 stars 0 forks source link

expander memory usage runaway (infinite recursion?) #189

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile the attached library using the program:

#!r6rs
(import (rnrs)
  (proof))

What is the expected output? What do you see instead?
It should compile.  Memory usage runaway.

What version of the product are you using? On what operating system?
Mosh R6RS scheme interpreter, version 0.2.6 (revision master 2011/02/02 
23:15:32 mosh-0.2.5-407-gda9873d).  i686-pc-linux-gnu.

Please provide any additional information below.
It is not the only problem I have with the expander, but let's start with this.

Original issue reported on code.google.com by mrc....@gmail.com on 3 Feb 2011 at 7:21

Attachments:

GoogleCodeExporter commented 9 years ago
Attached is a smaller library which still causes the runaway.

Original comment by mrc....@gmail.com on 3 Feb 2011 at 9:00

Attachments:

GoogleCodeExporter commented 9 years ago
Your procedure d->sp seems contain unpredicated self recursion.

Did you mean:
     (define (d->sp sym)
      (d->s (string->symbol (string-append "mosh." (symbol->string sym))))) ;; d->sp => d->s

Original comment by oku...@gmail.com on 3 Feb 2011 at 1:14

GoogleCodeExporter commented 9 years ago
your first proof.sls example includes duplicate "pointer-ref-c-pointer" in 
"peekers".

After this and #2 fixed, the problem is "cannot export syntax which defined 
with macro" (right?)

One (not so cool) solution is using lisp-transformer like mosh's (shorten) 
library.
https://github.com/higepon/mosh/blob/master/lib/shorten.ss
When I wrote this code, I thought it's undefined in R6RS that behavior of 
(syntax something) denoted outside of 
pattern variable binding form (i.e. syntax-case, with-syntax, ...)
I should confirm this..

Original comment by oku...@gmail.com on 3 Feb 2011 at 1:34

GoogleCodeExporter commented 9 years ago
So we can do this. By include every definition within a syntax-case and pass 
valid syntax-object to datum->syntax.
AFAIK, this is only way to do this in R6RS portable manner..

Original comment by oku...@gmail.com on 3 Feb 2011 at 2:02

Attachments:

GoogleCodeExporter commented 9 years ago
>Your procedure d->sp seems contain unpredicated self recursion.

Damn, yes.  Sorry.

Original comment by mrc....@gmail.com on 3 Feb 2011 at 7:41

GoogleCodeExporter commented 9 years ago
> your first proof.sls example includes duplicate "pointer-ref-c-pointer" in 
"peekers".

The first file I attached had problems because I removed code from the original 
version to highlight the infinite recursion problem; I just forgot to remove 
everything.  The double inclusion of POINTER-REF-C-POINTER was really a bug in 
my "good" code[1], but it was compiling fine with previous Mosh revisions and 
passing my test suite (that is why I had not found it).

Attached is a modified version of my original library which compiles and passes 
the test suite with current Mosh; I changed D->S to:

    (define (d->s sym)
      (datum->syntax (syntax-case stx () ((?ctx) #'?ctx)) sym))

I do not have a final answer to this problem.

At first, I (wrongly) thought we were dealing with the following paragraph in 
R6RS[2]:

define-syntax form
    The expander expands and evaluates the right-hand-side expression and binds the keyword to the resulting transformer.

given that, correctly, the following program must fail with "unbound 
identifier":

#!r6rs
(import (rnrs))

(define-syntax doit
  (lambda (stx)
    (syntax-case stx ()
      ((?id)
       (begin
     (doit #t)
     #'(display "ciao\n")))
      ((?id ?arg)
       #'(display "ohayo\n"))
      )))

(doit)

I thought we were down to: when the expander finds a DEFINE-SYNTAX form should 
it: (1) evaluate the transformer expression and then push the keyword on the 
environment binding it to the transformer; or should it: (2) push the keyword 
on the environment, leaving it unbound, and then evaluate the transformer and 
bind the result?

Solution (1) is like having a binding in a LET, while solution (2) is like 
having a binding in a LETREC.

But then I see that both Mosh and Vicare work fine with this program in which 
DUMMY is a free identifier:

(begin
  (define a 123)
  (define-syntax doit
    (lambda (stx)
      (datum->syntax #'dummy 'a)))
  (write (doit))
  (newline))

but fail with this program with "unbound identifier":

(begin
  (define-syntax doit
    (lambda (stx)
      (datum->syntax #'dummy '(define a 123))))
  (doit)
  (write a)
  (newline))

and I get the same result if I replace #'DUMMY with #'DOIT;  so it dunno if 
Mosh and Vicare adopt solution (1) or (2).  But it seems clear that in the 
second program the identifier "a" is renamed by the expander to some gensym; 
this is unexpected to me.  So I try this program:

(begin
  (define b)
  (define-syntax doit
    (lambda (stx)
      (datum->syntax #'b '(define a 123))))
  (doit)
  (write a)
  (newline))

and I still get "unbound identifier".

I have to admit that my understanding of renaming is not correct. :-/

[1] 
https://github.com/marcomaggi/nausicaa/raw/devel/scheme/src/libraries/nausicaa/f
fi/peekers-and-pokers/compat.mosh.sls
[2] http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-13.html#node_chap_10

Original comment by mrc....@gmail.com on 4 Feb 2011 at 8:14

Attachments:

GoogleCodeExporter commented 9 years ago
You can do it with:

(import (rnrs))
(begin
  (define b)
  (define-syntax doit
    (lambda (stx)
      (syntax-case stx () ;; add syntax-case
        ((env)
         (datum->syntax #'env '(define a 123))))))
  (doit)
  (write a)
  (newline))

Original comment by oku...@gmail.com on 4 Feb 2011 at 10:58

GoogleCodeExporter commented 9 years ago
> given that, correctly, the following program must fail with "unbound 
identifier":

No. This behavior is undefined in R6RS. Because,

(define-syntax doit
  (lambda (stx)
    (syntax-case stx ()
      ((?id)
       (begin
     (doit #t) ;; <= HERE
     #'(display "ciao\n")))
      ((?id ?arg)
       #'(display "ohayo\n"))
      )))

DOIT used here is expanded at phase 1 and (define-syntax doit ...) will only 
bind DOIT to phase 0.
(nmosh and Larceny will produce some error.)

 --

If you really want to do without use of syntax-case, we will need to extend 
R6RS..

For example, current nmosh can do it with:
(begin
  (define b)
  (define-syntax doit
    (lambda (stx)
      (datum->syntax (car stx) '(define a 123))))
  (doit)
  (write a)
  (newline))

because nmosh supports low-level renaming macro with use of "always unwrapped" 
syntax object.
But psyntax-mosh (or any psyntax-based R6RS) cannot do this.

Original comment by oku...@gmail.com on 4 Feb 2011 at 11:39

GoogleCodeExporter commented 9 years ago
Hmm.. I missed some points..

--

>  But it seems clear that in the second program the identifier "a" is renamed 
by the expander to some gensym;
>  this is unexpected to me.

See following example:

(import (rnrs))                                                                 

(begin

  (define-syntax doit-more
    (syntax-rules () ((_) (define a3 123))))
  (define-syntax doit-more2
    (lambda _ (syntax-case _ () ((_) #'(define a4 123)))))

  (define-syntax doit-again
    (lambda _ #'(define a2 123)))
  (doit-more)
  (doit-more2)
  (doit-again)
  (write (list a2 a3 a4))
  (newline))

DOIT-MORE, DOIT-MORE2 and DOIT-AGAIN will do exactly same thing -- returning 
(syntax (define a 123)) as syntax object.
All of them won't work because "define" itself is also a transformer -- they 
will do some renaming to keep result hygiene.

Thus,
  (define-syntax doit
    (lambda _ #'a))
  (define-syntax doit2
    (syntax-rules () ((_) a)))
will work. Because they don't contain any transformer in their result.

--

Simply, you cannot refer any "local"(i.e. syntactic scope which they are) 
environment without use of syntax-case.
When #'doit is written in outside of syntax-case, they will just mean the 
symbol "doit".
(That's why you'll never get undefined error about "dummy" when you wrote 
#'dummy in datum->syntax.)

Original comment by oku...@gmail.com on 4 Feb 2011 at 7:02

GoogleCodeExporter commented 9 years ago
>> given that, correctly, the following program must fail with "unbound 
identifier":
>
>No. This behavior is undefined in R6RS. Because,
>
>(define-syntax doit
>  (lambda (stx)
>    (syntax-case stx ()
>      ((?id)
>       (begin
>    (doit #t) ;; <= HERE
>    #'(display "ciao\n")))
>      ((?id ?arg)
>       #'(display "ohayo\n"))
>      )))
>DOIT used here is expanded at phase 1 and (define-syntax doit ...) will only
>bind DOIT to phase 0. (nmosh and Larceny will produce some error.)

Yes, this is what I meant.

>DOIT-MORE, DOIT-MORE2 and DOIT-AGAIN will do exactly same thing -- returning
>(syntax (define a 123)) as syntax object. All of them won't work because
>"define" itself is also a transformer -- they will do some renaming to keep
>result hygiene.

We agree on this (I used this idiom multiple times in my code) even though the 
mechanism is not clear to me.

>you cannot refer any "local" [...] environment

This is were our minds part ways and I am confused.  While my mental model of 
the expansion process is still partial, I will try to reassemble it as follows. 
 I assume that the expander internals use definitions equivalent to the 
following:

(define-record-type <syntax-object>
  (nongenerative)
  (fields (immutable S-expression syntax->datum)
          (immutable lexical-context)))

(define (identifier? stx)
  (and (<syntax-object>? stx)
       (symbol? (<syntax-object>-S-expression stx))))

(define (datum->syntax ctx sexp)
  (assert (identifier? ctx))
  (make-<syntax-object> sexp
                        (<syntax-object>-lexical-context ctx)))

where the LEXICAL-CONTEXT field holds a mark value uniquely identifying a 
lexical scope: every time the expander enters a binding form (like LET) it 
generates a new mark value and pushes it on the environment, only the bindings 
introduced by that form will have the generated mark.  A binding is the couple 
mark value/Scheme symbol to which a symbol substitution is associated in the 
environment.

When a SYNTAX form is evaluated by the expander: the returned syntax object has 
the current mark as lexical context, the last that was pushed on the 
environment.

The following program works fine:

(begin
  (define-syntax doit
    (lambda (stx)
      (datum->syntax (syntax-case stx () ((?ctx) #'?ctx))
             '(define a 123))))
  (doit)
  (write a)
  (newline))

My understanding is that DEFINE-SYNTAX causes a value to be pushed on the 
environment with the following properties: it is a keyword (that is: a macro) 
binding, it has the top-mark value, its keyword symbol is DOIT.

When later the transformer processes the (DOIT) form, it first builds a 
<SYNTAX-OBJECT> having the list (DOIT) as sexp and the top-level mark value as 
lexical context; then it builds a <SYNTAX-OBJECT> having the symbol DOIT as 
sexp and the top-mark as lexical context; it searches the environment and finds 
the binding with the same symbol/mark, it recognises it as keyword binding so 
it expands the macro.

The transformer of DOIT is applied to the <SYNTAX-OBJECT> instance having the 
list (DOIT) has sexp and the top-mark value as lexical context; the SYNTAX-CASE 
form evaluates to a <SYNTAX-OBJECT> having the symbol DOIT as sexp and the 
top-mark as lexical context; DATUM->SYNTAX evaluates to a <SYNTAX-OBJECT> 
instance having the list (define a 123) as sexp and the top-mark as lexical 
context.

Such value is returned as output form and expanded: it creates a new binding 
for A at the top level (that is: it pushes on the environment a new 
substitution with A as symbol and the top-mark as context), which later 
captures the A in reference position: this capturing reveals itself only after 
the whole expansion is completed and the same symbol-to-gensym substitution is 
applied to both A in binding position and A in reference position.  Life is 
beautiful.

The following program does not work:

(begin
  (define-syntax doit
    (lambda (stx)
      (datum->syntax #'dummy '(define a 123))))
  (doit)
  (write a)
  (newline))

and you say it is fully equivalent to:

(begin
  (define-syntax doit
    (lambda (stx)
      #'(define a 123)))
  (doit)
  (write a)
  (newline))

I am trying this interpretation: in the first program #'DUMMY evaluates to a 
<SYNTAX-OBJECT> having the symbol DUMMY as sexp and the mark generated by the 
LAMBDA in the macro transformer as lexical context, let's call it M1.  
(datum->syntax #'dummy '(define a 123)) evaluates to a <SYNTAX-OBJECT> having 
the list (define a 123) as sexp and the M1 as lexical context; such syntax 
object is returned as output form.  Later the syntax object is unwrapped and 
all the identifiers in it are marked with M1; DEFINE creates a new binding for 
A/M1 which later does *not* capture A in reference position because the latter 
has the top-mark.  (Why DEFINE/M1 is captured by the DEFINE syntax binding of 
the language?)

In the second program #'(define a 123) evaluates to a <SYNTAX-OBJECT> instance 
having the list (define a 123) as sexp and the mark generated by the LAMBDA in 
the macro transformer, let's call it M1 again.  From now on everything happens 
as in the first program.

Is this correct enough?

Original comment by mrc....@gmail.com on 5 Feb 2011 at 10:46

GoogleCodeExporter commented 9 years ago
> Is this correct enough?

I think so.

> (Why DEFINE/M1 is captured by the DEFINE syntax binding of the language?)

It's undefined in R6RS but many implementation(at least, nmosh, mosh, chez, PLT 
and your vicare)
will treat them as top-level of the program. Top-level might import (rnrs) and 
their friends, so
M1 (silently) includes DEFINE, CONS, ... and their friends.
Ypsilon is the only exception AFAIK -- they will exclude non-exported bindings 
from the context like M1.

(library (vault)
         (export access)
         (import (rnrs))

(define my-secret "my-secret password")

(define-syntax access
  (lambda (x)
    (syntax-case x ()
      ((_ name)
       (datum->syntax #'dummy (syntax->datum #'name))))))
) ;; library (vault)

(import (rnrs) (vault))

(display (access my-secret)) ;; => will show non-exported secret value?
(newline)

prism:check oku$ nmosh test.sps
my-secret password

whoa. nmosh shows unexported value from a library!
But ypsilon...

prism:check oku$ ypsilon --sitelib=. test.sps

error: unbound variable my-secret

backtrace:
  0  (display (access my-secret))
  ..."/Users/oku/check/test.sps" line 3

... will protect them.
I don't know which are better though.

(these example was originally shown by Ypsilon's author.)

Original comment by oku...@gmail.com on 5 Feb 2011 at 11:15

GoogleCodeExporter commented 9 years ago
Thanks!

Original comment by mrc....@gmail.com on 5 Feb 2011 at 12:23

GoogleCodeExporter commented 9 years ago
(Closed.)

Original comment by oku...@gmail.com on 5 Feb 2011 at 1:48