bkowalik / asci

Another Scheme Console Interpreter
1 stars 0 forks source link

Grammar preparation #2

Closed mikusp closed 10 years ago

mikusp commented 10 years ago

My initial idea was to use use a BNFC. It requires a not-so-simple input file containing grammar that is basically BNF but using some extensions, labels, built-in priorities of operators and what-not. I came with something like this (based on http://www.scheme.com/tspl2d/grammar.html)

<program> ::= <foo>* ;

<foo> ::= <definition> | <expression> ;

<definition> ::= <variable> <definition>
             | <syntax> <definition>
             | "(begin" <definition>* ")"
             | "(let-syntax (" <syntax binding>* ") " <definition>* ")"
             | "(letrec-syntax (" <syntax binding>* ") " <definition>* ")"

<variable> <definition> ::= "(define " <variable> <expression> ")"
                        | "(define (" <variable> <variable>* ")" <body>
                        | "(define (" <variable> <variable>* "." <variable> ")" <body> ;

<variable> ::= <identifier> ;
<body> ::= <definition>* <expression> <expression>* ;

<keyword> ::= <identifier> ;

<syntax binding> ::= "(" <keyword> <transformer expression> ")" ;

<expression> ::= <constant>
           | <variable>
           | "(quote " <datum> ")"
           | "'" <datum>
           | "(lambda " <formals> <body> ")"
           | "(if> " <expression> <expression> <expression> ")"
           | "(if> " <expression> <expression> ")"
           | "(set>! " <variable> <expression> ")"
           | <application>
           | "(let-syntax (" <syntax binding>* ")" <expression> <expression>* ")"
           | "(letrec-syntax (" <syntax binding>* ")" <expression> <expression>* ")" ;

<constant> ::= <boolean>
           | <number>
           | <character>
           | <string> ;

<formals> ::= <variable>
          | "(" <variable>* ")"
          | "(" <variable> <variable>* "." <variable> ")" ;

<application> ::= "(" <expression> <expression>* ")" ;

<identifier> ::= <initial> <subsequent>*
             | "+"
             | "-"
             | "..." ;

<initial> ::= <letter>
          | "!" | "$" | "%" | "&" | "*" | "/" | ":"
          | "<" | "::=/" | ">" | "?" | "~" | "_" | "^" ;

<subsequent> ::= <initial>
             | <digit>
             | "."
             | "+"
             | "-" ;

<letter> ::= "<a>" | "<b>" | "<c>" | "<d>" | "<e>" | "<f>" | "<g>" | "<h>"
         | "<i>" | "<j>" | "<k>" | "<l>" | "<m>" | "<n>" | "<o>" | "<p>"
         | "<q>" | "<r>" | "<s>" | "<t>" | "<u>" | "<w>" | "<x>" | "<y>" | "<z>" ;

<digit> ::= "0" | "1" | "2" | "3" | "4"
        | "5" | "6" | "7" | "8" | "9" ;

I'm afraid that for a dynamically typed language, resulting AST will be too much type-safe to do anything useful easily. Instead, building a custom parser (like in http://hackage.haskell.org/package/husk-scheme-3.17/docs/src/Language-Scheme-Parser.html#lispDef) could be better if we trade more complicated grammar for simplicity. In "Write yourself a Scheme" every type is a "subclass" of LispVal - atoms, numbers, strings, lists of LispVals, lambdas, etc. If we stick to this representation, we would only really have to implement a HUGE eval function that evaluates given LispVal.

tl;dr Should we stick to more complicated, real life grammar or just a bare minimum needed to evaluate given Scheme code?

mikusp commented 10 years ago

Any questions/ideas?

program = { expression } ;

expression = constant
           | variable
           | "'" expression
           | list
           | dotted list ;

constant = boolean
         | number
         | character
         | string ;

boolean = "#t"
        | "#f" ;

number = [ sign ], ( integer | floating ) ;

integer = digit, { digit } ;

sign = "-"
     | "+" ;

digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

floating = digit, { digit }, ".", digit , { digit } ;

character = "#\", any character ;

any character = ? all possible characters ? ;

string = """, { string character }, """ ;

string character = "\""
                 | "\\"
                 | any character - ( """ | "\" ) ;

variable = identifier ;

identifier = ( letter | symbol ), { ( letter | digit | symbol ) } ;

letter = "a" | "b" | ... | "z" | "A" | "B" | ... | "Z" ;

symbol = "!" | "$" | "%" | "&" | "*" | "/" | ":"
       | "<" | "=" | ">" | "?" | "~" | "_" | "^"
       | "." | "+" | "-" ;

list = "(", { expression }, ")" ;

dotted list = "(", expression, { expression }, ".", expression, ")" ;
mikusp commented 10 years ago

Done