CakeML / cakeml

CakeML: A Verified Implementation of ML
https://cakeml.org
Other
946 stars 83 forks source link

Define a parser/printer for simple concrete syntax in Polish notation #392

Open xrchz opened 6 years ago

xrchz commented 6 years ago

CakeML currently supports standard s-expression syntax via theories in HOL examples plus our compiler/parsing/fromSexpTheory. It might be nice to also support s-expressions with an even simpler syntax: see https://lists.cakeml.org/pipermail/developers/2017-February/000179.html. This could probably be done without changing the sexp datatype (i.e., fromSexpTheory stays as-is). It would probably make sense for the bulk of this work to end up in HOL examples rather than CakeML. This issue can be closed when the CakeML compiler has an option to support input in the simple syntax.

myreen commented 6 years ago

I just want to note that there is a parser and printer for s-expressions as part of the Jitawa/Milawa work in HOL examples. The s-expression type is:

SExp = Dot SExp SExp | Val num | Sym string

The parser (or at least the machine code implementation of it) is very fast -- much faster (9x faster) than the parser in the Lisp implementations used by ACL2.

The Jitawa parser is the inverse of s-expression printing regardless of what's in the Sym elements, i.e. the printer correctly escapes tricky strings like \ ) " (. ||.

The parser also has support for Milawa-style abbreviations, where the concrete syntax can specify how subexpressions are shared in order to convey that pointer sharing is to be done. Example: the following ((1 . 2) . (1 . 2)) can be represented in memory using only two cons cells; one can make this clear to the parser by writing (#1=(1 . 2) . #1#). Milawa's proof objects, which come from a hash-consed ACL2, make extensive use of these abbreviations to compact s-expressions.

The s-expression datatype, parser and printer are defined here: https://github.com/HOL-Theorem-Prover/HOL/tree/master/examples/theorem-prover/lisp-runtime/parse

I note that simpleSexpScript.sml which is used by compiler/parsing/fromSexpTheory defines an s-expression datatype that is actually a bit more complicated than this one from the Milawa/Jitawa work. Maybe Milawa's s-expression theory should be called the evenSimplerSexpScript.sml.

mn200 commented 6 years ago

The "Tom Sewell"-style notation under the link is Polish notation; I'd hesitate to call it s-expressions. The point of s-expressions is to have something that's feasibly human-readable. Polish notation clearly isn't.

The s-expression stuff in examples/formal-languages adds a separate string constructor so that there are two types of literal: numbers and strings. I guess that makes it 33% more complicated than the Jitawa type :-)

Ramana proved a printing-parsing inversion result for that setup, but restricted the strings allowed to appear in symbols and string literals a little.

xrchz commented 6 years ago

In that case, what I think we want is Polish notation as an input format for CakeML.

mn200 commented 6 years ago

Magnus's idea only requires parse(print s) = s, which we already have for the sexpressions. So why bother with another concrete syntax? (You do also need sexp_to_ast o ast_to_sexp = I, which might be harder.)

xrchz commented 6 years ago

I think the reason is for speed. However, I don't know if the current s-expressions are slow or not.