Open xrchz opened 6 years ago
I just want to note that there is a parser and printer for s-expressions as part of the Jitawa/Milawa work in HOL examples. The s-expression type is:
SExp = Dot SExp SExp | Val num | Sym string
The parser (or at least the machine code implementation of it) is very fast -- much faster (9x faster) than the parser in the Lisp implementations used by ACL2.
The Jitawa parser is the inverse of s-expression printing regardless of what's in the Sym
elements, i.e. the printer correctly escapes tricky strings like \ ) " (. ||
.
The parser also has support for Milawa-style abbreviations, where the concrete syntax can specify how subexpressions are shared in order to convey that pointer sharing is to be done. Example: the following ((1 . 2) . (1 . 2))
can be represented in memory using only two cons cells; one can make this clear to the parser by writing (#1=(1 . 2) . #1#)
. Milawa's proof objects, which come from a hash-consed ACL2, make extensive use of these abbreviations to compact s-expressions.
The s-expression datatype, parser and printer are defined here: https://github.com/HOL-Theorem-Prover/HOL/tree/master/examples/theorem-prover/lisp-runtime/parse
I note that simpleSexpScript.sml
which is used by compiler/parsing/fromSexpTheory
defines an s-expression datatype that is actually a bit more complicated than this one from the Milawa/Jitawa work. Maybe Milawa's s-expression theory should be called the evenSimplerSexpScript.sml
.
The "Tom Sewell"-style notation under the link is Polish notation; I'd hesitate to call it s-expressions. The point of s-expressions is to have something that's feasibly human-readable. Polish notation clearly isn't.
The s-expression stuff in examples/formal-languages
adds a separate string constructor so that there are two types of literal: numbers and strings. I guess that makes it 33% more complicated than the Jitawa type :-)
Ramana proved a printing-parsing inversion result for that setup, but restricted the strings allowed to appear in symbols and string literals a little.
In that case, what I think we want is Polish notation as an input format for CakeML.
Magnus's idea only requires parse(print s) = s
, which we already have for the sexpressions. So why bother with another concrete syntax? (You do also need sexp_to_ast o ast_to_sexp = I
, which might be harder.)
I think the reason is for speed. However, I don't know if the current s-expressions are slow or not.
CakeML currently supports standard s-expression syntax via theories in HOL examples plus our
compiler/parsing/fromSexpTheory
. It might be nice to also support s-expressions with an even simpler syntax: see https://lists.cakeml.org/pipermail/developers/2017-February/000179.html. This could probably be done without changing the sexp datatype (i.e.,fromSexpTheory
stays as-is). It would probably make sense for the bulk of this work to end up in HOL examples rather than CakeML. This issue can be closed when the CakeML compiler has an option to support input in the simple syntax.