CakeML / cakeml

CakeML: A Verified Implementation of ML
https://cakeml.org
Other
964 stars 84 forks source link

Implement lexer on mlstrings #353

Open xrchz opened 7 years ago

xrchz commented 7 years ago

The CakeML lexer operates on a char list. Therefore the first thing the CakeML compiler does after reading in its input string is explode the string into a list. This is wasteful: I think the lexer can easily be made to work on mlstrings.

This issue is to make the lexer operate on mlstrings and update the compiler accordingly to avoid the explode.

Extension thoughts:

xrchz commented 5 years ago

Maybe #666 will help?

myreen commented 5 years ago

Currently, my plan with #666 is to do the following w.r.t. the HOL string type in the lexer:

https://github.com/CakeML/cakeml/blob/834e7c4e926d453bea905223f8affad2e460d1ba/compiler/bootstrap/translation/lexerProgScript.sml#L14-L19

Thus this does not avoid the explode on input to the lexer. I agree that the explode is untidy.

Ideally, we would phrase the lexer as a function that is handed to a fold-function over a finite character stream. The stateful implementation of the fold would then read characters lazily and the implementation would avoid producing the giant string containing all of the input. I guess that's what #645 is aiming for.

mn200 commented 5 years ago

You could define a type 'a finiteUnfolder, that was a subset of the type

    'a -> (char # 'a) option

such that LFINITE (LUNFOLD f s) held for all s.

I.e., the wellformedness predicate would be

   wfUF (f : 'a -> (char # 'a) option) <=> !a. LFINITE (LUNFOLD f a)

It'd be easy to show that a "string reader" which used the index as the state was such a thing. The finiteness guarantee in the type would mean that the lexer could be parameterised by an arbitrary unfolder. The question is still how you'd guarantee that an input file being read through TextIO would necessarily be finite. (The completely ad hoc approach of cutting off past a limit of 100 MB would probably be fine: i.e., you could prove something like wfUF (limiter n f) where n is an arbitrary number and limiter is effectively just LTAKE.)