dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.89k stars 258 forks source link

Improvement suggestions for the Dafny AST (AST.dfy) #5690

Open dnezam opened 1 month ago

dnezam commented 1 month ago

Summary

Various suggestions to change the Dafny AST to make it, and using it for compilation easier.

Background and Motivation

I am currently working on a project to compile Dafny to CakeML, a language with a verified compiler, using HOL. Currently, the Dafny AST is exported to HOL via S-expressions, which can then be used to implement the transformation. As a long term goal, it would be nice to also verify this translation step, for now however, the goal is to support as many language features as possible.

In the process of doing so, I came across a few points in regards to what the AST contains (or does not), and thought it would be useful to discuss this to either clear out misunderstandings on my side (which is not unlikely), or point to possible improvements to the AST that could make future compiler work easier.

Proposed Feature

Here is a list of things that I came across. Please let me know in case I should create separate issues.

Alternatives

Most of these things (if not all) can be worked around in some way or another, but usually this introduces "hacky" and/or boilerplate code.

fabiomadge commented 1 month ago

@MikaelMayer

MikaelMayer commented 1 month ago

Hi @dnezam and thanks for thinking of all of this ! I think we should stop advertising AST as the "Dafny AST", but rather as an intermediate representation language suitable for code generation in other languages, but Dafny itself is no longer the preferred target. That said, what you suggest is sometimes relevant.

Remove Method.hasBody, and use an option for the body instead That sounds ok to me.

Instead of Method.wasFunction, split Function into a separate node I don't agree this is a good idea because in code generation, we only emit methods, no longer functions. Code generators only deal with methods. The information that it used to be a function is only relevant in Rust for the self parameter, whether it's &mut self or it can be just &self.

Remove Return e from my observations, it seems that this is only used in functions, while methods use EarlyReturn instead EarlyReturn is used for methods when there is a "return" without argument, whereas Return is used when the argument is known (functions or 1-outvar methods). If that makes it easier for you, we could consider merging the two and put an option for the return value.

Remove String from primitive Using strings in primitives makes it possible to emit a DafnyString instead of just a cumbersome Sequence<char> in the resulting code. So I don't have plans to remove it for now if it makes the generated code more readable.

Drop type from IntLiteral and the like The type was added by @ssomayyajula, maybe he has some insights.

Drop ModuleItem.Module To my understanding, nested modules should not be possible, as they seem to be linearized (and topologically sorted) based on for example this code and looking at very simple examples I agree, there is no nested modules in the linearized Dafny. I do reproduce nesting structure in the Rust AST but that's different.

Add return type to Expression.Call I think this is doable. https://github.com/dafny-lang/dafny/pull/5613 might already have done it in the signature however. Please check when merged.

Create a new node Program, which contains a list of modules, and the location of main That's ok provided everything else works correctly.

Keep pattern matching in the AST Only if the desugared version of the pattern matching still exist, that way we can decide in the code generator which one to use.

Drop separate Tuple datatype in "_System" module Rust supports tuple until size 12 (at least for all common traits), whereas Dafny supports them until size 20, so we still need custom tuples for different tuple sizes. Again, it's coming up in an upcoming PR.

Since Dafny depends on the contributions of everyone, I would be happy to serve as a reviewer of any PR that implement a point that I approved above. But please wait for https://github.com/dafny-lang/dafny/pull/5613 to be merged first as it perform significant changes in the code generation and the AST.dfy.

dnezam commented 1 month ago

Hey @MikaelMayer

Thanks for being open and taking the time to respond to my coments!

I think we should stop advertising AST as the "Dafny AST", but rather as an intermediate representation language suitable for code generation in other languages, but Dafny itself is no longer the preferred target.

I think I heard it being referred to as "reduced AST", but since I wasn't sure whether that was correct terminology, I tried to play it safe and went with the filename.

Instead of Method.wasFunction, split Function into a separate node I don't agree this is a good idea because in code generation, we only emit methods, no longer functions. Code generators only deal with methods. The information that it used to be a function is only relevant in Rust for the self parameter, whether it's &mut self or it can be just &self.

~Assuming with code generators you mean something like the Rust generator, I don't quite see why they would only deal with methods. In CakeML, knowing we are dealing with a function makes compilation easier (and probably results in more performant code), as it can be compiled down to a normal function without needing to add a result reference + exception handling. From what you are saying, it also seems that there is benefit of knowing for the Rust generator. Additionally, DafnyCodeGenerator.cs already has that information, and I am not sure whether I see the upsides of dropping it.~

Remove Return e from my observations, it seems that this is only used in functions, while methods use EarlyReturn instead EarlyReturn is used for methods when there is a "return" without argument, whereas Return is used when the argument is known (functions or 1-outvar methods). If that makes it easier for you, we could consider merging the two and put an option for the return value.

Is Return being used for methods as well a newer thing? I get the following program/reduced AST combination (I dropped the system module for brevity):

method foo() returns (r: int) {
    return 1;
}

method Main() {
    var foo_r := foo();
}
Module (Name "_module") []
       (SOME
          [ModuleItem_Class
             (Class (Name "__default") (Ident (Name "_module")) [] [] []
                [ClassItem_Method
                   (Method T T NONE (Name "foo") [] []
                      [DeclareVar (Name "r") (Primitive Int)
                         (SOME (InitializationValue (Primitive Int)));
                       Assign (AssignLhs_Ident (Ident (Name "r")))
                         (Literal (IntLiteral "1" (Primitive Int)));
                       EarlyReturn; EarlyReturn] [Primitive Int]
                      (SOME [Ident (Name "r")]));
                 ClassItem_Method
                   (Method T T NONE (Name "Main") [] []
                      [DeclareVar (Name "_1_foo__r") (Primitive Int) NONE;
                       DeclareVar (Name "_out0") (Primitive Int) NONE;
                       Call
                         (Companion
                            [Ident (Name "_module");
                             Ident (Name "__default")])
                         (CallName (Name "foo") NONE (CallSignature [])) []
                         [] (SOME [Ident (Name "_out0")]);
                       Assign (AssignLhs_Ident (Ident (Name "_1_foo__r")))
                         (Expression_Ident (Name "_out0")); EarlyReturn] []
                      (SOME []))] [])])

Note that this is the state of roughly the beginning of July - I haven't had time to update it. As can be seen, even in the 1-outvar case EarlyReturn is used. Personally, I find this quite elegant as the implementation generalizes nicely to multiple outvars. ~So in case this hasn't changed, then I think the combination of having a separate function node which contains an expression as body (making Return obsolete), would be a strict improvement for any language that can make use of the pureness notion of a function.~

EDIT: After talking to @fabiomadge I think the issue with these two proposals is that expressions are possibly compiled down to statements in the SinglePassCodeGenerator. Nonetheless, the method node feels a bit rough, since functions/1-var methods (?)/multiple var methods are compiled down to the same node, and which would need different handling on CakeML's side. It would be good if the variable denoting that it was encoded from a function would instead give the assumptions one can make about it (i.e. what assumptions does one get from knowing the method came from a function, s.t. you could for example use &self instead of &mut self). Additionally, it would be nice if functions and 1-outvar methods are treated the same as multiple out-var methods, i.e. functions are "converted" into 1-outvar methods, and 1-outvar methods are encoded as >1-outvar methods (similar to the code above).

Keep pattern matching in the AST Only if the desugared version of the pattern matching still exist, that way we can decide in the code generator which one to use.

That sounds good to me! I think both have their uses, as one can access datatype fields with destructors, which probably map nicely to the constructs there are now.

Drop separate Tuple datatype in "_System" module Rust supports tuple until size 12 (at least for all common traits), whereas Dafny supports them until size 20, so we still need custom tuples for different tuple sizes. Again, it's coming up in an upcoming PR.

Wouldn't the Tuples be encoded using Expression.Tuple or Type.Tuple? At that point, you could make it the responsibility of the implementation to somehow encode them (this is the way I am currently implementing) instead of giving them an implementation as a datatype. This would make using native types easier.

But please wait for https://github.com/dafny-lang/dafny/pull/5613 to be merged first as it perform significant changes in the code generation and the AST.dfy.

No problem!