Statements vs Expressions

Should there be a distinction between statements and expressions? What should be an expression and what a statement? Is the top level to be composed of statements or expressions?

There is also a difference between separators and terminators. For example, it matters whether val or def declarations are terminated by semicolons (the semicolon is part of their grammar rule) or, as now, are just statements separated by semicolons (just like any other statements are separated by semicolons).
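
To make the separator/terminator difference concrete, here is a minimal Rust sketch. It assumes tokens are plain strings and that ; is the only punctuation. The names parse_separated and parse_terminated are made up for illustration and are not part of Cacom.

```rust
/// Items *separated* by `;`, as in `a ; b ; c`. A trailing `;` is rejected.
fn parse_separated<'a>(tokens: &[&'a str]) -> Option<Vec<&'a str>> {
    let mut items = Vec::new();
    let mut expect_item = true;
    for &tok in tokens {
        match (expect_item, tok == ";") {
            (true, false) => items.push(tok), // an item where one is expected
            (false, true) => {}               // a separator after an item
            _ => return None,                 // two items or two separators in a row
        }
        expect_item = !expect_item;
    }
    // The input must end on an item (or be empty), never on a separator.
    if expect_item && !tokens.is_empty() {
        None
    } else {
        Some(items)
    }
}

/// Items *terminated* by `;`, as in `a ; b ; c ;`. Every item needs its `;`.
fn parse_terminated<'a>(tokens: &[&'a str]) -> Option<Vec<&'a str>> {
    let mut items = Vec::new();
    for pair in tokens.chunks(2) {
        // Each chunk must be exactly one item followed by one `;`.
        if pair.len() != 2 || pair[0] == ";" || pair[1] != ";" {
            return None;
        }
        items.push(pair[0]);
    }
    Some(items)
}
```

With separators, a ; b is accepted and a ; b ; is not; with terminators it is the other way around.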

Should an initial separator be allowed? E.g. the ML / Haskell style:

def fun
(
, aVeryLongArgumentNameThatOnlyFitsOnItsOwnLine
, aVeryLongArgumentNameThatOnlyFitsOnItsOwnLine2
)
= 1;
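
A rough sketch of what accepting an initial separator means, written as a hand-rolled Rust function rather than a grammar rule. The name parse_params is hypothetical, and parameters are assumed to be plain identifiers separated by commas.

```rust
/// Splits a parameter list on `,`, tolerating one optional leading separator.
fn parse_params(src: &str) -> Vec<&str> {
    let src = src.trim();
    // An initial `,` is allowed and simply skipped.
    let src = src.strip_prefix(',').unwrap_or(src);
    src.split(',')
        .map(str::trim)
        // Drops empty pieces, so an empty list yields no parameters.
        // Note: this sketch does not reject doubled separators like `a,,b`.
        .filter(|p| !p.is_empty())
        .collect()
}
```

Both the leading-comma layout and the usual a, b layout produce the same list.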

What should be the behaviour of these block expressions: {}, { f() }, { f(); }?

It seems to me that the syntax is already Scala-like, and having "everything as an expression" would maybe be simpler.

I originally wanted to submit a PR to allow initial separators and to settle how block expressions behave. But I wasn't sure what you intended. My understanding of block expressions is:

{} = evaluates as None
{ f() } = evaluates to f()
{ f(); } = f() executed but result discarded, the block evaluates to None

This corresponds to the idea that blocks contain expressions separated by semicolons, but a trailing separator means an implicit None. I don't know whether this is desirable for block expressions or the top level. The idea for the implementation is below.
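
The three cases can be sketched in Rust as follows. This is only an illustration of the rule, assuming the expressions have already been evaluated; Value and block_value are made-up names, not Cacom types.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    None,
    Int(i64),
}

/// A block given as already-evaluated expression results plus a flag that
/// records whether the source ended with a trailing `;`.
fn block_value(results: &[Value], trailing_separator: bool) -> Value {
    match results.last() {
        // `{ f() }`: the last value becomes the value of the block.
        Some(last) if !trailing_separator => last.clone(),
        // `{}` and `{ f(); }`: nothing is left over, so the block is None.
        _ => Value::None,
    }
}
```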

diff --git a/Cacom/src/ast.rs b/Cacom/src/ast.rs
index 5ab4da9..1870251 100644
--- a/Cacom/src/ast.rs
+++ b/Cacom/src/ast.rs
@@ -48,6 +48,12 @@ pub enum AST {
     Expression(Expr),
 }

+impl Default for AST {
+    fn default() -> Self {
+        Self::Expression(Expr::NoneVal)
+    }
+}
+
 /// Expressions always leave some value on the stack
 ///
 /// For example, if statement always leaves resulting value
diff --git a/Cacom/src/grammar.lalrpop b/Cacom/src/grammar.lalrpop
index b486327..46025c4 100644
--- a/Cacom/src/grammar.lalrpop
+++ b/Cacom/src/grammar.lalrpop
@@ -160,9 +160,8 @@ String: String = {

 Block: Expr = {
     CURLYBOPEN <Statements> CURLYBCLOSE => Expr::Block(<>),
-    CURLYBOPEN CURLYBCLOSE => Expr::NoneVal,
 }
-Statements = SeparatedLeastOne<Statement, SEMICOLON>;
+Statements = SeparatedLastMayDefault<Statement, SEMICOLON>;

 FunDecl: AST = {
     DEF <name: Identifier> LPAREN <parameters: Parameters> RPAREN ASSIGN <body: Expr> => {
@@ -184,7 +183,7 @@ Return: AST = {

 // Macros
 Separated<T, S>: Vec<T> = {
-    <mut v: (<T> S)*> <e: T?> => match e {
+    <mut v: S? (<T> S)*> <e: T?> => match e {
         None => v,
         Some(e) => { v.push(e); v }
     }
@@ -193,3 +192,14 @@ Separated<T, S>: Vec<T> = {
 SeparatedLeastOne<T, S>: Vec<T> = {
     <mut v: (<T> S)*> <e: T> S? => { v.push(e); v }
 };
+
+SeparatedLastMayDefault<T, S>: Vec<T> = {
+    <mut v: (<T> S)*> <e: T?> => {
+        let e = match e {
+            Some(e) => e,
+            None => Default::default(),
+        };
+        v.push(e);
+        v
+    }
+};
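
For readers less used to LALRPOP, the action code of SeparatedLastMayDefault does roughly the following in plain Rust. The function name last_may_default is made up, and i32 is used in the usage note as a stand-in for AST (whose Default in the diff is Expression(NoneVal)).

```rust
/// Mirrors the macro's action: push the last element if it is present,
/// otherwise push `T::default()`.
fn last_may_default<T: Default>(mut v: Vec<T>, e: Option<T>) -> Vec<T> {
    v.push(e.unwrap_or_default());
    v
}
```

So last_may_default(vec![1, 2], None) ends in the default value, which corresponds to { a; b; } getting an implicit trailing None, and an empty input yields a single default element, which is why the separate rule for {} can be dropped.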

I consider an expression to be anything that can leave a value on the stack, or that can appear on the right side of an assignment.

So, for example, for, while, var declarations or function declarations aren't expressions. I know they could return none, but honestly I don't like that, it semantically makes no sense to me. It doesn't make sense to me to write it like this:

if (...) { val x = 1 }

What should the if return? 1 would probably make sense, but turning that into an expression would let it nest inside things like if conditions, or even worse, val x = (val y = (val z = 3)). I don't wanna go that route. Same story with assignment.

Yes, currently, I think, the semicolon serves as a separator. This comes from the fact that I'm lazy and don't want to touch the grammar too much if I can help it :smile:. At first, I thought it would be cool if it looked like Rust at first glance, so basically something like what you're saying. But that would possibly create a lot of unpredictable problems. Someone who is unfamiliar with this may just write {x;y;} and expect it to return the last statement, but it returns none, and they may run into problems later on.

I really don't know about this. I thought about always requiring an expression at the end, so you would have to write none explicitly if you want to return it. That is maybe too wordy, but less error prone. What would you prefer? I guess the thing you wrote, with a trailing semicolon meaning an implicit none return, makes sense.

I haven't seen much Haskell, so this is kinda unnatural for me :smile:. I'd probably prefer a trailing separator.

I kinda feel the language is in a weird spot, and I probably should just go write some programs in it to get a feel for it. But I still want it to be close to the 'classic' family of languages, so that if you come from C, C++, Go, Java or something like that, the syntax will be familiar, and one of the newest things should be the fact that most things are expressions.

I consider an expression to be anything that can leave a value on the stack, or that can appear on the right side of an assignment.

So, for example, for, while, var declarations or function declarations aren't expressions. I know they could return none, but honestly I don't like that, it semantically makes no sense to me. It doesn't make sense to me to write it like this:

if (...) { val x = 1 }

What should the if return? 1 would probably make sense, but turning that into an expression would let it nest inside things like if conditions, or even worse, val x = (val y = (val z = 3)). I don't wanna go that route.

Sensible.

Same story with assignment.

Even statement oriented languages like C have assignments as expressions:

while (foo = bar()) {
    ...
}

While this is error prone (was this meant to be == instead?), sometimes it makes the code nicer. Python, where assignments are statements, recently introduced an "assignment expression operator" :=. That is just FYI; I think it's sensible to also have assignments as statements.

Yes, currently, I think, the semicolon serves as a separator. This comes from the fact that I'm lazy and don't want to touch the grammar too much if I can help it :smile:. At first, I thought it would be cool if it looked like Rust at first glance, so basically something like what you're saying. But that would possibly create a lot of unpredictable problems. Someone who is unfamiliar with this may just write {x;y;} and expect it to return the last statement, but it returns none, and they may run into problems later on.

I really don't know about this. I thought about always requiring an expression at the end, so you would have to write none explicitly if you want to return it. That is maybe too wordy, but less error prone. What would you prefer? I guess the thing you wrote, with a trailing semicolon meaning an implicit none return, makes sense.

Crafting Interpreters also seems to have a note about expressions vs statements.

There was also a recent discussion of this with regard to Rust's blocks.

I view semicolon-separated expressions Expr1 ; Expr2 ; Expr3 as: evaluate Expr1, discard its value, evaluate Expr2, discard its value, evaluate Expr3. I.e. the last value is not discarded and thus can propagate as the value of the block. If there is a trailing semicolon, then everything is discarded, so if you want a value where there is none, you get None.
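
In terms of a stack machine, this view could look like the sketch below. It assumes every expression is just an integer literal and uses a made-up three-instruction set (Push, PushNone, Drop); it is not Cacom's actual bytecode.

```rust
#[derive(Debug, PartialEq)]
enum Instr {
    Push(i64),
    PushNone,
    Drop,
}

/// Compiles `e1 ; e2 ; ... ; en`, optionally followed by a trailing `;`.
fn compile_sequence(exprs: &[i64], trailing_semicolon: bool) -> Vec<Instr> {
    let mut code = Vec::new();
    for (i, &e) in exprs.iter().enumerate() {
        code.push(Instr::Push(e));
        let is_last = i + 1 == exprs.len();
        // Every value is discarded except the last one, unless a trailing
        // semicolon discards that one as well.
        if !is_last || trailing_semicolon {
            code.push(Instr::Drop);
        }
    }
    // If nothing is left on the stack, the sequence evaluates to None.
    if exprs.is_empty() || trailing_semicolon {
        code.push(Instr::PushNone);
    }
    code
}
```

Either way exactly one value is left on the stack at the end, which matches the idea that an expression always leaves a value.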

But maybe having the last expression be the value of the block, whether or not there is a trailing separator, is a very simple rule. That is the current state, so nothing would have to be done. That makes sense easily when everything is an expression. But does it make sense with statements? E.g.:

def f() = { val a = 1 }

In expression-oriented languages a block is a sequence of expressions and the block has the value of the last one. Here we have block expressions as sequences of statements, and the value of the block is the value of the last statement if that statement is an expression statement, otherwise None. It could work.
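
A small Rust sketch of that rule, assuming a block is a list of statements where only expression statements carry a value. Stmt, Value and block_value are illustrative names, and expression statements are assumed to be already evaluated.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    None,
    Int(i64),
}

enum Stmt {
    /// A declaration such as `val a = 1`; it produces no value.
    Decl,
    /// An expression statement, already evaluated for brevity.
    Expr(Value),
}

/// The block's value is the value of the last statement if that statement
/// is an expression statement, otherwise None.
fn block_value(stmts: &[Stmt]) -> Value {
    match stmts.last() {
        Some(Stmt::Expr(v)) => v.clone(),
        _ => Value::None,
    }
}
```

Under this rule def f() = { val a = 1 } evaluates to None, because the last statement is a declaration.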

If there is a real distinction between statements and expressions, maybe it would make sense to go further and switch from statement separators to terminators. E.g. a ; is needed after declarations and expressions (like if), but not after a while or for:

val x = 1;

while false {
    x = x + 1;
}

if true { 1 } else { 2 };

But then it would be weird that if would have to be terminated, but not while. Maybe LALRPOP would accept both if statements and if expressions.
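
The terminator rule from the example above, written down as a tiny Rust table. The Stmt variants are hypothetical and only name the statement kinds mentioned here.

```rust
enum Stmt {
    ValDecl,  // val x = 1;
    ExprStmt, // a(5, 3);  or  if true { 1 } else { 2 };
    While,    // while false { ... }
    For,      // for ... { ... }
}

/// Whether a statement must be followed by `;` under the terminator proposal.
fn needs_semicolon(stmt: &Stmt) -> bool {
    match stmt {
        // Declarations and expression statements (including `if`) are terminated.
        Stmt::ValDecl | Stmt::ExprStmt => true,
        // Loops are pure statements and end at their closing brace.
        Stmt::While | Stmt::For => false,
    }
}
```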

Also, since expression statements would always have to be terminated by a semicolon, blocks would be simple sequences of statements (no separators), but then a trailing semicolon would be required for expression statements:

def id(a) = {
    a; // semicolon required   
}

So explicitly modelling the grammar so that a block is a sequence of statements with an optional trailing expression, like in Rust, could make sense. E.g.

struct Block {
    stmts: Vec<AST>,
    expr: Option<Expr>,
}
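
A hand-written sketch of how such a Block could be filled in while parsing. It assumes every statement or expression is a single string token and that ; terminates statements; String stands in for AST and Expr, and parse_block is a made-up name.

```rust
#[derive(Debug, PartialEq)]
struct Block {
    stmts: Vec<String>,   // stand-in for Vec<AST>
    expr: Option<String>, // stand-in for Option<Expr>
}

/// Parses the tokens between `{` and `}`.
fn parse_block(tokens: &[&str]) -> Option<Block> {
    let mut stmts = Vec::new();
    let mut iter = tokens.iter().copied();
    while let Some(item) = iter.next() {
        if item == ";" {
            return None; // empty statements are rejected in this sketch
        }
        match iter.next() {
            // `item ;` is a terminated statement.
            Some(";") => stmts.push(item.to_string()),
            // An unterminated last item is the trailing expression.
            None => return Some(Block { stmts, expr: Some(item.to_string()) }),
            // Two items without a `;` between them.
            Some(_) => return None,
        }
    }
    Some(Block { stmts, expr: None })
}
```

Here a ; b gives one statement plus the trailing expression b, while a ; gives one statement and no trailing expression, so the block would evaluate to None.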

I kinda feel the language is in a weird spot, and I probably should just go write some programs in it to get a feel for it. But I still want it to be close to the 'classic' family of languages, so that if you come from C, C++, Go, Java or something like that, the syntax will be familiar, and one of the newest things should be the fact that most things are expressions.

Yes, I am also very unsure; whatever feels best should be chosen.

If there is a real distinction between statements and expressions, maybe it would make sense to go further and switch from statement separators to terminators. E.g. a ; is needed after declarations and expressions (like if), but not after a while or for...

This is maybe not so bad. It kinda bugs me that you have to separate even function definitions with semicolons. But even block expressions shouldn't need that, i.e.

{
    ...
}         <-- no semicolon
{
    ...
}

def foo() { ... }  <-- no semicolon
def bar() { ... }

should be legal.

In an ideal world, there would not be any need for semicolons after }. Also, no semicolon after the last expression in a block, so { stmt1; stmt2; return-value }. The block would always have to end with an expression, or be empty (then its value is none). Or, as you said, just make the last value (which would have to be missing or be an expression) optional; that is probably even better.

You are way better than me at this grammar stuff, do you think it's possible to do this semicolon magic?

You are way better at grammars than I am, do you think it's possible with the lalrpop to do this magic with semicolons?

Like this?

val x = 1;

a(5, 3);

{
  5;
  1
}

{}

if true { 1 } else { 2 };

def f() = 1;
def g() = {}
def h() = { 1; 2 }
def i() = if true { f() } else { g() };

diff --git a/Cacom/src/ast.rs b/Cacom/src/ast.rs
index 5ab4da9..f4c21e2 100644
--- a/Cacom/src/ast.rs
+++ b/Cacom/src/ast.rs
@@ -67,7 +67,7 @@ pub enum Expr {
     NoneVal,
     String(String),

-    Block(Vec<AST>),
+    Block(Vec<AST>, Box<Expr>),

     List {
         size: Box<Expr>,
@@ -159,10 +159,11 @@ impl Expr {
                     arg.dump(prefix.clone() + " ");
                 }
             }
-            Expr::Block(vals) => {
+            Expr::Block(vals, expr) => {
                 for stmt in vals {
                     stmt.dump(prefix.clone());
                 }
+                expr.dump(prefix);
             }
         }
     }
diff --git a/Cacom/src/compiler.rs b/Cacom/src/compiler.rs
index 4826d33..e3395eb 100644
--- a/Cacom/src/compiler.rs
+++ b/Cacom/src/compiler.rs
@@ -207,7 +207,7 @@ impl Compiler {
                     self.constant_pool.add(Object::from(lit.clone()));
                 self.add_instruction(code, Bytecode::PushLiteral(str_index));
             }
-            Expr::Block(stmts) => {
+            Expr::Block(stmts, _) => {
                 self.enter_scope();
                 self.compile_block(stmts, code)?;
                 self.leave_scope();
diff --git a/Cacom/src/grammar.lalrpop b/Cacom/src/grammar.lalrpop
index b486327..394ede9 100644
--- a/Cacom/src/grammar.lalrpop
+++ b/Cacom/src/grammar.lalrpop
@@ -56,7 +56,7 @@ match {
 }

 pub TopLevel: AST = {
-    TopLevelExpressions => AST::Top(<>),
+    TopLevelExpression+ => AST::Top(<>),
                         => AST::Top(vec![AST::Expression(Expr::NoneVal)]),
 }

@@ -73,19 +73,24 @@ TopLevelExpression: AST = {
 // (ie. in binary operations, conditions and so on...)
 // here belongs for example while or for cycle.
 Statement: AST = {
-    Expr => AST::Expression(<>), // All subexpressions - contains binaryop, ifs, calls and so on.
+    BlockOrTerminatedExpr => AST::Expression(<>), // All subexpressions - contains binaryop, ifs, calls and so on.
     Return => <>,
     VarDecl => <>,
     Assignment => <>,
 }

+BlockOrTerminatedExpr: Expr = {
+    Block => <>,
+    <Expr> SEMICOLON => <>,
+}
+
 Assignment: AST = {
-    <name: Identifier> ASSIGN <value: Expr> => AST::AssignVariable { name, value }
+    <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::AssignVariable { name, value }
 }

 VarDecl: AST = {
-    VAL <name: Identifier> ASSIGN <value: Expr> => AST::Variable { name, mutable: false, value },
-    VAR <name: Identifier> ASSIGN <value: Expr> => AST::Variable { name, mutable: true, value },
+    VAL <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::Variable { name, mutable: false, value },
+    VAR <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::Variable { name, mutable: true, value },
 }

 LeftAssoc<Op, NextLevel>: Expr = {
@@ -135,7 +140,7 @@ Primary: Expr = {
     String => Expr::String(<>),
     Call => <>,
     LPAREN <expr: Expr> RPAREN => expr,
-    Block => <>,
+    //Block => <>, // REDUCE-REDUCE conflict
     Conditional => <>,
     Identifier => Expr::AccessVariable{name: <>},
 }
@@ -159,13 +164,12 @@ String: String = {
 }

 Block: Expr = {
-    CURLYBOPEN <Statements> CURLYBCLOSE => Expr::Block(<>),
+    CURLYBOPEN <v: Statement*> <e: Expr> CURLYBCLOSE => Expr::Block(v, Box::new(e)),
     CURLYBOPEN CURLYBCLOSE => Expr::NoneVal,
 }
-Statements = SeparatedLeastOne<Statement, SEMICOLON>;

 FunDecl: AST = {
-    DEF <name: Identifier> LPAREN <parameters: Parameters> RPAREN ASSIGN <body: Expr> => {
+    DEF <name: Identifier> LPAREN <parameters: Parameters> RPAREN ASSIGN <body: BlockOrTerminatedExpr> => {
         AST::Function{name, parameters, body: body}
     }
 }
@@ -179,7 +183,7 @@ Conditional: Expr = {
 }

 Return: AST = {
-    RETURN <Expr> => AST::Return(<>)
+    RETURN <BlockOrTerminatedExpr> => AST::Return(<>)
 }

 // Macros

Though there is a reduce-reduce conflict in Primary now. I didn't try to resolve it, since I don't even know if this is what you mean.
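
As a side note on the new Expr::Block(Vec<AST>, Box<Expr>) shape: the compiler.rs hunk above binds the trailing expression to _, so as far as the diff shows it is not compiled yet. Below is a sketch of how the trailing expression could be used, on a much smaller Expr/Ast than Cacom's. The Int variant and the eval function are assumptions made for the example.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    NoneVal,
    Int(i64),
    Block(Vec<Ast>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Expression(Expr),
}

/// Evaluates an expression to a value (represented as a non-block Expr).
fn eval(expr: &Expr) -> Expr {
    match expr {
        Expr::Block(stmts, last) => {
            for stmt in stmts {
                let Ast::Expression(e) = stmt;
                eval(e); // statement value is discarded
            }
            // The trailing expression is the value of the whole block.
            eval(last)
        }
        other => other.clone(),
    }
}
```

With this, the block { 5; 1 } from the example evaluates to 1.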

Yes, this would be great if fixing the conflict (if at all possible) is not too much of a hassle.

On closer inspection it seems this cannot be done without compromise. I am not able to make blocks primary. But at least this makes blocks usable elsewhere. Maybe there is a better way, but I won't investigate further.

val x = 1;

a(5, 3);

{
  5;
  {
    6;
    7
  }
}

{}

{{}}
{}{}

//{} + {}; // Doesn't work: parsed as "empty statement + empty statement"

//-{} + 1; // Doesn't work: unary uses Primary

if { true; false } { 1 } else { 2 };

def f() = 1;
def g() = {}
def h() = { 1; 2 }
def i() = if true { f() } else { g() };

diff --git a/Cacom/src/ast.rs b/Cacom/src/ast.rs
index 5ab4da9..f4c21e2 100644
--- a/Cacom/src/ast.rs
+++ b/Cacom/src/ast.rs
@@ -67,7 +67,7 @@ pub enum Expr {
     NoneVal,
     String(String),

-    Block(Vec<AST>),
+    Block(Vec<AST>, Box<Expr>),

     List {
         size: Box<Expr>,
@@ -159,10 +159,11 @@ impl Expr {
                     arg.dump(prefix.clone() + " ");
                 }
             }
-            Expr::Block(vals) => {
+            Expr::Block(vals, expr) => {
                 for stmt in vals {
                     stmt.dump(prefix.clone());
                 }
+                expr.dump(prefix);
             }
         }
     }
diff --git a/Cacom/src/compiler.rs b/Cacom/src/compiler.rs
index 4826d33..e3395eb 100644
--- a/Cacom/src/compiler.rs
+++ b/Cacom/src/compiler.rs
@@ -207,7 +207,7 @@ impl Compiler {
                     self.constant_pool.add(Object::from(lit.clone()));
                 self.add_instruction(code, Bytecode::PushLiteral(str_index));
             }
-            Expr::Block(stmts) => {
+            Expr::Block(stmts, _) => {
                 self.enter_scope();
                 self.compile_block(stmts, code)?;
                 self.leave_scope();
diff --git a/Cacom/src/grammar.lalrpop b/Cacom/src/grammar.lalrpop
index b486327..47e6056 100644
--- a/Cacom/src/grammar.lalrpop
+++ b/Cacom/src/grammar.lalrpop
@@ -56,7 +56,7 @@ match {
 }

 pub TopLevel: AST = {
-    TopLevelExpressions => AST::Top(<>),
+    TopLevelExpression+ => AST::Top(<>),
                         => AST::Top(vec![AST::Expression(Expr::NoneVal)]),
 }

@@ -73,19 +73,30 @@ TopLevelExpression: AST = {
 // (ie. in binary operations, conditions and so on...)
 // here belongs for example while or for cycle.
 Statement: AST = {
-    Expr => AST::Expression(<>), // All subexpressions - contains binaryop, ifs, calls and so on.
+    BlockOrTerminatedExpr => AST::Expression(<>), // All subexpressions - contains binaryop, ifs, calls and so on.
     Return => <>,
     VarDecl => <>,
     Assignment => <>,
 }

+BlockOrTerminatedExpr: Expr = {
+    Block,
+    <ExprNoBlock> SEMICOLON => <>,
+}
+
+Expr: Expr = {
+    Block,
+    ExprNoBlock,
+}
+
+
 Assignment: AST = {
-    <name: Identifier> ASSIGN <value: Expr> => AST::AssignVariable { name, value }
+    <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::AssignVariable { name, value }
 }

 VarDecl: AST = {
-    VAL <name: Identifier> ASSIGN <value: Expr> => AST::Variable { name, mutable: false, value },
-    VAR <name: Identifier> ASSIGN <value: Expr> => AST::Variable { name, mutable: true, value },
+    VAL <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::Variable { name, mutable: false, value },
+    VAR <name: Identifier> ASSIGN <value: BlockOrTerminatedExpr> => AST::Variable { name, mutable: true, value },
 }

 LeftAssoc<Op, NextLevel>: Expr = {
@@ -98,7 +109,7 @@ Unary<Op, NextLevel>: Expr = {
     NextLevel,
 }

-Expr = LeftAssoc<LogicalOp, AExpr>;
+ExprNoBlock = LeftAssoc<LogicalOp, AExpr>;

 LogicalOp: Opcode = {
     LESS => Opcode::Less,
@@ -135,7 +146,6 @@ Primary: Expr = {
     String => Expr::String(<>),
     Call => <>,
     LPAREN <expr: Expr> RPAREN => expr,
-    Block => <>,
     Conditional => <>,
     Identifier => Expr::AccessVariable{name: <>},
 }
@@ -159,13 +169,12 @@ String: String = {
 }

 Block: Expr = {
-    CURLYBOPEN <Statements> CURLYBCLOSE => Expr::Block(<>),
+    CURLYBOPEN <v: Statement*> <e: Expr> CURLYBCLOSE => Expr::Block(v, Box::new(e)),
     CURLYBOPEN CURLYBCLOSE => Expr::NoneVal,
 }
-Statements = SeparatedLeastOne<Statement, SEMICOLON>;

 FunDecl: AST = {
-    DEF <name: Identifier> LPAREN <parameters: Parameters> RPAREN ASSIGN <body: Expr> => {
+    DEF <name: Identifier> LPAREN <parameters: Parameters> RPAREN ASSIGN <body: BlockOrTerminatedExpr> => {
         AST::Function{name, parameters, body: body}
     }
 }
@@ -179,7 +188,7 @@ Conditional: Expr = {
 }

 Return: AST = {
-    RETURN <Expr> => AST::Return(<>)
+    RETURN <BlockOrTerminatedExpr> => AST::Return(<>)
 }

 // Macros

I'm willing to live with that. Thank you so much!

Will you please include some small tests for this if you open a pull request? Just some basic ones that show it works. Also, probably include the two cases you mentioned that don't work, but comment them out. Just so that we won't forget this exists.

Gregofi / camel

Statements vs Expressions #27