Support for empty blocks

matthiasgoergens commented 1 year ago

Is there any good reason that code blocks can't be empty?

It would make code generation easier, if empty blocks were allowed.

Alternatively (or additionally), could we make the noop instruction available to people using the assembler?

matthiasgoergens commented 1 year ago

Here's one way to add noop support on top of branch next:

diff --git a/assembly/src/assembler/instruction/mod.rs b/assembly/src/assembler/instruction/mod.rs
index 57e59f0..b9a53d7 100644
--- a/assembly/src/assembler/instruction/mod.rs
+++ b/assembly/src/assembler/instruction/mod.rs
@@ -275,6 +275,8 @@ impl Assembler {
             Instruction::CallLocal(idx) => self.call_local(*idx, ctx),
             Instruction::CallImported(id) => self.call_imported(id, ctx),
             Instruction::SysCall(id) => self.syscall(id, ctx),
+
+            Instruction::Noop => span.add_op(Noop),
         };

         // compute and update the cycle count of the instruction which just finished executing
diff --git a/assembly/src/parsers/context.rs b/assembly/src/parsers/context.rs
index 18befa2..d85d1e8 100644
--- a/assembly/src/parsers/context.rs
+++ b/assembly/src/parsers/context.rs
@@ -509,6 +509,8 @@ fn parse_op_token(op: &Token) -> Result<Node, ParsingError> {
         "mtree_set" => simple_instruction(op, MTreeSet),
         "mtree_cwm" => simple_instruction(op, MTreeCwm),

+        "noop" => simple_instruction(op, Noop),
+
         // ----- catch all ------------------------------------------------------------------------
         _ => Err(ParsingError::invalid_op(op)),
     }
diff --git a/assembly/src/parsers/nodes.rs b/assembly/src/parsers/nodes.rs
index 2e77900..82ab5fe 100644
--- a/assembly/src/parsers/nodes.rs
+++ b/assembly/src/parsers/nodes.rs
@@ -252,6 +252,8 @@ pub enum Instruction {
     CallLocal(u16),
     CallImported(ProcedureId),
     SysCall(ProcedureId),
+
+    Noop,
 }

 impl fmt::Display for Instruction {
@@ -499,6 +501,7 @@ impl fmt::Display for Instruction {
             Self::CallLocal(index) => write!(f, "call.{index}"),
             Self::CallImported(proc_id) => write!(f, "call.{proc_id}"),
             Self::SysCall(proc_id) => write!(f, "syscall.{proc_id}"),
+            Self::Noop => write!(f, "noop"),
         }
     }
 }
diff --git a/assembly/src/parsers/serde/deserialization.rs b/assembly/src/parsers/serde/deserialization.rs
index 681e4ef..ace395a 100644
--- a/assembly/src/parsers/serde/deserialization.rs
+++ b/assembly/src/parsers/serde/deserialization.rs
@@ -424,6 +424,8 @@ impl Deserializable for Instruction {
             OpCode::CallLocal => Ok(Instruction::CallLocal(bytes.read_u16()?)),
             OpCode::CallImported => Ok(Instruction::CallImported(bytes.read_procedure_id()?)),
             OpCode::SysCall => Ok(Instruction::SysCall(bytes.read_procedure_id()?)),
+
+            OpCode::Noop => Ok(Instruction::Noop),
         }
     }
 }
diff --git a/assembly/src/parsers/serde/mod.rs b/assembly/src/parsers/serde/mod.rs
index 36cba10..2d9af6f 100644
--- a/assembly/src/parsers/serde/mod.rs
+++ b/assembly/src/parsers/serde/mod.rs
@@ -250,4 +250,6 @@ pub enum OpCode {
     CallLocal = 216,
     CallImported = 217,
     SysCall = 218,
+
+    Noop = 219,
 }
diff --git a/assembly/src/parsers/serde/serialization.rs b/assembly/src/parsers/serde/serialization.rs
index 4cc07aa..99fe7c1 100644
--- a/assembly/src/parsers/serde/serialization.rs
+++ b/assembly/src/parsers/serde/serialization.rs
@@ -518,6 +518,8 @@ impl Serializable for Instruction {
                 target.write_opcode(OpCode::SysCall);
                 target.write_procedure_id(imported);
             }
+
+            Self::Noop => target.write_opcode(OpCode::Noop),
         }
     }
 }

bobbinth commented 1 year ago

The main reason is that processing MAST with empty blocks becomes a bit more complicated. The solution we have on MAST side is that we add NOOP operations if needed.

We could do this on the Assembly side as well but introducing noop instruction as you've suggested. We could also try to eliminate empty blocks while parsing source code to AST.

But I'd like to understand the use case a bit better. Are there any specific blocks which can sometimes be generated as empty (e.g., empty while loops or empty if statements`)?

matthiasgoergens commented 1 year ago

For our specific use case, we are generating code. We have different optimization steps in that generation process that we can turn on and off independently, to test and debug them.

For some specific combinations of flags, we sometimes produces functions with empty bodies. To deal with that, we use a relatively clunky post-processing that removes those functions and, here's the ugly part, goes through all the rest of our generated code to remove all calls to these functions.

It would be easier, if we could just leave them in. As there's no good reason why empty functions shouldn't work?

As I like to say, zero is also a number. A perfectly cromulent one, too.

As a workaround, I am suggesting exposing the noop instruction, because then the post-processing we need to do at least doesn't involve all the call-sites, but just a simple noop insertion into the body of otherwise empty functions.

Does this make sense?

bobbinth commented 1 year ago

@matthiasgoergens - sorry for a delayed reply and thank you for submitting #609! Before proceeding with the PR, I do have a few questions, which I think we should resolve:

First, what do we want to do here? Add noop, support empty blocks, or both? It seems to me that just supporting empty blocks (of the form being end or if.true end) should suffice. If this is supported, is there any reason to also expose noop operation?

Second, for empty blocks, I think all we need to do is to support empty SPAN blocks. As far as the VM is concerned, an empty SPAN block can be executed as any number of NOOPs between 1 and 72. I think it makes sense to always execute it as a single NOOP.

Alternatively, we could try to have the assembler optimize away empty SPAN blocks. This may be a good idea, but probably requires a bit more thinking and could be done in subsequent PRs.

matthiasgoergens commented 1 year ago

@bobbinth I changed the PR to only add support for empty-blocks (and ignore noops).

I worked on exposing noops first, because that was easier. In principle, I'd still be in favour of exposing them, because it's easy and exposes more of the MAST without causing any harm. But as you point out, it's not really necessary.

Yes, optimization of various construct, including empty blocks, would be an interesting future idea. But that's a bit too complicated for one quick PR.

bobbinth commented 1 year ago

Closed by #609

0xPolygonMiden / miden-vm

Support for empty blocks #585