Closed matthiasgoergens closed 1 year ago
Here's one way to add noop
support on top of branch next
:
diff --git a/assembly/src/assembler/instruction/mod.rs b/assembly/src/assembler/instruction/mod.rs
index 57e59f0..b9a53d7 100644
--- a/assembly/src/assembler/instruction/mod.rs
+++ b/assembly/src/assembler/instruction/mod.rs
@@ -275,6 +275,8 @@ impl Assembler {
Instruction::CallLocal(idx) => self.call_local(*idx, ctx),
Instruction::CallImported(id) => self.call_imported(id, ctx),
Instruction::SysCall(id) => self.syscall(id, ctx),
+
+ Instruction::Noop => span.add_op(Noop),
};
// compute and update the cycle count of the instruction which just finished executing
diff --git a/assembly/src/parsers/context.rs b/assembly/src/parsers/context.rs
index 18befa2..d85d1e8 100644
--- a/assembly/src/parsers/context.rs
+++ b/assembly/src/parsers/context.rs
@@ -509,6 +509,8 @@ fn parse_op_token(op: &Token) -> Result<Node, ParsingError> {
"mtree_set" => simple_instruction(op, MTreeSet),
"mtree_cwm" => simple_instruction(op, MTreeCwm),
+ "noop" => simple_instruction(op, Noop),
+
// ----- catch all ------------------------------------------------------------------------
_ => Err(ParsingError::invalid_op(op)),
}
diff --git a/assembly/src/parsers/nodes.rs b/assembly/src/parsers/nodes.rs
index 2e77900..82ab5fe 100644
--- a/assembly/src/parsers/nodes.rs
+++ b/assembly/src/parsers/nodes.rs
@@ -252,6 +252,8 @@ pub enum Instruction {
CallLocal(u16),
CallImported(ProcedureId),
SysCall(ProcedureId),
+
+ Noop,
}
impl fmt::Display for Instruction {
@@ -499,6 +501,7 @@ impl fmt::Display for Instruction {
Self::CallLocal(index) => write!(f, "call.{index}"),
Self::CallImported(proc_id) => write!(f, "call.{proc_id}"),
Self::SysCall(proc_id) => write!(f, "syscall.{proc_id}"),
+ Self::Noop => write!(f, "noop"),
}
}
}
diff --git a/assembly/src/parsers/serde/deserialization.rs b/assembly/src/parsers/serde/deserialization.rs
index 681e4ef..ace395a 100644
--- a/assembly/src/parsers/serde/deserialization.rs
+++ b/assembly/src/parsers/serde/deserialization.rs
@@ -424,6 +424,8 @@ impl Deserializable for Instruction {
OpCode::CallLocal => Ok(Instruction::CallLocal(bytes.read_u16()?)),
OpCode::CallImported => Ok(Instruction::CallImported(bytes.read_procedure_id()?)),
OpCode::SysCall => Ok(Instruction::SysCall(bytes.read_procedure_id()?)),
+
+ OpCode::Noop => Ok(Instruction::Noop),
}
}
}
diff --git a/assembly/src/parsers/serde/mod.rs b/assembly/src/parsers/serde/mod.rs
index 36cba10..2d9af6f 100644
--- a/assembly/src/parsers/serde/mod.rs
+++ b/assembly/src/parsers/serde/mod.rs
@@ -250,4 +250,6 @@ pub enum OpCode {
CallLocal = 216,
CallImported = 217,
SysCall = 218,
+
+ Noop = 219,
}
diff --git a/assembly/src/parsers/serde/serialization.rs b/assembly/src/parsers/serde/serialization.rs
index 4cc07aa..99fe7c1 100644
--- a/assembly/src/parsers/serde/serialization.rs
+++ b/assembly/src/parsers/serde/serialization.rs
@@ -518,6 +518,8 @@ impl Serializable for Instruction {
target.write_opcode(OpCode::SysCall);
target.write_procedure_id(imported);
}
+
+ Self::Noop => target.write_opcode(OpCode::Noop),
}
}
}
The main reason is that processing MAST with empty blocks becomes a bit more complicated. The solution we have on MAST side is that we add NOOP
operations if needed.
We could do this on the Assembly side as well but introducing noop
instruction as you've suggested. We could also try to eliminate empty blocks while parsing source code to AST.
But I'd like to understand the use case a bit better. Are there any specific blocks which can sometimes be generated as empty (e.g., empty while
loops or empty if
statements`)?
For our specific use case, we are generating code. We have different optimization steps in that generation process that we can turn on and off independently, to test and debug them.
For some specific combinations of flags, we sometimes produces functions with empty bodies. To deal with that, we use a relatively clunky post-processing that removes those functions and, here's the ugly part, goes through all the rest of our generated code to remove all calls to these functions.
It would be easier, if we could just leave them in. As there's no good reason why empty functions shouldn't work?
As I like to say, zero is also a number. A perfectly cromulent one, too.
As a workaround, I am suggesting exposing the noop
instruction, because then the post-processing we need to do at least doesn't involve all the call-sites, but just a simple noop
insertion into the body of otherwise empty functions.
Does this make sense?
@matthiasgoergens - sorry for a delayed reply and thank you for submitting #609! Before proceeding with the PR, I do have a few questions, which I think we should resolve:
First, what do we want to do here? Add noop
, support empty blocks, or both? It seems to me that just supporting empty blocks (of the form being end
or if.true end
) should suffice. If this is supported, is there any reason to also expose noop
operation?
Second, for empty blocks, I think all we need to do is to support empty SPAN
blocks. As far as the VM is concerned, an empty SPAN
block can be executed as any number of NOOP
s between 1 and 72. I think it makes sense to always execute it as a single NOOP
.
Alternatively, we could try to have the assembler optimize away empty SPAN
blocks. This may be a good idea, but probably requires a bit more thinking and could be done in subsequent PRs.
@bobbinth I changed the PR to only add support for empty-blocks (and ignore noops).
I worked on exposing noops first, because that was easier. In principle, I'd still be in favour of exposing them, because it's easy and exposes more of the MAST without causing any harm. But as you point out, it's not really necessary.
Yes, optimization of various construct, including empty blocks, would be an interesting future idea. But that's a bit too complicated for one quick PR.
Closed by #609
Is there any good reason that code blocks can't be empty?
It would make code generation easier, if empty blocks were allowed.
Alternatively (or additionally), could we make the
noop
instruction available to people using the assembler?