Naming convention for assembly instructions

0xPolygonMiden / miden-vm

STARK-based virtual machine

MIT License

631 stars 161 forks source link

Arithmetic instructions

With the work done in various recent PRs (#311, #326, #327) we have refactored u32 operations to follow the naming convention similar to u64 module in stdlib. Specifically, for many instructions we have checked, unchecked, wrapping, and overflowing variants. For example, for 32-bit addition we have the following:

u32checked_add
u32overflowing_add
u32wrapping_add

In the similar spirit, #327 also refactored pow2 operation to consist of 2 variants:

checked_pow2 which fails if the power parameter is greater than 63.
unchecked_pow2 which does not fail for any inputs, but for power parameter greater than 63 produces an undefined result.

We can take this naming convention further: if an operation can fail for some set of inputs, prefix it as checked. For example:

inv -> checked_inv - can fail for divide-by-zero.
not -> checked_not - can fail if the input is not a binary value.
and -> checked_and - can fail if the inputs are not binary values.
or -> checked_or - can fail if the inputs are not binary values.
xor -> checked_xor - can fail if the inputs are not binary values.

Overall, I think the above would be cleaner, and it would be nice if we could always say that checked variants may fail for some inputs, while anything which is not checked will never fail, but for some inputs may produce undefined results. However, there are a couple of wrinkles:

There are some unchecked instructions which could fail on some inputs. For example: u32unchecked_div does not check if the inputs are 32-bit values, but would still fail in case of divide-by-zero`.
Some non-arithmetic instructions could fail for wrong inputs (e.g., cswap, cdrop, mtree.get, mtree.set).

So, the question is: given that we can't make the rule universal, should we still try to apply in cases where we can?

Other instructions

In the above convention, we use underscores to specify different variants of a given instruction and provide parameters for the instruction using dot notation. Should we try to apply it to other instructions as well?

In same cases, the renaming could be fairly trivial:

assert.eq -> assert_eq
mtree.get -> mtree_get
mtree.set -> mtree_set
mtree.cwm -> mtree_cwm

But in other cases, I'm not sure it is worth it. For example should pop.mem become pop_mem?

So, the question is: given that we can't make the rule universal, should we still try to apply in cases where we can?

Just to clarify - are you proposing having both checked and unchecked variants for the instructions you listed (inv, neg, and, etc) or are you proposing adding checked_ as a prefix to the existing instructions without adding any new ones? I probably wouldn't do the latter, since I don't think there's much benefit without it being fully universal, and seeing a checked option usually gives the expectation of an unchecked option. (If you're proposing the former option, I'd want to think about it a bit more - in that case the rule becomes a bit more universal, because unchecked would only fail for divide-by-zero and checked would always guarantee safety).

Either way, I probably wouldn't try to apply the proposed rule to non-arithmetic instructions like cswap, cdrop, etc. It's less familiar, since in the broader context checked/unchecked is usually arithmetic, and it makes the rule less universal as mentioned.

In the above convention, we use underscores to specify different variants of a given instruction and provide parameters for the instruction using dot notation. Should we try to apply it to other instructions as well?

Consistency makes things more predictable & easier to use and maintain, so I think it might make sense to do this. I also have a preference for treating variants and inputs differently (as we are now doing with the u32 ops), since I think it's clearer. If we make a change at all, I think we should do it everywhere, including the io ops.

I took a look at the io ops to see how much work this would require. The biggest change needed would be in the validate_operation macro, and it should actually result in a simpler macro and simpler validation requirements (I think we could remove the top level validation), so it would probably even be an improvement in isolation. Due to the structure of the i/o instruction parsing, I don't think much is needed beyond this.

0xPolygonMiden / miden-vm

Naming convention for assembly instructions #329

Arithmetic instructions

Other instructions