Separate error codes from kernel to user code

hackaugusto commented 5 months ago

The assert error codes, e.g. assert.err=0x1000, are very useful to identify the source of an issue when manually debugging. And we are also planning on using these error codes to perform extra processing ref.

To make this mechanism reliable, we should enforce a separation of error codes, to avoid collision in the errors, specially for kernel code / account code / notes

hackaugusto commented 5 months ago

Some ideas to handle this:

Reserve ranges of error codes for the kernel. Trying to use said error codes in a note would be a compilation error.
- Cons: I think this is a bad approach, since we would need to introduce configuration to the assembler, generally making the compilation process more complicated
Instead of having the code being a single err_code, we can add the originating context. So the error would no longer be just the code but the pair (context, err_code).
Perhaps also add the MAST root to the error, this way one can unambiguously identify the source of an error.
- Cons: This may not work that well for the notes in the transaction script. Since the MAST root is the tx's script and not the notes. But I'm not sure how to fix this.

Fumuran commented 5 months ago

I think the first variant is the best one, since we don't modify MAST and don't increase its size.

we would need to introduce configuration to the assembler

Is it necessary to have a changeable error codes configuration for the assembler? Can we have a fixed encoding pattern for them? Something similar to the approach we used with event_id's. In that case the compilation process won't change, just a one more check. Also I'm not sure that for now we can distinguish a plain procedure and a note, so probably we will need to introduce another flag in the AssemblyContext in addition to the is_kernel.

bobbinth commented 5 months ago

I think enforcing that there are no collisions is going to be difficult (or at least, I'm not seeing a way to do it). The main issue is that the same code can be used in multiple contexts. For example, we have collections::smt code which could be used both by a kernel and by user code (e.g., note, account). So, we can't encode context into the smt source code.

One thing we can do is try define a convention, and hopefully, people will follow it. The convention we've been following so far is:

For error triggered by the VM itself, we set the top 16 bits of the error code to 0, and the remaining 16 bits define a unique error code for VM errors.
For errors triggered by the code in standard library, we set the top 16 bits to 1, and the remaining 16 bits define a unique error code for the stdlib.
For errors triggered by the code in miden-lib, we set the top 16 bits to 2, and the remaining 16 bits define a unique error code for the miden-lib.

hackaugusto commented 5 months ago

@bobbinth in that case, I think the second option would do the trick:

Instead of having the code being a single err_code, we can add the originating context. So the error would no longer be just the code but the pair (context, err_code).

If you don't want to use the tuple, we can also forbid the high bits from being set at the MASM level, and the VM set the high bits depending on the context

bobbinth commented 5 months ago

I think the second option would do the trick:

Instead of having the code being a single err_code, we can add the originating context. So the error would no longer be just the code but the pair (context, err_code).

If you don't want to use the tuple, we can also forbid the high bits from being set at the MASM level, and the VM set the high bits depending on the context

I'm not sure how this would work. Specifically, by the time the VM executes an instruction, all info about where the instruction originated (i.e., which library) is erased. Basically, there isn't enough info to tell what the context of a given instruction should be.

hackaugusto commented 5 months ago

I suppose you assumed I meant providing the information statically? In this case, since the same code can be execute at different contexts, the information would need to be retrieved at runtime from the System struct. Is there are reason why this should not be possible?

bobbinth commented 5 months ago

context in the System struct is not the same context as the source of the instruction. System.context is basically a random value (except for the root context, for which the value is always 0 - but even here, it doesn't help us as library code could be executed in both root context and user contexts).

hackaugusto commented 5 months ago

context in the System struct is not the same context as the source of the instruction. System.context is basically a random value (except for the root context, for which the value is always 0 - but even here, it doesn't help us as library code could be executed in both root context and user contexts).

I don't think I'm following, that is exactly what I wanted. If lib A has an error X at root context we get (0, X), other we get (L, X), and that allows us to differentiate where the error happened

bobbinth commented 5 months ago

Let's take std::collections::smt as an example. We can execute the code contained there in the root context or in user context. We can't force that code in std::collections::smt is always executed in root context.

If we just want to figure out the context in which an error happened - we can already do that by overriding on_assert_failed() method in the Host interface. But this doesn't help us with preventing collisions in error codes.

hackaugusto commented 5 months ago

But this doesn't help us with preventing collisions in error codes.

Yes, we established that already.

If we just want to figure out the context in which an error happened - we can already do that by overriding on_assert_failed() method in the Host interface.

Seems like it is already possible to get the context. Closing the issue.

bobbinth commented 5 months ago

I do think we should document the convention somewhere though. Maybe there are ways to improve on it somehow as well.

0xPolygonMiden / miden-vm

Separate error codes from kernel to user code #1276