cosmos / ibc-rs

Rust implementation of the Inter-Blockchain Communication (IBC) protocol.
Apache License 2.0
197 stars 79 forks source link

Critical review of error handling across ibc-rs #1310

Open Farhad-Shabani opened 1 month ago

Farhad-Shabani commented 1 month ago

Background

Over the past year of developing ibc-rs, we've introduced various features and improvements, which often required us to adjust our error enums. However, we haven’t yet taken the time to thoroughly assess the soundness of these errors and enhance them all across the board. Generally, I think we should strive to achieve the following criteria (by priority):

  1. Produce meaningful, structured error variants that allow hosts to easily interpret and act on propagated errors.
  2. Make debugging straightforward and convenient for developers.
  3. Preferably avoid forcing users to adopt yet another third-party error-handling library

Critical Review

Our current error-handling system has the following flaws:

1. Lack of clear minimal specification

Firstly, we don't have a minimal specification that clearly outlines how errors should be defined for an ibc-rs package. This becomes an issue when someone wants to develop a new module using ibc-rs. We've often ended up with packages that have inconsistent variant naming, unclear guidelines on which fields should or shouldn't be included in variants, and irregular From/TryFrom conversions across the entire repository.

2. Host and protocol errors are not differentiated

Differentiate between errors originating from hosts and those from the ibc-rs implementation, also termed as protocol errors. Our Validation/ExecutionContext should not force users to work exclusively with ContextError, which is a top-level protocol error enum. Instead, hosts should be able to introduce their own custom errors that are consistent with their systems. The ibc-rs entrypoints should then be capable of handling both these host-specific and protocol errors together when returning an error.

This also applies to application contexts. We shouldn't require hosts to handle TokenTransferError when integrating token (nft) transfer contexts. Instead, keep TokenTransferError as an IBC-specific error, allowing hosts to introduce their own error types.

3. Excessive nested enums

There are several layers of nested enums that makes it difficult to follow. For instance, at the client level, like in 07-tendermint, when an error occurs, it gets wrapped in the 02-client error, which is then part of ContextError. Ideally, it should be enough to simply point out where the error occurred, such as indicating that it was caused by 07-tendermint. At the very least, we should explore ways to reduce the nesting by flattening the hierarchy by one level.

4. Inefficient String representation in error variants

Application and client-specific errors (like those in 07-tendermint) often get converted to strings and end up in variants like Other or ClientSpecific within lower-level error enums. This approach not only leads to large memory allocations but also makes it difficult for hosts to traverse the error structure to extract or check the specific data they need. Instead, they are forced to parse these strings, which is cumbersome and inefficient.

5. Misleading error when converting Any message

When converting an incoming IBC message of Any type to a MsgEnvelope, we're currently mapping decoding errors to a variant of RouterError. This mapping is misleading, as RouterError is not the appropriate category.

6. Over-specification with too many variants

Sometimes, we encounter an unnecessary breakdown of errors. Many of these errors could be merged into more general-purpose, meaningful variants that would serve multiple situations. For example, when it comes to client status errors, we already have several overlapping variants that could be combined into a single, more efficient one:

    pub enum ClientError {
        /// client is frozen with description: `{description}`
        ClientFrozen { description: String },
        /// client is not active. Status=`{status}`
        ClientNotActive { status: Status },
        /// client is not frozen or expired. Status=`{status}`
        ClientNotInactive { status: Status },
        ... other variants
       }

Can be merged into a single variant as follows:

  pub enum ClientError {
      /// invalid client status with actual: `{actual}`, expected: `{expected}`
      InvalidStatus { actual: Status, expected: Status },
      ... other variants
     }

7. Old error variants cluttering up

There are many variants across different enums that have become unused and left over due to the improvements made in ibc-rs. These obsolete variants clutter the codebase and could be cleaned up.

8. Unnecessary fields under variants

There are instances where an error variant includes a field for a description or reason. Aside from the inconsistency in naming these fields, many of them are unnecessary and the relevant information could be incorporated into the docstring of each variant. For example:

     /// invalid client state max clock drift: `{reason}`
     InvalidMaxClockDrift { reason: String },  

This variant is only used in one place with the reason: “ClientState max-clock-drift must be greater than zero.” Instead, this context can be added directly to the docstring of the variant, like so:

    /// invalid client state max clock drift. It must be greater than zero.
    InvalidMaxClockDrift

9. Overusing Other variants losing clarity

There's an overuse of Other variants throughout the code, which undermines the purpose of defining specific error enum variants. We should review all instances of Other variants and replace them with more appropriate, clearly defined variants. If we decide to retain the Other variant, we should consider renaming it to something more meaningful, like Custom, or clearly document what it represents.

Highlight

We plan to address the issues mentioned above as part of the final set of breaking changes before the release of ibc-rs v1. We see these changes crucial for improving the consistency, clarity, and usability of our error handling. This will serve to make ibc-rs v1 more reliable while imposing fewer breaking changes afterward.