[RFC] Error handling in Rust runtime

awslabs / aws-lambda-rust-runtime

A Rust runtime for AWS Lambda

Apache License 2.0

3.33k stars 339 forks source link

[RFC] Error handling in Rust runtime #54

Closed sapessi closed 5 years ago

sapessi commented 5 years ago

Currently, the Lambda Rust runtime declares a HandlerError type that developers can use to return an error from their handler method. While this approach works, it forces developers to write more verbose code to handle all errors and generate the relevant HandlerError objects. Developers would like to return errors automatically using the ? operator (#23).

For example:

fn my_handler(e: CustomEvent, c: lambda::Context) -> Result<CustomOutput, Error> {
    let i = event.age.parse::<u8>?;
    ...
}

In an error response, the Lambda runtime API expects two parameters: errorType and errorMessage. Optionally, we can also attach a stack trace to the error.

{
    "errorMessage" : "Error parsing age data.",
    "errorType" : "InvalidAgeException"
}

To generate an error message field we need the Err variant from the handler Result to support the Display trait. This allows us to support any Display type in the result - Error types inherently supports Display too. However, we cannot identify a way to automatically generate the error type field given a Display-compatible object that uses stable features. To address this, we plan to introduce a new trait in the runtime:

pub trait ErrorExt {
    fn error_type(&self) -> &str;
}

We'd like to deprecate this trait in the future and rely on the type name intrinsic (which is currently blocked on specialization). For context, see #1428.

The name ErrorExt comes from the extension trait conventions RFC. Based on feedback, we are open to changing this.

The runtime crate itself will provide the implementation of the ErrorExt trait for the most common errors in the standard library. Developers will have to implement the trait themselves in their own errors. We may consider adding a procedural macro to the runtime crate to automatically generate the trait implementation.

In summary, the proposed changes are:

The handler type will accept any object that implements Display and ErrorExt in its Err variant.
The runtime crate will use the Display trait to extract an error message and use the ISO-8859-1 charset.
The runtime crate will call the error_type() function to get the value for the errorType field.
If the RUST_BACKTRACE environment variable is set to 1, the runtime will use the backtrace crate to collect a stack trace as soon as the error is received.

The new handler type definition will be:

pub trait Handler<Event, Output, Error> 
where 
    Event: From<Vec<u8>>,
    Output: Into<Vec<u8>>,
    Error: Display + ErrorExt + Send + Sync,
{
    fn run(&mut self, event: Event, ctx: Context) -> Result<Output, Error>;
}

davidbarsky commented 5 years ago

A friend (@myrrlyn, feel free to unsubscribe; credit where credit is due) suggested an alternative where the typename is an associated const on ErrorExt. For instance:

trait ErrorExt {
    const TY: &'static str;
    // ...
}
impl ErrorExt for Me {
    const TY: &'static str = "Me";
    // ...
}
fn handle<T, E: ErrorExt>(res: Result<T, E>) {
    if let Err(e) = res {
        println!("{}", E::TY);
    }
}

We might need to provide a derive on ErrorExt, however.

srijs commented 5 years ago

I‘ll try to expand on this later today, but here are some first thoughts and observations:

Use of ISO/IEC 8859-1: it‘s my understanding that Rust generally aims for full utf-8 support, and this seems like it would be more restrictive in this regard. Could you expand a bit on the rationale for this choice?
Error type inference and unification: A potential downside I can see with making the error generic in the Handler trait definition is that it has a couple of consequences with regards to the use of the ? operator.
- For one, ? does not only terminate early, but also attempts to convert the error using the Into trait. This means that if I were to use ? exclusively for error propagation in the handler, the compiler would not be able to infer the type of error that gets returned, forcing me to always add explicit type annotations.
- Secondly, even if the type inference could be fixed, a single error type is also less ergonomic when it comes to mixing and matching different error types in the implementation. This means that I always need to declare an enum that defines all possible errors that can be returned, which can be a bit of a nuisance.
Regarding ErrorExt, and how to get access to the error name, I think that‘s an acceptable solution. It‘s unfortunate that getting access to the name via intrinsics is blocked on specialization, as it would be a huge ergonomic improvement. The biggest issue I see with the extra trait is that Rust does not support orphan impls, meaning that third-party errors will be harder to support. Adressing the comment above, I do think that a method is better than an associated const, because it allows enums to generate different error names based on which variant is active.

srijs commented 5 years ago

Forgot to say this, but maybe it‘s worth taking a page out of the failure book (which got a lot of things right imo) and maintain the current concrete HandlerError type (analogous to failure::Error) and introduce an additional ErrorExt trait (analogous to failure::Fail):

trait ErrorExt: Display {
  fn error_type(&self) -> &‘static str;
}

impl<E> From<E> for HandlerError where E: ErrorExt {
  ... // store E::error_type into an internal field on HandlerError
}

trait Handler<E, O> {
  ... // keep the existing HandlerError based definition
}

This would allow anyone to use ? with any type that implements ErrorExt, converging on the HandlerError type.

davidbarsky commented 5 years ago

Thanks for the detailed feedback!

Use of ISO/IEC 8859-1: it‘s my understanding that Rust generally aims for full UTF-8 support, and this seems like it would be more restrictive in this regard. Could you expand a bit on the rationale for this choice?

I suspect that this was an oversight, as I was working on an alternate black-box approach to the Lambda runtime where the event and responses are defined in terms of Into<Vec<u8>> and From<Vec<u8>>. My bad.

Error type inference and unification: A potential downside I can see with making the error generic in the Handler trait definition is that it has a couple of consequences with regards to the use of the ? operator.

I'm guessing that if we provide the impl out of Failure's book as described in this comment, that should address most of the issues you've raised?

Also, I'm curious as to how you handle error reporting in srijs/rust-aws-lambda. Maybe we're missing something?

Regarding ErrorExt, and how to get access to the error name, I think that‘s an acceptable solution. It‘s unfortunate that getting access to the name via intrinsics is blocked on specialization, as it would be a huge ergonomic improvement. The biggest issue I see with the extra trait is that Rust does not support orphan impls, meaning that third-party errors will be harder to support. Addressing the comment above, I do think that a method is better than an associated const, because it allows enums to generate different error names based on which variant is active.

Yeah, that's one of the orphan implementations/type names were some of the biggest blockers we ran into. Not sure how to address them properly other than the situation above (or opening an RFC that introduces an fn error_type(&self) -> &‘static str on std::error::Error.

I do think that a method is better than an associated const, because it allows enums to generate different error names based on which variant is active.

That's an interesting point.

sapessi commented 5 years ago

Adding a couple of points to @davidbarsky's answer

I‘ll try to expand on this later today, but here are some first thoughts and observations:

Use of ISO/IEC 8859-1: it‘s my understanding that Rust generally aims for full utf-8 support, and this seems like it would be more restrictive in this regard. Could you expand a bit on the rationale for this choice?

This is more of a note about the HTTP interface. Your errors can definitely support full UTF-8 but last I checked the runtime interface for lambda expects body in ISO/IEC 8859-1 charset. The note was there to inform that we'll have to use that charset and some UTF-8 chars will not be represented correctly.

Error type inference and unification: A potential downside I can see with making the error generic in the Handler trait definition is that it has a couple of consequences with regards to the use of the ? operator.

For one, ? does not only terminate early, but also attempts to convert the error using the Into trait. This means that if I were to use ? exclusively for error propagation in the handler, the compiler would not be able to infer the type of error that gets returned, forcing me to always add explicit type annotations.

Secondly, even if the type inference could be fixed, a single error type is also less ergonomic when it comes to mixing and matching different error types in the implementation. This means that I always need to declare an enum that defines all possible errors that can be returned, which can be a bit of a nuisance.

You alternative solution is what I had been toying with:

pub struct HandlerError {
    msg: String,
}

impl<E: Display + Send + Sync + Debug> From<E> for HandlerError {
    fn from(e: E) -> HandlerError {
        HandlerError { msg: format!("{}", e) }
    }
}

Regarding ErrorExt, and how to get access to the error name, I think that‘s an acceptable solution. It‘s unfortunate that getting access to the name via intrinsics is blocked on specialization, as it would be a huge ergonomic improvement. The biggest issue I see with the extra trait is that Rust does not support orphan impls, meaning that third-party errors will be harder to support. Adressing the comment above, I do think that a method is better than an associated const, because it allows enums to generate different error names based on which variant is active.

Sh4rK commented 5 years ago

My main concern is:

If the RUST_BACKTRACE environment variable is set to 1, the runtime will use the backtrace crate to collect a stack trace as soon as the error is received.

This stack trace will point to somewhere inside the runtime, not anywhere useful in the user code, where the error originates from. Given this, I don't see how it would help.

For getting the type name, we could use the typename crate. It has impls for lots of std types, and it can be derived, so it's pretty easy to use.

I have my own idea on how the error handling should work, but I never fleshed it out. I'll try it and come back if I have something that seems to work :)

sapessi commented 5 years ago

My main concern is:

If the RUST_BACKTRACE environment variable is set to 1, the runtime will use the backtrace crate to collect a stack trace as soon as the error is received.

This stack trace will point to somewhere inside the runtime, not anywhere useful in the user code, where the error originates from. Given this, I don't see how it would help.

The idea is to generate it inside the From implementation, as soon as the error is received. You will certainly see some calls inside the runtime but it should hopefully also contain the customer code. I will test this.

For getting the type name, we could use the typename crate. It has impls for lots of std types, and it can be derived, so it's pretty easy to use.

The typename crate is interesting. However, giving users the ability to override the error type with a custom trait gives us more flexibility.

I have my own idea on how the error handling should work, but I never fleshed it out. I'll try it and come back if I have something that seems to work :)

Please update this issue if you have other suggestions.

Sh4rK commented 5 years ago

The typename crate is interesting. However, giving users the ability to override the error type with a custom trait gives us more flexibility.

They can do it just as easily by implementing the TypeName trait manually for their types. (That might however not actually be the "type name", so a custom trait can make sense.)

Sh4rK commented 5 years ago

Please update this issue if you have other suggestions.

I tried my idea, but unfortunately it doesn't seem to work, I can't get rid of some trait conflict errors. I suspect specialization would let me do what I want though.

srijs commented 5 years ago

@davidbarsky:

I'm guessing that if we provide the impl out of Failure's book as described in this comment, that should address most of the issues you've raised?

Yes, the example I provided solves both of these problems.

Also, I'm curious as to how you handle error reporting in srijs/rust-aws-lambda. Maybe we're missing something?

The decision I made there was basically 1) rely on the failure crate and use the Error type it provides for convergence and 2) hard-code the error type to "Error" until something like specialization comes around and makes more specific type names more ergonomic.

Relying on failure was useful in that I could use an error type that a large amount of the community was already familiar with, but on the flip side, things like failure::Error not exposing a structured backtrace forced me to write hacks like the backtrace-parser crate.

Yeah, that's one of the orphan implementations/type names were some of the biggest blockers we ran into. Not sure how to address them properly other than the situation above (or opening an RFC that introduces an fn error_type(&self) -> &‘static str on std::error::Error.

An RFC that proposes a method like that to std::error::Error (with a default impl for backwards compatibility) might be a good way to go in the longer term.

It also strikes me that there is some similarity between "error type", and the error kind mechanism that you can find in a few places like std::io::ErrorKind. If you are really considering opening an RFC to extend the std::error::Error trait, that might be another direction worth exploring.

For the short-term, not sure, the choices I can see right now are either hard-coding the error type like I did, or introducing an additional trait like ErrorExt and perhaps providing a macro like try_with_type!(err, "SomeType") that helps ease the pain of dealing with third-party errors that don't implement ErrorExt.

I appreciate that you probably have a desire to take advantage of the fact that the AWS Lambda client api supports separating error types from error messages, but it can't help wondering if that's worth the reduction in ergonomics.

Lastly, speaking as a co-maintainer of Rusoto, I'm not sure we'd want to introduce a direct dependency on lambda_runtime to "enhance" error reporting, but maybe the ErrorExt trait could move into a separate, lightweight crate with a stable API, so that it could be consumed from other crates such as Rusoto? (edit: just realized that the typename crate that @Sh4rK has been proposing goes into that direction somewhat, but I'd want to make sure we have rock solid API stability for that crate, and FWIW type names and error types still feel like slightly different concerns to me)

@sapessi: I've expanded a little bit on my example to illustrate how I might go about it (playground), but it feels like generally we're on the same page here about what that could look like 👍

use backtrace::Backtrace;

trait ErrorType: std::error::Error {
    fn error_type(&self) -> &'static str;
}

struct HandlerError {
    error: Box<ErrorType + Send + Sync>,
    backtrace: Backtrace
}

impl<E> From<E> for HandlerError where E: ErrorType + Send + Sync + 'static {
    fn from(err: E) -> Self {
        HandlerError {
            error: Box::new(err),
            backtrace: Backtrace::new()
        }
    }
}

sapessi commented 5 years ago

Closed it by mistake, reopening.

@davidbarsky:

I'm guessing that if we provide the impl out of Failure's book as described in this comment, that should address most of the issues you've raised?

Yes, the example I provided solves both of these problems.

Also, I'm curious as to how you handle error reporting in srijs/rust-aws-lambda. Maybe we're missing something?

The decision I made there was basically 1) rely on the failure crate and use the Error type it provides for convergence and 2) hard-code the error type to "Error" until something like specialization comes around and makes more specific type names more ergonomic.

Relying on failure was useful in that I could use an error type that a large amount of the community was already familiar with, but on the flip side, things like failure::Error not exposing a structured backtrace forced me to write hacks like the backtrace-parser crate.

Yeah, that's one of the orphan implementations/type names were some of the biggest blockers we ran into. Not sure how to address them properly other than the situation above (or opening an RFC that introduces an fn error_type(&self) -> &‘static str on std::error::Error.

An RFC that proposes a method like that to std::error::Error (with a default impl for backwards compatibility) might be a good way to go in the longer term.

It also strikes me that there is some similarity between "error type", and the error kind mechanism that you can find in a few places like std::io::ErrorKind. If you are really considering opening an RFC to extend the std::error::Error trait, that might be another direction worth exploring.

For the short-term, not sure, the choices I can see right now are either hard-coding the error type like I did, or introducing an additional trait like ErrorExt and perhaps providing a macro like try_with_type!(err, "SomeType") that helps ease the pain of dealing with third-party errors that don't implement ErrorExt.

I appreciate that you probably have a desire to take advantage of the fact that the AWS Lambda client api supports separating error types from error messages, but it can't help wondering if that's worth the reduction in ergonomics.

Lastly, speaking as a co-maintainer of Rusoto, I'm not sure we'd want to introduce a direct dependency on lambda_runtime to "enhance" error reporting, but maybe the ErrorExt trait could move into a separate, lightweight crate with a stable API, so that it could be consumed from other crates such as Rusoto? (edit: just realized that the typename crate that @Sh4rK has been proposing goes into that direction somewhat, but I'd want to make sure we have rock solid API stability for that crate, and FWIW type names and error types still feel like slightly different concerns to me)

They are meant to be different. Many customers stream logs from CloudWatch logs to an ELK/Splunk stack and rely on error types to generate their custom dashboard. I'd really like to give them the ability to set their custom types. As @Sh4rK discovered, without specialization, I cannot make it easy to implement this for all. If I declare a From<Display + Send + Sync> I cannot then declared a type-specific From.

@sapessi: I've expanded a little bit on my example to illustrate how I might go about it (playground), but it feels like generally we're on the same page here about what that could look like 👍
use backtrace::Backtrace;

trait ErrorType: std::error::Error {
    fn error_type(&self) -> &'static str;
}

struct HandlerError {
    error: Box<ErrorType + Send + Sync>,
    backtrace: Backtrace
}

impl<E> From<E> for HandlerError where E: ErrorType + Send + Sync + 'static {
    fn from(err: E) -> Self {
        HandlerError {
            error: Box::new(err),
            backtrace: Backtrace::new()
        }
    }
}

Yes, this is the approach I was taking. The errorType is the biggest question left. I really would like to include it. However, as you said, I also want it to be idiomatic and very easy for customers to use. I have nothing against the idea of splitting the error definition into a separate crate so that you can re-use it in rusoto.

Sh4rK commented 5 years ago

Just as a data point, an advantage of using failure::Error is that stack traces could originate from other crates that use failure, not just from code specifically written for Lambda.

sapessi commented 5 years ago

Thank you all for the comments so far. We experimented with the various methodologies on our side. Based on the results, we concluded that the best option is to rely on the failure crate. The proposed changes are:

The handler type in the runtime crate will accept an additional generic for errors. The generic will expect the following traits
```
impl<F, E, O, ER> Runtime<F, E, O, ER>
where
F: Handler<E, O, ER>,
E: serde::de::DeserializeOwned,
O: serde::Serialize,
ER: AsFail + LambdaErrorExt + Send + Sync + Debug,
```
In the process we will rename the generics to make the code more readable
The LambdaErrorExt trait defines the error_type() function to extract the parameter for the Lambda runtime API
```
pub trait LambdaErrorExt {
fn error_type(&self) -> &str;
}
```
We will move the LambdaErrorExt trait definition to a separate lambda-error crate - related to the comment on #53.
The lambda-error crate will also provide implementations of the LambdaErrorExt for all of the Error types in the standard library

With the changes outlined above, developers will have to depend on the failure crate for their functions. They can either use the concrete failure::Error type for simple functions or implement their own custom errors.

basic.rs

use failure::{bail, Error as FailError};

...

fn my_handler(e: CustomEvent, c: lambda::Context) -> Result<CustomOutput, FailError> {
    if e.first_name == "" {
        error!("Empty first name in request {}", c.aws_request_id);
        bail!("Empty first name");
    }

    Ok(CustomOutput {
        message: format!("Hello, {}!", e.first_name),
    })
}

custom.rs

use failure::Fail;

...

#[derive(Debug, Fail)]
#[fail(display = "custom error")]
enum CustomError {
    #[fail(display = "{}", _0)]
    ParseIntError(#[fail(cause)] std::num::ParseIntError),

    #[fail(display = "Generic, unknown error")]
    Generic,
}

impl LambdaErrorExt for CustomError {
    fn error_type(&self) -> &str {
        "MyCustomError"
    }
}

...

fn my_handler(e: CustomEvent, c: lambda::Context) -> Result<CustomOutput, CustomError> {
    if e.first_name == "" {
        error!("Empty first name in request {}", c.aws_request_id);
        return Err(CustomError::Generic);
    }

    let _age_num: u8 = e.age.parse().map_err(CustomError::ParseIntError)?;

    Ok(CustomOutput {
        message: format!("Hello, {}!", e.first_name),
    })
}

sapessi commented 5 years ago

Changes for this and #53 are staged in #63 - let me know what you guys think. I'd like to merge and prepare for a 0.2 release by the end of next week.

sapessi commented 5 years ago

Changes are merged and new release is making its way to crates.io