[RFC] Ergonomic improvements to payload deserialization for HTTP use cases

mlaota commented 3 weeks ago

Summary

This RFC proposes ergonomic improvements to payload deserialization for HTTP usecases.

Motivation

The RequestPayloadExt trait is used in lambda_http to simplify extracting a Request's body into a user-provided struct. The current version of the RequestPayloadExt::payload has the following docstring:

Return the result of a payload parsed into a type that implements serde::Deserialize

Currently only application/x-www-form-urlencoded and application/json flavors of content type are supported

A PayloadError will be returned for undeserializable payloads. If no body is provided, Ok(None) will be returned.

The interpretation of how payload should behave when the content-type header is missing/unsupported is somewhat ambiguous. For example, there's no mention of what happens when the content-type header is missing. According to RFC 9110: HTTP Semantics,

If a Content-Type header field is not present, the recipient MAY [...] examine the data to determine its type.

so, it's not unreasonable to assume that a generic payload function on an HTTP Request struct would also do a best-effort estimation of the content-type. The signature of payload asks to handle multiple PayloadError variants for different content types which may reinforce that assumption.

The actual behavior is that when the content-type header is either missing or unsupported, the implementation assumes there is no content and returns Ok(None); however, the docstring only calls out Ok(None) being returned "if no body is provided" which can lead to frustration when trying to figure out why the payload is not being recognized.

The documentation needs an update to disambiguate the scenario where the content-type is missing or unsupported. Additionally, I've written a proposal for API ergonomics improvements when handling different content types below.

Proposal: Intent-Driven API Additions

We can use this as an opportunity to introduce a fix via intent-driven additions. For example, we can add json() and x_www_form_urlencoded() methods to the Request that make it clear what the expected format should be:

// Before
let input: OperationInput = match event.payload() {
  Ok(Some(input)) => input,
  Ok(None) => 
    todo!(
      "is the data is missing or the content-type header is missing/incorrect? how 
      do I handle a default content-type for my app.. can i still use `payload` at all?
      i need to inspect the headers to figure out what to do next, and manually deserialize
      the JSON if the content-type header is missing"
    ),
  Err(PayloadError::Json(err)) => todo!("handle deserialization error"),
  Err(PayloadError(other_err)) => 
    todo!("still need to handle this")
}

// After
let input: OperationInput = match event.json() {
  Ok(Some(input)) => input,
  Ok(None) => todo!("handle missing data"),
  Err(JsonPayloadError(err)) => todo!("handle invalid json/schema"),
}

Option 1. (Preferred) Introduce a new extension trait, `RequestPayloadIntentExt`, with initial implementations

The extension trait would have the following interface:

pub type JsonPayloadError = serde_json::Error;
pub type XWwwFormUrlEncodedPayloadError = SerdeError;

pub trait RequestPayloadIntentExt {
  fn json<D>(&self) -> Result<Option<D>, JsonPayloadError> 
    where for<'de> D: Deserialize<'de>;

  fn x_www_form_urlencoded<D>(&self) -> Result<Option<D>, XWwwFormUrlEncodedPayloadError>
    where for<'de> D: Deserialize<'de>;
}

And initial implementations would go here:

impl RequestPayloadIntentExt for http::Request<Body> { TODO }

Advantages

Starting fresh. No risk for backwards compatability issues.
The currently available content types are documented via the API in the function names mapped to the content-type they examine for. No surprises.
Contributing is straightforward as users request additional content types; contributors should be able to recognize the pattern for contribution from the initial implementations of json() and x_www_form_urlencoded().

Drawbacks

Overlapping use case with RequestPayloadExt: users may be confused why the two implementations exist, and surprised when they behave differently
- Mitigation: Mark RequestPayloadExt as deprecated in documentation with a pointer to RequestPayloadIntentExt

Option 2: Extend RequestPayloadExt trait with methods `json()` and `x_www_form_urlencoded()`

The trait could be modified as follows:

pub type JsonPayloadError = serde_json::Error;
pub type XWwwFormUrlEncodedPayloadError = SerdeError;

pub trait RequestPayloadExt {
  ...

  fn json<D>(&self) -> Result<Option<D>, JsonPayloadError> 
    where for<'de> D: Deserialize<'de> 
  {
      todo!("delegate to self.payload for backwards compatability")
  }

  fn x_www_form_urlencoded<D>(&self) -> Result<Option<D>, XWwwFormUrlEncodedPayloadError>
    where for<'de> D: Deserialize<'de> 
  {
      todo!("delegate to self.payload for backwards compatability")
  }
}

impl RequestPayloadExt for http::Request<Body> {
  fn json<D>(&self) -> Result<Option<D>, JsonPayloadError> 
    where for<'de> D: Deserialize<'de> {
      todo!("new implementation")
    }

  fn x_www_form_urlencoded<D>(&self) -> Result<Option<D>, XWwwFormUrlEncodedPayloadError>
    where for<'de> D: Deserialize<'de> {
      todo!("new implementation")
    }
}

Advantages

Users will find the new functionality where they expect it (from prior examples and discussions in the repo)
Maintaining one trait for payload deserialization utils.
payload can reuse the implementations for json and xx_www_form_urlencoded and share in improvements.

Drawbacks

Backwards compatibility: If a user created an implementation of RequestPayloadExt, they would need to then implement json and x_www_form_urlencoded.
- Mitigation 1: we can provide default implementations on the trait delegating to self.payload to mimic the previous behavior.
Confusing namespace: we would now have variants such as PayloadError::Json and JsonPayloadError coexisting in the same module.
- Mitigation: Better names? Open to suggestions.
Inconsistent behavior/assumptions between payload and new methods

Alternative: Try all content types deserializers before returning None

Since both current (and I'm assuming future) content-type deserialization implementations leverage Serde's Deserialize trait as inputs, we can try both deserializers on the input when the content-type header is missing. For example, we can start with JSON, catch errors, then try x-www-form-urlencoded, etc...

Advantages

Should be fairly simple to implement, with no changes to the trait's API.

Drawbacks

If users rely on the function returning None when the "content-type" header is empty, they may start handling requests that they expected to reject when the header is missing; consequently, they may be surprised at the new behavior and/or be forced to implement a different mechanism for denying requests without a content-type header.
Performance may not be great due to trying every content-type.

calavera commented 3 weeks ago

I would vote for option 2. I'd rather not introduce new extensions. I believe not many people know about them and that you can implement your own.

Regarding error types, why do we need new error types for that PayloadError doesn't cover? The errors that you're creating are pretty much the same as the variants in that enum.

mlaota commented 3 weeks ago

I would vote for option 2. I'd rather not introduce new extensions. I believe not many people know about them and that you can implement your own.

Sounds good.

Regarding error types, why do we need new error types for that PayloadError doesn't cover?

Good question -- the goal is that users should only be forced to handle errors that are thrown by their intended content-type deserializer. PayloadError works great with the current behavior in the case where the deserializer is selected at run-time, since it's really the client who chooses it based on the content-type header.

If we flip that decision to the server with this new API, we can strongly guarantee that only related error types will be thrown by using a more specific error type; so, when I'm calling .json(), I no longer need to consider the possibility of a PayloadError::WwwFormUrlEncoded error, because the compiler guarantees that won't happen; whereas, if we reuse PayloadError, I still have to write a handler for other content types even if I'm explicitly using the json() method, and the addition of any other content type would be a breaking change (e.g. if we add PayloadError::TextHtml, any user who fully matches the PayloadError type now has to add one more arm).

The errors that you're creating are pretty much the same as the variants in that enum.

I'm open to feedback on this. This was basically my thought process:

the extension already has some coupling with serde, which is okay imo given its popularity
it seems like serde_json::error::Error provides a lot of rich information so we shouldnt' need additional error scenarios layered on top
- if this is a bad assumption, maybe we should use an enum to be safe?

calavera commented 2 weeks ago

That makes sense. I have no naming preference for the error types, tbh. Feel free to open a PR with these changes and we can discuss it further in code review.

mlaota commented 1 week ago

PR above is ready for review.

(Sorry for the mention spam there, that was because of git amends + pushes to work on different laptops lol)

github-actions[bot] commented 1 week ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.

awslabs / aws-lambda-rust-runtime