Open misterjoshua opened 2 years ago
Engines:
Existing lambda middleware:
Thanks a lot for those inputs ! We will look into it
@misterjoshua Thanks a lot. This is a very nice RFC and rich in details.
We plan to catch up with the Python features, so this is on our roadmap. The team is focusing on fixing all outstanding issues on the 3 core utilities to have a Production Ready version, so it may take a while to implement this. However, I would like to use this chance to discuss the feature API so we have a clear spec, ready for implementation.
This feature is comparable to the `Validation` utility in the Python version. The main difference is that Python has top-level function decorators, but TypeScript doesn't. This is the same challenge we had in the core utility design (tracer, logger, metrics).
Let's explore the alternatives so we can weigh pros & cons:
@misterjoshua What do you think of this alternative?
import { Validation, Envelope, validateSchema } from "@aws-lambda-powertools/validation";
const validation = new Validation({
envelope: Envelope.jmesPath("detail"), // Optional: only if we want to validate a part of it
inboundSchema: {...},
outboundSchema: {...}, //optional
});
// Copied from proposal
//....
async function handlerBase(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
return {
message: `Hello ${request.name}`,
};
}
//...
export const handler = middy(handlerBase)
.use(validateSchema())
.use(..); // can chain with Logger's injectContext() or any other utilities
The advantage is that it's easier to chain multiple utility features. In your proposal, if we also want to use `injectLambdaContext(logger)`, we will need to wrap the `Validation` object with another function (or class) like this:
export const handler = Tracer.captureLambdaHandler(
  Logger.injectContext(
    Validation.handler(handlerBase, {...})
  )
);
We can follow the same pattern of core utilities.
import { Validation } from "....";
const validation = new Validation({
envelope: ...,
inboundSchema: ...,
outboundSchema: ...,
});
class Lambda implements LambdaInterface {
// Decorate your handler class method
@validation.validateHandler()
public async handler(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
/* ... */
}
}
This is where I see no clear winner. The Python version has only a `validate()` function.
The `Validator` class like in your ad-hoc example looks good to me. Another alternative is function-based, like in Python:
const validationOption = {
envelope: ...,
inboundSchema: ...,
outboundSchema: ...,
};
try {
const person: Person = validate<Person>(JSON.parse(someInput), validationOption);
logger.info(`Person's name is ${person.name}`);
} catch (e) {
logger.error(`Failure: ${e}`);
}
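To make the function-based alternative more concrete, here is a minimal, dependency-free sketch of a typed `validate<T>()`. It is an illustration only, not the proposed implementation: `SchemaCheck` is a hypothetical stand-in for a real JSON Schema engine, and `SchemaValidationError` and `ValidationOptions` are names assumed for this example.

```typescript
// Hypothetical stand-in for a schema engine: returns a list of problems (empty when valid).
type SchemaCheck = (data: unknown) => string[];

// Assumed option bag; the thread also discusses `envelope` and `outboundSchema`.
interface ValidationOptions {
  inboundSchema: SchemaCheck;
}

// Assumed error type so callers can tell validation failures from other errors.
class SchemaValidationError extends Error {
  readonly problems: string[];

  constructor(problems: string[]) {
    super(`Validation failed: ${problems.join("; ")}`);
    this.name = "SchemaValidationError";
    this.problems = problems;
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, SchemaValidationError.prototype);
  }
}

// Fails fast on invalid data, otherwise returns the same value typed as T.
function validate<T>(data: unknown, options: ValidationOptions): T {
  const problems = options.inboundSchema(data);
  if (problems.length > 0) {
    throw new SchemaValidationError(problems);
  }
  return data as T;
}

interface Person {
  name: string;
}

// Hand-written check standing in for a compiled JSON Schema.
const personSchema: SchemaCheck = (data) => {
  if (typeof data !== "object" || data === null) {
    return ["expected an object"];
  }
  const name = (data as Record<string, unknown>).name;
  return typeof name === "string" ? [] : ["name must be a string"];
};

const person = validate<Person>(JSON.parse('{"name":"Ada"}'), {
  inboundSchema: personSchema,
});
```

The cast to `T` is only as trustworthy as the schema, which is the trade-off the parsing-versus-validation discussion later in this thread touches on.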
That being said, using a `Validator` class also makes a lot of sense to me. I'm wondering if I missed any pros & cons here.
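Other points to consider:
1. Should we make `outboundSchema` a required parameter?
2. If inbound validation fails, it fails fast by throwing an `Error` describing that there is an incorrect format.
   * Should we have an option to control the level of info we disclose here?
3. If outbound validation fails, it returns a 500 error.
   * How do we log the error in this case? (We can use `Logger`, but that means another dependency.) Should we provide an optional log function with `console.error()` as a default?
   * Or we let users handle this themselves with the manual option.
4. `middy` already has a few Validator middlewares. The only difference I see is JMES support. If we provide an extractor for JMESPath (like in Python), we could reuse existing ones. This makes me concerned that we are reinventing the wheel here. (But the same could be said for all other utilities.)

Usage 1: Middy middleware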
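@ijemmy I like the middleware idea - it's familiar to many node devs. I have a few thoughts on the example above:
* I presume that `validateSchema` would have a parameter that accepts your `Validation` instance?
* Also, `handlerBase` wants `GreetingRequest` - would the `validateSchema` middleware strip the envelope?
* I wonder if there's an opportunity here to validate that the envelope is right. e.g., to check that a lambda meant to be an Event Bridge Rule Target received Event Bridge events and not SQS, SNS, or API Gateway events.
* And I wonder if it's possible to provide an `Envelope` type that allows users to create a multi-purpose lambda handler - one handler that works for any AWS event type that includes payload data. (SNS, SQS, Event Bridge, or Kinesis Streams for example.) Users could then create lambdas that don't need to pay much attention to identifying and unpacking the structure of the envelopes for each payload.
* Finally, I wonder whether Middy middleware can flat map records from batch event types. If it can, this plays well with the story of creating a multi-purpose lambda function.

What about something like this? (EDIT: I have added some discussion near the bottom of this comment about separating Validation from Envelope as another decorator and middleware to be applied before validation.)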
import { Validation, Envelope, validateSchema } from "@aws-lambda-powertools/validation";
// Composable, multi-purpose envelope-stripping example.
const envelope = Envelope.multiPurpose({
envelopes: [
// Accepts the given detail type and unwraps `detail`
Envelope.awsEventBridgeEvent({ detailType: 'Detail Type Name' }),
// Flat maps SQS event `Records[*].body` to parsed json and feeds each
// through `handlerBase` when the entire event is valid.
Envelope.awsSqsEvent({ jsonPayload: true, flatMapRecords: true }),
// Same idea as SQS, but for SNS and `Records[*].Sns.Message`
Envelope.awsSnsEvent({ jsonPayload: true, flatMapRecords: true }),
],
});
const validation = new Validation({
envelope: envelope, // Replaced multi-purpose envelope example
inboundSchema: {...},
// Commented out for the sake of this example:
// outboundSchema: {...},
});
// Multi-purpose handler - it doesn't need to know how it's getting GreetingRequest, whether it's from
// SNS, SQS, or Event Bridge.
async function handlerBase(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
return {
message: `Hello ${request.name}`,
};
}
export const handler = middy(handlerBase)
.use(validateSchema(validation)) // This example assumes Middy can flat map SQS/SNS batch records.
.use(...);
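As a rough illustration of what a multi-purpose envelope could do under the hood, the following dependency-free sketch detects the event shape and unwraps the payloads. The `Envelope` type and the `multiPurpose` helper are assumptions made for this example, not an existing API. The event fields it reads (`detail-type` and `detail` for Event Bridge, `Records[*].body` with `eventSource: "aws:sqs"` for SQS, `Records[*].Sns.Message` with `EventSource: "aws:sns"` for SNS) follow the documented event structures.

```typescript
// Hypothetical envelope: returns the unwrapped payloads, or undefined if the event does not match.
type Envelope = (event: unknown) => unknown[] | undefined;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns `event.Records` when it is an array of objects.
function recordsOf(event: unknown): Record<string, unknown>[] | undefined {
  if (!isRecord(event)) return undefined;
  const records = event.Records;
  if (!Array.isArray(records) || !records.every(isRecord)) return undefined;
  return records as Record<string, unknown>[];
}

// Event Bridge: a single payload under `detail`.
const eventBridgeEnvelope: Envelope = (event) =>
  isRecord(event) && typeof event["detail-type"] === "string" && "detail" in event
    ? [event.detail]
    : undefined;

// SQS: one JSON payload per record, under `body`.
const sqsEnvelope: Envelope = (event) => {
  const records = recordsOf(event);
  if (!records) return undefined;
  const payloads: unknown[] = [];
  for (const record of records) {
    const body = record.body;
    if (record.eventSource !== "aws:sqs" || typeof body !== "string") return undefined;
    payloads.push(JSON.parse(body));
  }
  return payloads;
};

// SNS: one JSON payload per record, under `Sns.Message`.
const snsEnvelope: Envelope = (event) => {
  const records = recordsOf(event);
  if (!records) return undefined;
  const payloads: unknown[] = [];
  for (const record of records) {
    const sns = record.Sns;
    const message = isRecord(sns) ? sns.Message : undefined;
    if (record.EventSource !== "aws:sns" || typeof message !== "string") return undefined;
    payloads.push(JSON.parse(message));
  }
  return payloads;
};

// Tries each envelope in order and returns the payloads of the first match.
function multiPurpose(envelopes: Envelope[]): (event: unknown) => unknown[] {
  return (event) => {
    for (const envelope of envelopes) {
      const payloads = envelope(event);
      if (payloads !== undefined) return payloads;
    }
    throw new Error("Event did not match any configured envelope");
  };
}

const extract = multiPurpose([eventBridgeEnvelope, sqsEnvelope, snsEnvelope]);
```

Because the result is always an array, a caller can feed each payload through the same validation and handler logic regardless of the source, which is the flat-mapping behaviour the example above assumes.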
Usage 2: Class decorator
This looks good. I like the decorator syntax. Here's an example of the same multi-purpose handler as above:
import { Validation, Envelope } from "@aws-lambda-powertools/validation";
// Multi-purpose envelope unwrapping example (same as previous example)
const envelope = Envelope.multiPurpose({
envelopes: [
Envelope.awsEventBridgeEvent({ detailType: 'Detail Type Name' }),
Envelope.awsSqsEvent({ jsonPayload: true, flatMapRecords: true }),
Envelope.awsSnsEvent({ jsonPayload: true, flatMapRecords: true }),
],
});
const validation = new Validation({
envelope: envelope, // Replaced multi-purpose envelope example
inboundSchema: {...},
// Commented out for the sake of this example:
// outboundSchema: {...},
});
class Lambda implements LambdaInterface {
// Decorate your handler class method
@validation.validateHandler()
public async handler(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
/* ... */
}
}
Usage 3: manual
This is where I see no clear winner. The Python version has only a `validate()` function.
I agree. `Validator.map` departs from the "validate function" approach in both the Python and Java libs. To maintain consistency, your syntax looks better. I especially like the idea that the validate function returns typed data from the `JSON.parse()`. We encounter a need for this type of function when working with data returned from the AWS SDK. (SecretsManager secrets in JSON format and DynamoDB Document Client items come to mind.)
Other points to consider:
1. Should we make `outboundSchema` required parameter?
I think not. For backend lambdas, the output isn't always important. (i.e., Lambdas handling SQS, SNS, or Event Bridge Rule Targets.)
2. If inbound validation fails, it fails fast by throwing an `Error` describing that there is an incorrect format. * Should we have an option to control the level of info we disclose here?
Yes, I think so. For API Gateway, I can envision a scenario where I want to create an entirely custom error response with a particular HTTP status code. Perhaps `Validator` can accept a `validationErrorHandler` option to run when validation fails. Example:
async function validationErrorHandler(error) {
return {
statusCode: 418, // HTTP 418 I'm a teapot
message: 'My custom message here',
};
}
const validation = new Validation({
inboundSchema: {...},
outboundSchema: {...},
validationErrorHandler: validationErrorHandler,
});
// Provide `validation` to middleware or use the decorator approach.
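One way the `validationErrorHandler` option could be wired into a handler wrapper is sketched below. This is a sketch under assumptions: `withValidation`, `SchemaCheck`, and the `HttpResponse` shape are hypothetical names for illustration, and the schema engine is replaced by a hand-written check so the example has no dependencies.

```typescript
interface HttpResponse {
  statusCode: number;
  body: string;
}

// Hypothetical stand-in for a schema engine: returns a list of problems (empty when valid).
type SchemaCheck = (data: unknown) => string[];
type ValidationErrorHandler = (problems: string[]) => HttpResponse | Promise<HttpResponse>;

interface ValidationOptions {
  inboundSchema: SchemaCheck;
  validationErrorHandler?: ValidationErrorHandler;
}

// Wraps a handler so that invalid events never reach it.
function withValidation<TEvent>(
  handler: (event: TEvent) => Promise<HttpResponse>,
  options: ValidationOptions,
): (event: unknown) => Promise<HttpResponse> {
  return async (event) => {
    const problems = options.inboundSchema(event);
    if (problems.length > 0) {
      // With a custom handler, its return value becomes the response.
      if (options.validationErrorHandler) {
        return options.validationErrorHandler(problems);
      }
      // Without one, fail fast as described in point 2.
      throw new Error(`Invalid event: ${problems.join("; ")}`);
    }
    return handler(event as TEvent);
  };
}

const requireName: SchemaCheck = (data) =>
  typeof data === "object" && data !== null && typeof (data as { name?: unknown }).name === "string"
    ? []
    : ["name must be a string"];

const handler = withValidation<{ name: string }>(
  async (event) => ({ statusCode: 200, body: `Hello ${event.name}` }),
  {
    inboundSchema: requireName,
    validationErrorHandler: () => ({ statusCode: 418, body: "My custom message here" }),
  },
);
```

Passing the list of problems to the handler lets the user decide how much detail to disclose in the response.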
3. If outbound validation fails, returns a 500 error. * How do we log the error in this case? (we can use `Logger` but that means another dependency) Should we provide an optional log function with `console.error()` as a default?
Given that Tracer, Logger, and Metrics already depend on Commons, could we add something like `ClassThatLogs` to Commons? If we do, we can optionally accept `ClassThatLogs` in `Validator` with a fallback to some default implementation.
const logger = new Logger(...);
const validation = new Validation({
inboundSchema: {...},
outboundSchema: {...},
logger: logger,
});
* Or we let users handle this themselves with manual option
If we put `ClassThatLogs` in Commons, the user can provide a custom implementation.
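A minimal sketch of that idea follows, assuming `ClassThatLogs` is a small interface with a single `error` method and that the fallback simply writes to `console.error()`. Both the interface shape and the `reportOutboundFailure` method are assumptions for illustration.

```typescript
// Hypothetical minimal logging contract that could live in Commons.
interface ClassThatLogs {
  error(message: string): void;
}

// Default used when the caller does not supply a logger.
const consoleLogger: ClassThatLogs = {
  error: (message) => {
    console.error(message);
  },
};

class Validation {
  private readonly logger: ClassThatLogs;

  constructor(options: { logger?: ClassThatLogs } = {}) {
    // Fall back to console.error() so Logger is not a hard dependency.
    this.logger = options.logger ?? consoleLogger;
  }

  // Assumed hook called when the handler's response fails the outbound schema.
  reportOutboundFailure(problems: string[]): void {
    this.logger.error(`Outbound validation failed: ${problems.join("; ")}`);
  }
}

// A custom implementation only needs to satisfy the interface.
const captured: string[] = [];
const validation = new Validation({
  logger: {
    error: (message) => {
      captured.push(message);
    },
  },
});
validation.reportOutboundFailure(["message must be a string"]);
```

Since the Powertools `Logger` exposes an `error` method, an instance of it could be passed wherever this interface is accepted without the validation package importing it.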
4. `middy` has already had a few Validator middlewares . The only difference I see is JMES support. If we provide an extractor for JMESPath ([like in Python](https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/jmespath_functions/)) We could reuse existing ones. This makes me concerned if we are reinventing a wheel here. (But the same could be said for all other utilities)
I see value in having the decorator version of the validation even if we don't add a middy middleware due to overlap in their ecosystem... But I don't see how adding a middleware for Middy would add much complexity if we're already adding a decorator version, since the majority of the logic would live in `Validator`, no?
EDIT: I was just thinking that at a certain point, I'm not sure if the idea of multi-purpose lambdas is just a validation story - it could be two stories: Validation and Extraction. I figure it should be possible to separate Validation and Extraction into two distinct features. Extraction could just as well be its own middleware & decorator, perhaps centred around the concept of `Envelope`. I imagine that could look like this:
// Composable, multi-purpose envelope-stripping example.
const envelope = Envelope.multiPurpose({
envelopes: [
// Accepts the given detail type and unwraps `detail`
Envelope.awsEventBridgeEvent(),
// Flat maps SQS event `Records[*].body` to parsed json and feeds each
// through `handlerBase` when the entire event is valid.
Envelope.awsSqsEvent({ jsonPayload: true, flatMapRecords: true }),
// Same idea as SQS, but for SNS and `Records[*].Sns.Message`
Envelope.awsSnsEvent({ jsonPayload: true, flatMapRecords: true }),
],
});
const validation = new Validation({
inboundSchema: {...},
// Commented out for the sake of this example:
// outboundSchema: {...},
});
// Multi-purpose handler - it doesn't need to know how it's getting GreetingRequest, whether it's from
// SNS, SQS, or Event Bridge.
async function handlerBase(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
return {
message: `Hello ${request.name}`,
};
}
export const handler = middy(handlerBase)
.use(extractFromEnvelope(envelope)) // This example assumes Middy can flat map SQS/SNS batch records.
.use(validateSchema(validation))
.use(...);
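The ordering matters here: extraction has to run before validation so that the schema sees the unwrapped payload. The sketch below shows that with middleware objects shaped like Middy's `{ before }` hooks. To stay dependency-free, `compose` is a hypothetical stand-in for `middy(handler).use(...)` that only runs `before` hooks in registration order, and `extractFromEnvelope` takes a plain property name rather than a real `Envelope`.

```typescript
// Only the part of the request object that this sketch reads and writes.
interface Request {
  event: unknown;
}

interface Middleware {
  before?: (request: Request) => void | Promise<void>;
}

// Replaces the event with the value found under `key` (a simplified envelope).
const extractFromEnvelope = (key: string): Middleware => ({
  before: (request) => {
    request.event = (request.event as Record<string, unknown>)[key];
  },
});

// Rejects the invocation when the (already extracted) event has problems.
const validateSchema = (check: (data: unknown) => string[]): Middleware => ({
  before: (request) => {
    const problems = check(request.event);
    if (problems.length > 0) {
      throw new Error(`Invalid event: ${problems.join("; ")}`);
    }
  },
});

// Hypothetical stand-in for middy(handler).use(...): runs `before` hooks in order.
function compose<TResult>(
  handler: (event: unknown) => Promise<TResult>,
  middlewares: Middleware[],
): (event: unknown) => Promise<TResult> {
  return async (event) => {
    const request: Request = { event };
    for (const middleware of middlewares) {
      await middleware.before?.(request);
    }
    return handler(request.event);
  };
}

// Hand-written check standing in for a compiled schema.
const hasName = (data: unknown): string[] =>
  typeof data === "object" && data !== null && typeof (data as { name?: unknown }).name === "string"
    ? []
    : ["name must be a string"];

const handler = compose(
  async (event) => ({ message: `Hello ${(event as { name: string }).name}` }),
  [extractFromEnvelope("detail"), validateSchema(hasName)],
);
```

Swapping the two middlewares would make validation run against the raw envelope, which is the main argument for keeping them as separate, explicitly ordered steps.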
Here's use case 2:
// Composable, multi-purpose envelope-stripping example.
const envelope = Envelope.multiPurpose({
envelopes: [
// Unwraps `detail`
Envelope.awsEventBridgeEvent(),
// Flat maps SQS event `Records[*].body` to parsed json and feeds each
// through `handlerBase` when the entire event is valid.
Envelope.awsSqsEvent({ jsonPayload: true, flatMapRecords: true }),
// Same idea as SQS, but for SNS and `Records[*].Sns.Message`
Envelope.awsSnsEvent({ jsonPayload: true, flatMapRecords: true }),
],
});
const validation = new Validation({
inboundSchema: {...},
// Commented out for the sake of this example:
// outboundSchema: {...},
});
class Lambda implements LambdaInterface {
// Decorate your handler class method
@envelope.extractForHandler()
@validation.validateHandler()
public async handler(request: GreetingRequest, context: lambda.Context): Promise<GreetingResponse> {
/* ... */
}
}
I'm happy to open another RFC focused on Extraction, if this is something that could work.
Middy takes this approach of separating `normalization` (envelope/Extraction) and `validation`. See AWS Event support list, we're just missing API Gateway WebSocket events which will be its own middleware at some point. I haven't heard of anyone requesting the ability to `flatMapRecords`, but it could be added pretty easily if someone needed it.
If AWS provided the JSON schemas for each event in the documentation, that would be super helpful. I've thought about writing these myself, but I'd just be reverse engineering from the samples and they may not be accurate.
@misterjoshua
I presume that validateSchema would have a parameter that accepts your Validation instance?
That's right.
Also, handlerBase wants GreetingRequest - would the validateSchema middleware strip the envelope?
I'm inclined to have two different middlewares for different tasks.
I wonder if there's an opportunity here to validate that the envelope is right. e.g., to check that a lambda meant to be an Event Bridge Rule Target received Event Bridge events and not SQS, SNS, or API Gateway events.
Is it something similar to the built-in envelopes in the Python version?
If yes, this should be included in the spec.
And I wonder if it's possible to provide an Envelope type that allows users to create a multi-purpose lambda handler - one handler that works for any AWS event type that includes payload data. (SNS, SQS, Event Bridge, or Kinesis Streams for example.) Users could then create lambdas that don't need to pay much attention to identifying and unpacking the structure of the envelopes for each payload.
Can you tell me more about your business use cases?
I haven't seen many use cases of consuming the same data from different AWS sources.
Finally, I wonder whether Middy middleware can flat map records from batch event types. If it can, this plays well with the story of creating a multi-purpose lambda function.
I think this is out of scope for Validation. I would go with an extra line of `_.flatMap()`. If you want it in middleware, you can create a custom middleware for that.
Given that Tracer, Logger, and Metrics already depend on Commons, could we add something like ClassThatLogs) to Commons? If we do, we can optionally accept ClassThatLogs in Validator with a fallback to some default implementation.
That could be a good option. The price we pay is leaking abstraction between modules.
I see value in having the decorator version of the validation even if we don't add a middy middleware due to overlap in their ecosystem... But I don't see how adding a middleware for Middy would add much complexity if we're already adding a decorator version, since the majority of the logic would live in Validator, no?
No, it won't add much complexity.
Middy takes this approach of separating `normalization` (envelope/Extraction) and `validation`. See AWS Event support list, we're just missing API Gateway WebSocket events which will be its own middleware at some point. I haven't heard of anyone requesting the ability to `flatMapRecords`, but it could be added pretty easily if someone needed it.
If AWS provided the JSON schemas for each event in the documentation, that would be super helpful. I've thought about writing these myself, but I'd just be reverse engineering from the samples and they may not be accurate.
Wow, this is pretty good. I'm going to use that in my next project :)
Regarding JSON schemas, let me pass this feedback to the Lambda team.
I'd like to add another validation engine to be considered: zod. IMO it has the best TypeScript DX between ajv and joi.
EDIT: When I made the above comment, I was not aware of the parser utility tracked in #1334. Personally, I probably won't use the validation package, as the parsing package does both validation and parsing and I'd rather have them all in one.
Hi everyone, as indicated in our roadmap we plan to implement this utility in the coming months.
Over the next few days I'll take some time to go through the existing content of the RFC and consolidate everything.
Within this effort I'd like to see a pattern built for creating decorators and not just middleware - if possible.
Hi @codyfrisch, thanks a lot for taking the time to share this detail/request.
We are planning on including decorator usage as part of this utility.
When we released the Idempotency Utility we held back the decorators because we had some open questions around TypeScript 5.x support. However, last week we finally transitioned successfully and stabilized the topic, so we are ready to develop new decorators for all the utilities where they make sense, this one being one of them.
Hi there!
We discussed this RFC with @dreamorosi and how to move it forward, so as the first step I did a little research about JSON validation packages that could be used as a wrapped dependency for `powertools/validation`, and I want to share the results.
The first thing that comes to mind is `ajv`, but let's look at other options.
TLDR
* `schemasafe`, because it's small in size, performant, has zero dependencies, is relatively popular, is maintained by a company, and the main contributor is a member of the `json-schema.org` organization on GitHub. MIT license.
* `tdegrunt/jsonschema` is small in size, has zero dependencies, is in 6th place at the top of the performance benchmarks, and has a relatively large number of downloads. MIT license.

I am on the fence about `zod`. I think it serves a different purpose: it can be used to validate JSON data, but it is not designed specifically for validating against JSON schemas. Maybe I'm missing something, but I would like to hear more from somebody who has worked with it.
The full comparison table is below.
Package | Maintenance | Performance benchmarks | Weekly Downloads | Unpacked Size | Dependencies | Used by projects | Drafts support | License | Documentation | Featured on json-schema.org |
---|---|---|---|---|---|---|---|---|---|---|
ajv | Maintained by community, most contributions by one person (member of json-schema org), project sponsored by Mozilla / Microsoft | draft7 – top 2, draft6 – top 1, very performant | 97m | 1.02 MB | 4 | 22.3m | 2020-12 2019-09 07 06 04 | MIT | ajv.js.org | ✅ |
schemasafe | Maintained by Exodus, most contributions by one person (member of json-schema org) | draft7 – top 1, draft6 – top 2, very performant, performance page in the docs | 1m | 139 kB | 0 | 27.7k | 2020-12 2019-09 07 06 04 | MIT | GitHub | ✅ |
json-schema-library | Maintained by one person | draft7 – top 5, somewhat performant | 39k | 511 kB | 7 | 885 | 07 06 04 | MIT | GitHub | ✅ |
jsonschema | Two active maintainers | both draft6 and draft7 – top 6, somewhat performant | 2.3m | 81.8 kB | 0 | 253k | versions through draft-07 are fully supported | MIT | GitHub | ❌ |
@hyperjump/json-schema | Developed by one person | No benchmarks (project started in 2022) | 13k | 356 kB | 8 | 284 | 2020-12 2019-09 07 06 04 | MIT | GitHub | ✅ |
zod | Maintained by community, most contributions by one person, the project has sponsors | No benchmarks | 4.9m | 628 kB | 0 | 643k | ??? | MIT | zod.dev | ❌ |
I also looked at djv, json-schema, is-my-json-valid, and a few others, and decided not to add them in the table, because some of them aren't maintained, outdated, not performant, or serve different purpose.
@shdq, I edited my comment above regarding zod. I now (after learning about parsing package) don't think it should be considered for validation.
With that said, I do think the distinction between validation and parsing is a nuanced one that many will not be aware of. It would be valuable to have a callout in the docs for validation that if you're looking for validation+parsing with zod, please see the parsing package.
I'd like to dive a bit deeper into the two top options: `ajv` & `schemasafe`.
From the linked benchmarks I wasn't sure where the "unpacked size" came from, so I decided to test them in a more production-like environment and see what the final size is. To do so, I have created a simple Lambda function that: 1/ imports the schema validation dependency, 2/ initializes it, 3/ provides a simple schema, and 4/ performs a validation.
The function is intentionally left as small as possible & without any Powertools utility so that we can assess the final size of the dependency almost in isolation. Below you can find the code of the functions.
Additionally, I'm bundling the function using `esbuild` via the `NodejsFunction` CDK construct. I'm doing this in a matrix of 4 modes: CJS, CJS optimized, ESM, and ESM optimized.
In all cases I am generating a metafile so that we can run it through the esbuild analyzer and see what it contains and its final size, plus ESM vs CJS distribution.
Below you can see the code used to deploy/bundle.
From the results below we can observe a few things:
* Neither `ajv` nor `schemasafe` ships an ESM distribution, which means they can't be tree-shaken effectively - this is good for the comparison as it levels the playing field, but not amazing for Powertools overall
* `schemasafe`'s footprint is consistently ~55% smaller than `ajv`'s
Package | CJS | CJS optimized | ESM | ESM optimized |
---|---|---|---|---|
Ajv | 263.8kb | 123.1kb | 263.3kb | 122.9kb |
schemasafe | 120.4kb | 55.9kb | 120.0kb | 55.7kb |
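As a quick check of the size difference, the relative reduction can be computed directly from the four rows of the table above. The helper name `percentSmaller` is made up for this example; the figures are copied from the table.

```typescript
// Bundle sizes in kb, copied from the table above.
const bundleSizes = [
  { mode: "CJS", ajv: 263.8, schemasafe: 120.4 },
  { mode: "CJS optimized", ajv: 123.1, schemasafe: 55.9 },
  { mode: "ESM", ajv: 263.3, schemasafe: 120.0 },
  { mode: "ESM optimized", ajv: 122.9, schemasafe: 55.7 },
];

// How much smaller `candidate` is than `baseline`, as a percentage of `baseline`.
function percentSmaller(baseline: number, candidate: number): number {
  return (1 - candidate / baseline) * 100;
}

const reductions = bundleSizes.map((row) => ({
  mode: row.mode,
  percent: percentSmaller(row.ajv, row.schemasafe),
}));
```

Every mode lands between 54% and 55%, so the gap is essentially independent of module format and optimization.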
Below is a breakdown of each result. You can get the metafile generated by each test in the `details` section, paste it into a file, and run it through the analyzer if you want to see an interactive version of the chart.
In terms of performance, looking at the benchmarks linked above, the main thing that I noticed is that they're running on potentially outdated versions of these libraries. For example, `ajv` had two major releases since the benchmark was run and `schemasafe` had two minor versions, but the version tested in the benchmark was a release candidate for its v1 major.
I have cloned the repo, then ran `npm ci`, followed by `npm run update`, which updated all the test suites and packages to their respective latest versions. Finally, I ran `npm t`.
After the first run I realized that `ajv` was consistently being excluded by the benchmark, and after a bit of digging I saw that it also had a large number of errors (which caused it to be excluded from the result). Looking at the PRs in the benchmarks repo I found one that showed an updated version of `ajv` and loaded the schemas differently, so I did the same.
At this point I was able to run the benchmarks successfully and the results are consistent with the original ones in terms of ordering albeit with a much smaller difference between the two.
Overall it appears that even though `ajv` remains the fastest, `@exodus/schemasafe` is only about 4% slower on schema draft 6 and about 10% slower on draft 7.
Another interesting detail to notice is that when it comes to errors thrown during the validation, `ajv` shows a higher number of errors. This however could also be attributed to the package being generally stricter in its validation. As a matter of fact, to even run the benchmark at all the `strict` and `validateFormats` options had to be disabled.
Below the full breakdown of the results.
[!note] Note that the tests above have been run on my laptop and not in Lambda, so the results might be different once run there.
Performance benchmark for Node.js JSON-schema validators.
It also tests against the official JSON-schema test suite, version draft7, and checks for validators that cause side effects on schema or data. The top 6 validators that fail the fewest tests are included in the benchmark.
Contribute to these benchmarks
Validator | Relative speed | Number of test runs per second |
---|---|---|
ajv | 100% | 36439 (± 0.49%) |
@exodus/schemasafe | 90.5% | 32989 (± 0.25%) |
is-my-json-valid | 61.1% | 22267 (± 0.64%) |
djv | 20% | 7286 (± 0.29%) |
@cfworker/json-schema | 2.9% | 1065 (± 1%) |
jsonschema | 1.1% | 389 (± 1.02%) |
1049 tests are run in each test run.
Validators tested: `@cfworker/json-schema (1.12.7)`, `jsonschema (1.4.1)`, `@exodus/schemasafe (1.3.0)`, `ajv (8.12.0)`, `djv (2.1.4)`, `is-my-json-valid (2.20.6)`, `jsen (0.6.6)`, `tv4 (1.3.0)`, `jassi (0.1.2)`, `jjv (1.0.2)`, `z-schema (6.0.1)`, `request-validator (0.3.3)`, `json-schema-validator-generator (1.1.11)`, `themis (1.1.6)`, `JSV (4.0.2)`, `json-model (0.2.24)`, `jsck (0.3.2)`, `skeemas (1.2.5)`, `schemasaurus (0.7.8)`, `json-gate (0.8.23)`, `revalidator (0.3.1)`
(validators not in the results above were excluded because of failing tests - see below for details)
`ajv` is currently the fastest JSON-schema validator out there.
This test suite uses the official JSON-schema test suite, version draft7.
If a validator does not pass a test in the official test suite, it will show up in these results.
Validator | Number of failing tests (click for details) |
---|---|
@cfworker/json-schema | 48 |
jsonschema | 50 |
@exodus/schemasafe | 105 |
ajv | 118 |
djv | 161 |
is-my-json-valid | 164 |
jsen | 202 |
tv4 | 212 |
jassi | 229 |
jjv | 231 |
z-schema | 249 |
request-validator | 264 |
json-schema-validator-generator | 273 |
themis | 275 |
JSV | 291 |
json-model | 291 |
jsck | 341 |
skeemas | 365 |
schemasaurus | 378 |
json-gate | 420 |
revalidator | 449 |
Some validators have deliberately chosen not to support parts of the spec. Go to the homepage of the validator to learn if that is the case for these tests.
Performance benchmark for Node.js JSON-schema validators.
It also tests against the official JSON-schema test suite, version draft6, and checks for validators that cause side effects on schema or data. The top 6 validators that fail the fewest tests are included in the benchmark.
Contribute to these benchmarks
Validator | Relative speed | Number of test runs per second |
---|---|---|
ajv | 100% | 37486 (± 0.61%) |
@exodus/schemasafe | 95.9% | 35938 (± 0.34%) |
is-my-json-valid | 62.1% | 23265 (± 0.21%) |
djv | 19.7% | 7392 (± 0.97%) |
@cfworker/json-schema | 3.1% | 1155 (± 1.28%) |
jsonschema | 1% | 390 (± 0.95%) |
884 tests are run in each test run.
Validators tested: `@cfworker/json-schema (1.12.7)`, `@exodus/schemasafe (1.3.0)`, `jsonschema (1.4.1)`, `ajv (8.12.0)`, `djv (2.1.4)`, `is-my-json-valid (2.20.6)`, `jsen (0.6.6)`, `tv4 (1.3.0)`, `z-schema (6.0.1)`, `jassi (0.1.2)`, `jjv (1.0.2)`, `themis (1.1.6)`, `request-validator (0.3.3)`, `json-schema-validator-generator (1.1.11)`, `json-model (0.2.24)`, `jsck (0.3.2)`, `JSV (4.0.2)`, `schemasaurus (0.7.8)`, `skeemas (1.2.5)`, `json-gate (0.8.23)`, `revalidator (0.3.1)`
(validators not in the results above were excluded because of failing tests - see below for details)
`ajv` is currently the fastest JSON-schema validator out there.
This test suite uses the official JSON-schema test suite, version draft6.
If a validator does not pass a test in the official test suite, it will show up in these results.
Validator | Number of failing tests (click for details) |
---|---|
@cfworker/json-schema | 8 |
@exodus/schemasafe | 12 |
jsonschema | 15 |
ajv | 69 |
djv | 103 |
is-my-json-valid | 111 |
jsen | 145 |
tv4 | 155 |
z-schema | 168 |
jassi | 172 |
jjv | 173 |
themis | 189 |
request-validator | 207 |
json-schema-validator-generator | 216 |
json-model | 225 |
jsck | 227 |
JSV | 234 |
schemasaurus | 237 |
skeemas | 239 |
json-gate | 342 |
revalidator | 384 |
Some validators have deliberately chosen not to support parts of the spec. Go to the homepage of the validator to learn if that is the case for these tests.
Another thing to notice, which I must admit gives me pause, is the fact that `@exodus/schemasafe` appears to have over 75 collaborators on npm (see the collaborators section on the package page). If I understand the field correctly, it means that all these people can publish a new version of the package, which can be problematic. I have opened an issue on their repo to clarify if this is the case.
Worth noting that ajv supports compiling the JSON Schema into ESM (with treeshaking), not represented in the above tests. The resulting bundle is smaller than all the others (from testing a few years ago) with this approach, but requires an extra build step. This is what Middy recommends, though can't comment on how many people choose this option.
https://middy.js.org/docs/middlewares/validator
An additional bonus of ajv is that it has translations of errors into multiple languages, making it more inclusive.
One aside from me. We're using ajv in one particular internal project. The schema compilation step caused some very non-trivial cold start delays for us because we were compiling all schemas at startup, even if we weren't going to use the validation function in the lifetime of the given Lambda invocation. We had better luck compiling and caching validators just in time.
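The just-in-time approach described above can be sketched as a small cache that compiles a schema the first time its validator is requested and reuses it afterwards. This is a dependency-free illustration: `LazyValidatorCache` is a hypothetical name, and `compile` is a parameter standing in for whatever the chosen engine provides for turning a schema into a validation function.

```typescript
type ValidateFn = (data: unknown) => boolean;

// Stand-in for the engine's "schema in, validation function out" step.
type Compile<S> = (schema: S) => ValidateFn;

class LazyValidatorCache<S> {
  private readonly cache = new Map<string, ValidateFn>();
  private readonly schemas: Record<string, S>;
  private readonly compile: Compile<S>;

  constructor(schemas: Record<string, S>, compile: Compile<S>) {
    this.schemas = schemas;
    this.compile = compile;
  }

  // Compiles on first use only; later calls reuse the cached validator.
  get(name: string): ValidateFn {
    let validator = this.cache.get(name);
    if (!validator) {
      const schema = this.schemas[name];
      if (schema === undefined) {
        throw new Error(`Unknown schema: ${name}`);
      }
      validator = this.compile(schema);
      this.cache.set(name, validator);
    }
    return validator;
  }
}

// Toy "schemas" (required property names) and a compile step that counts its calls.
let compileCalls = 0;
const compileRequired: Compile<string[]> = (required) => {
  compileCalls += 1;
  return (data) =>
    typeof data === "object" && data !== null && required.every((key) => key in data);
};

const validators = new LazyValidatorCache(
  { greeting: ["name"], order: ["id", "total"] },
  compileRequired,
);

// Nothing has been compiled yet; only `greeting` is compiled, and only once.
const firstResult = validators.get("greeting")({ name: "Ada" });
const secondResult = validators.get("greeting")({});
```

Creating the cache at module scope keeps compiled validators alive across warm invocations, while schemas that are never requested add nothing to the cold start.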
@willfarrell thanks for the info, any chance you could link to an example or docs that show how to do this? I've tried to search their repo but the only relevant pieces I could find are this setting and this discussion which seem to point to the fact that this setting influences only the code generation/compilation.
I'm not yet 100% familiar with this project and space, so I might be completely off base, but this means that using this setting you can influence only the code generated by ajv. This kinda makes sense in the frame of my mental model as I've never seen dynamic runtime switching of imports. Whether a module is interpreted as CJS or ESM is decided (roughly) based on the way it's exported both in its source and in its `package.json`. With this in mind I don't understand what's missing.
But regardless of the above, I have tried to run the build test again, setting `const ajv = new Ajv({ allErrors: false, code: { esm: true } });`, and I got the exact same bundle out.
I also ran the unminified outputs of the bundles with `esm` set to `true` & `false` through a diff checker, and the only difference was the constructor; the rest of the code was exactly the same. Again, not sure what this setting really does - a link or example would be super helpful.
@misterjoshua That's good info. I'd like to run some tests on Lambda next, I'm just not sure what would be a representative sample to show performance short of reworking the entire benchmark suite to call Lambda functions rather than run the tests locally.
You're in the right area. I created a package to allow other extensions to work with ESM, called `ajv-cmd` (referenced in the middy docs). The resulting js can be as little as <1kb depending on the JSON schema itself.
@dreamorosi In my understanding, `ajv` does two steps: `compilation` and `validation`. The performance comes from using/reusing compiled schemas (cached as hashmaps).
So you have two options: compile the schema every time at runtime, or precompile it as standalone code. This is what the `code: { esm: true }` option is about. I don't think the library itself supports ESM.
For standalone to have ESM/CJS you can generate a function from the precompiled schema + validation function – https://ajv.js.org/standalone.html. You create it as a bundle with a CLI (or a wrapper such as the `ajv-cmd` mentioned above) or with the library itself and then import it into your code.
Pros: You have this tiny bundle and ESM support (if needed). Cons: extra build step + you need your schema beforehand. As a library, we don't know the schema, like an end user. So we need to teach them how to do that.
As a solution, the docs can describe how to create a schema bundle, or include a guide on how to add the extra step in `esbuild`. We would also adjust the API to take this function as an argument. But it's more like performance fine-tuning, which is good for Lambda.
But some users may not want to do the extra build step, so we need to have `ajv` as a dependency anyway. As in @misterjoshua's use case, I assume they do both `compile` and `validate` inside their lambdas.
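To illustrate what "take this function as an argument" could look like, here is a sketch of a helper that accepts an already compiled validation function. The `PrecompiledValidator` shape (a callable returning a boolean, with an `errors` property populated after a failed call) mirrors how ajv's compiled validators behave. `assertValid` is a hypothetical name, and `validateGreeting` is hand-written here as a stand-in for a function generated at build time.

```typescript
// Callable validator with an `errors` property, like the functions ajv produces.
interface PrecompiledValidator {
  (data: unknown): boolean;
  errors?: unknown[] | null;
}

// Hypothetical helper: the library only runs the function and never sees the schema.
function assertValid<T>(data: unknown, validator: PrecompiledValidator): T {
  if (!validator(data)) {
    throw new Error(`Validation failed: ${JSON.stringify(validator.errors ?? [])}`);
  }
  return data as T;
}

// Stand-in for a validator that would normally be generated in a build step and imported.
const validateGreeting: PrecompiledValidator = (data) => {
  const ok =
    typeof data === "object" &&
    data !== null &&
    typeof (data as { name?: unknown }).name === "string";
  validateGreeting.errors = ok ? null : [{ message: "name must be a string" }];
  return ok;
};

interface GreetingRequest {
  name: string;
}

const request = assertValid<GreetingRequest>({ name: "Ada" }, validateGreeting);
```

With this shape, users who precompile get the small bundle, while users who skip the build step could pass a function compiled at runtime to the same entry point.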
Description of the proposal
There should be a consistent way to validate the structures of inputs and outputs for AWS lambdas.
Name of the core utility (and consequently the folders that contain the libraries):
packages/validation
Justification
The "Garbage in, garbage out" principle suggests that we should validate our inputs to prevent "garbage in," and for Lambda, those inputs include events passed to the lambda handler. A short, informal survey of GitHub projects using the CDK's NodejsFunction found that 90% of the studied projects hand-code their validation, indicating little consensus on any singular tool to validate inputs.
Additionally, the Java and Python toolkits saw fit to add validation of not only lambda input but lambda responses.
Validation support should be added to work toward feature parity between the AWS Lambda Powertools libraries and to offer developers a good alternative to hand-coding their validation.
Goals
Proposed API
Installation
Yarn
Npm
Usage
Lambda Handler
Ad-hoc validation
Survey Results
Obtained by searching GitHub code for NodejsFunction in TypeScript projects. The repositories are from the first twenty results, sorted by "best match." These cases are good enough evidence for me, but perhaps the community can suggest a better way to get this information.