architect / functions

AWS Lambda Node runtime helpers for Architect apps
https://arc.codes
163 stars 38 forks source link

perf(http) Improve IsBinary checking performance #539

Closed gyx1000 closed 1 year ago

gyx1000 commented 1 year ago

Hello,

While browsing the remix project, I noticed that the control of the "Content-Type" header which allows to know if the body is binary could be improved. It is also possible to apply this change in this project.

Indeed, the check on the value of the header is done using an iteration on the array and a test for the presence of string with the Array.includes method. This implies a complexity of O(TypesCount*IncludesComplexity). It would be more efficient to use Set.has because the complexity would be O(1).

Related remix PR : https://github.com/remix-run/remix/pull/4761

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

ryanblock commented 1 year ago
// ./bench.js
let binaryTypes = require('./src/http/helpers/binary-types.js')
let header = 'hi'

console.time('some')
for (let i = 0; i < 1000000; i++) {
  let isBinary = binaryTypes.some(t => header.includes(t))
}
console.timeEnd('some')

console.time('has')
for (let i = 0; i < 1000000; i++) {
  let binaryTypesSet = new Set(binaryTypes)
  let isBinary = binaryTypesSet.has(header)
}
console.timeEnd('has')

// some: 288.125ms
// has: 1.645s
gyx1000 commented 1 year ago

@ryanblock You generate the set on each iteration. Could you try with the set defined outside the for loop ?

gyx1000 commented 1 year ago

@ryanblock With the Set defined outside I have this benchmark

some: 150.847ms has: 6.097ms

ryanblock commented 1 year ago

AWS Lambda provides a stateless execution model; requests may sometimes get an invocation from a warm Lambda, but when architecting Lambda-based apps a fresh state should generally be assumed upon each invocation. As such, the above code example is the truest representation of the Lambda invocation model.

That said, it is also true that the penalty here is in the conversion of the Array to the Set. This MIME type list is only used in one place, so we can avoid the penalty of that conversion by simply wrapping the src/http/helpers/binary-types-set.js export in a new Set([...; on my machine this requires at nearly the same speed (array: 0.662ms, set: 0.695ms). That seems to check all the efficiency boxes as far as I can tell!

ryanblock commented 1 year ago

Quick update: try the Array with binaryTypes.includes(header) instead of binaryTypesSet.has(header); this is, for me, even faster:

console.time('some')
for (let i = 0; i < 1000000; i++) {
  let isBinary = binaryTypes.some(t => header.includes(t))
}
console.timeEnd('some')

console.time('includes')
for (let i = 0; i < 1000000; i++) {
  let isBinary = binaryTypes.includes(header)
}
console.timeEnd('includes')

console.time('has')
for (let i = 0; i < 1000000; i++) {
  let isBinary = binaryTypesSet.has(header)
}
console.timeEnd('has')

// some: 282.091ms
// includes: 6.391ms
// has: 8.401ms
gyx1000 commented 1 year ago

Great, include is faster for me too. The difference with some is certainly the memory allocation for the arrow function.

I agree with the fact that the change will not bring much with cold lambda function.

ryanblock commented 1 year ago

Change published in v5.3.3