Closed IhostVlad closed 5 years ago
@IhostVlad In your lambda functions, are you calling DynamoDB multiple times during a single lambda invocation? If so, you can turn on connection keep-alive so that connections are reused. You can enable keep-alive by passing a custom agent to a service client:
Example:
const DynamoDB = require('aws-sdk/clients/dynamodb');
const https = require('https');
const agent = new https.Agent({
  keepAlive: true
});
const dynamodb = new DynamoDB({
  httpOptions: {
    agent
  }
});
There were some issues reported using keep-alive in Lambda with older versions of Node.js that looked like a race condition when a connection was reused across multiple lambda invocations. I haven't looked into whether this still happens on 6.x, but you can also call agent.destroy()
before your lambda function ends to make sure there are no connections left open to be reused in another invocation.
@chrisradek Thanks for the response! Yes, DynamoDB is invoked multiple times in a single lambda invocation. In fact, there are a lot of sequential update queries to DynamoDB in that process.
For detailed research into aws-sdk and the supposed solution, the LocalStack DynamoDB emulator was used on a local machine. The following code is used to create the dynamodb and docClient objects:
const http = require('http')
const AWS = require('aws-sdk')
const agent = new http.Agent({ keepAlive: true })
const dynamodb = new AWS.DynamoDB({ httpOptions: { agent } })
const docClient = new AWS.DynamoDB.DocumentClient({ httpOptions: { agent } })
Afterwards, for research purposes, Wireshark was launched to trace which HTTP requests are actually performed. Of course, LocalStack is not a complete local replacement for DynamoDB, but the redundant HTTP queries are initiated by the aws-sdk library, and LocalStack cannot affect that since it only accepts requests.
The sample performs about 80,000 update queries, with a document difference of about 100 bytes per update operation. The measured result was terrible: even with the custom keep-alive agent, aws-sdk keeps creating tons of new HTTP connections.
The Wireshark measurements show that about 800 MB of traffic was exchanged between the LocalStack DynamoDB and the application server. Although the stored data volume is about 80k entries × ~100 bytes = 8 MB, the traffic was a HUNDRED times greater.
Maybe something is configured wrong, but the current aws-sdk behavior is unsatisfying.
PCAP: https://expirebox.com/download/12768d2cc051c0974d095f90ef46eb19.html
@IhostVlad
Is the docClient being re-used? Another issue was opened recently where that was not the case and they had similar performance issues: https://github.com/aws/aws-sdk-js/issues/2276
@srchase Thanks for the response! Yes, docClient is being reused. Here is simplified code that shows the issue: a table for stories is created and one million dummy stories are exported into DynamoDB. To avoid latency from content generation, all documents are extremely simple.
import AWS from 'aws-sdk'
import http from 'http'
const configDynamo = {
  /* AWS config */
  rcu: 10,
  wcu: 10
}
AWS.config.update(configDynamo)
const agent = new http.Agent({ keepAlive: true })
const dynamodb = new AWS.DynamoDB({ httpOptions: { agent } })
const docClient = new AWS.DynamoDB.DocumentClient({ httpOptions: { agent } })
const checkTableExists = async (dynamodb, tableName) => {
  try {
    const tableInfo = await dynamodb
      .describeTable({ TableName: tableName })
      .promise()
    const tableStatus = tableInfo.Table.TableStatus
    if (tableStatus === 'ACTIVE') return true
    // Table exists but is still CREATING/UPDATING - poll again
    return await checkTableExists(dynamodb, tableName)
  } catch (err) {
    // describeTable throws ResourceNotFoundException when the table is absent
  }
  return false
}
const InitTable = async () => {
  if (await checkTableExists(dynamodb, 'stories')) {
    await dynamodb.deleteTable({ TableName: 'stories' }).promise()
    while (await checkTableExists(dynamodb, 'stories')) {}
  }
  await dynamodb
    .createTable({
      TableName: 'stories',
      KeySchema: [{ AttributeName: 'id', KeyType: 'HASH' }],
      // AttributeDefinitions may only list key attributes (table key + GSI keys);
      // defining the non-key attribute 'content' here would be rejected
      AttributeDefinitions: [
        { AttributeName: 'id', AttributeType: 'S' },
        { AttributeName: 'ratingSurrogate', AttributeType: 'S' },
        { AttributeName: 'rating', AttributeType: 'N' }
      ],
      ProvisionedThroughput: {
        ReadCapacityUnits: configDynamo.rcu,
        WriteCapacityUnits: configDynamo.wcu
      },
      GlobalSecondaryIndexes: [
        {
          IndexName: 'rating_idx',
          KeySchema: [
            { AttributeName: 'ratingSurrogate', KeyType: 'HASH' },
            { AttributeName: 'rating', KeyType: 'RANGE' }
          ],
          Projection: {
            ProjectionType: 'INCLUDE',
            // key attributes are projected automatically; only list non-key ones
            NonKeyAttributes: ['content']
          },
          ProvisionedThroughput: {
            ReadCapacityUnits: configDynamo.rcu,
            WriteCapacityUnits: configDynamo.wcu
          }
        }
      ]
    })
    .promise()
  while (!(await checkTableExists(dynamodb, 'stories'))) {}
}
const CreateStory = async ({ id, rating, content }) =>
  await docClient
    .put({
      TableName: 'stories',
      Item: {
        id,
        ratingSurrogate: 'SURROGATE',
        rating,
        content
      }
    })
    .promise()
;(async () => {
  await InitTable()
  for (let idx = 0; idx < 10000000; idx++) {
    await CreateStory({
      id: `id${idx}`,
      rating: idx % 100,
      content: 'Test content'
    })
  }
})()
@IhostVlad
Did you compare against batchWriteItem?
@srchase Thanks for the response! Of course, using the batch write function instead of multiple single writes will be several times faster, but unfortunately the code above is an extremely simplified example. In the real application there are many different CRUD operations and multiple inserts/updates whose content depends on previous find calls and the external environment. With a SQL-based RDBMS there is no problem, since one TCP connection is established and the series of CRUD operations is sent to the SQL server over it. But with DynamoDB there is no stable connection: the HTTPS connection is torn down after every SINGLE database operation, even though keep-alive mode is supposedly enabled. So the question is how to keep the connection to the endpoint alive in order to send multiple series of operations, not a predefined batch with one bulk list.
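For the independent subset of those updates there is a middle ground between one-at-a-time and a fixed batch: run a bounded number of them concurrently over the shared keep-alive agent, so connection setup is amortized. A generic sketch (not an aws-sdk API; `updateOne` is a hypothetical stand-in for `docClient.update(...).promise()`):

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  const lane = async () => {
    while (next < items.length) {
      const i = next; // claim an index synchronously, then await
      next += 1;
      results[i] = await worker(items[i], i);
    }
  };
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Hypothetical stand-in for a DynamoDB update call.
const updateOne = async (story) => ({ ...story, updated: true });
```

e.g. `await mapWithConcurrency(stories, 10, updateOne)` keeps at most ten HTTPS requests in flight on the pooled sockets instead of paying a handshake per operation.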
@IhostVlad
Does 8.10 behave the same way? I see that this was originally opened before 8.10 was introduced. I'm curious if the newer runtime makes a difference for you.
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
Given an AWS Lambda on Node.js 6 which performs a very simple test-purpose CRUD interaction with a DynamoDB table: measured performance is extremely slow, regardless of the selected Lambda RAM or the RCU/WCU units provisioned for DynamoDB.
A benchmark was performed, and the results are unsatisfactory. Even a MySQL database in a micro container performs several times better than DynamoDB.
After some quick research the reason for this behavior was found: the SDK performs a new HTTP(S) request for each CRUD operation! (https://github.com/aws/aws-sdk-js/blob/master/lib/http/node.js#L25) That is extremely slow, especially over TLS, which has a relatively long establishment time including key exchanges. There is also a large overhead for HTTP headers, which are sometimes bigger than the CRUD payload itself.
So the question: is there a method available to communicate with DynamoDB over a persistent connection in a Node.js Lambda? Batch operations are not an appropriate solution, since they do not support UPDATE operations.
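On that last point: later aws-sdk releases added `transactWrite` (TransactWriteItems), which, unlike BatchWriteItem, does accept Update actions. A sketch of the request shape, reusing the table and attribute names from the example above (the call itself is left commented so the snippet stays self-contained):

```javascript
// TransactWriteItems groups multiple Put/Update/Delete/ConditionCheck
// actions (originally capped at 25) into a single request, so updates
// can be combined - something BatchWriteItem cannot do.
const params = {
  TransactItems: [
    {
      Update: {
        TableName: 'stories',
        Key: { id: 'id1' },
        UpdateExpression: 'SET rating = :r',
        ExpressionAttributeValues: { ':r': 42 }
      }
    }
  ]
};
// await docClient.transactWrite(params).promise();
```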