awslabs / fhir-works-on-aws-persistence-ddb

A DynamoDB implementation of the FHIR Works on AWS framework, enabling users to complete CRUD operations on FHIR resources
Apache License 2.0

Back the Dynamo DB to Elasticsearch data synchronization with a message queue #119

Closed. justinusmenzel closed this issue 3 years ago.

justinusmenzel commented 3 years ago

Problem statement

During large batch updates, Elasticsearch sometimes can't keep up when many updates run in parallel through the existing mechanism, where DynamoDB write events are received by a Lambda that writes to Elasticsearch in parallel.

Proposed change

The Lambda triggered by DynamoDB write events should just write each event into an SQS message queue instead of trying to sync the data directly to Elasticsearch. (Maybe DynamoDB can be configured so the write events are fed into SQS without the need for a Lambda.) Another Lambda then picks the messages up from SQS and sends the updates to Elasticsearch.

The main point of using a message queue is to decouple the DynamoDB write events from the writes to Elasticsearch, which tend to be slower. To accommodate the slower processing speed of Elasticsearch, the consuming Lambda needs to throttle its message consumption rate, and the queue needs to be able to retain all unprocessed messages; that number could grow quite large. Eventually, though, all messages should be processed and every write event sent to Elasticsearch.
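A minimal sketch of what the producer side could look like, assuming the stream-triggered Lambda simply forwards each record to SQS. The `SYNC_QUEUE_URL` environment variable and the batching details are illustrative assumptions, not the project's actual handler:

```ts
import { DynamoDBStreamEvent } from 'aws-lambda';
import { SQSClient, SendMessageBatchCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({});
const QUEUE_URL = process.env.SYNC_QUEUE_URL!; // hypothetical env var

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
    // SendMessageBatch accepts at most 10 entries per call.
    for (let i = 0; i < event.Records.length; i += 10) {
        const batch = event.Records.slice(i, i + 10);
        await sqs.send(
            new SendMessageBatchCommand({
                QueueUrl: QUEUE_URL,
                Entries: batch.map((record, index) => ({
                    Id: `${i + index}`,
                    // Forward the whole stream record; a separate consumer
                    // Lambda replays it against Elasticsearch at its own pace.
                    MessageBody: JSON.stringify(record),
                })),
            }),
        );
    }
};
```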

nguyen102 commented 3 years ago

Thanks for the feature request. I've added this request to our backlog. We'll update this ticket with more details once we've had a chance to prioritize it against the other items in our backlog.

rsmayda commented 3 years ago

The queueing functionality already occurs within the stream itself, per DynamoDB shard. The parallelization issue you are facing occurs across different DynamoDB shards. By default there is one Lambda invocation per shard, meaning that if you have 50 shards, at most 50 Lambdas could be trying to write to your OpenSearch cluster at a given time.
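For illustration, a hedged CDK sketch of the fan-out model described above (not this repo's actual deployment code; the table, function name, and asset path are assumptions):

```ts
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { DynamoEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'StreamSyncStack');

// Resource table with a stream enabled, standing in for the FHIR resource table.
const resourceTable = new dynamodb.Table(stack, 'ResourceTable', {
    partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
    stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
});

// Stand-in for the DDB-to-Elasticsearch sync Lambda.
const ddbToEsSync = new lambda.Function(stack, 'DdbToEsSync', {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: lambda.Code.fromAsset('lambda/ddb-to-es'), // hypothetical path
});

ddbToEsSync.addEventSource(
    new DynamoEventSource(resourceTable, {
        startingPosition: lambda.StartingPosition.LATEST,
        batchSize: 100,
        // 1 (the default) means one concurrent batch per shard; with 50
        // shards, up to 50 invocations can hit the search cluster at once.
        parallelizationFactor: 1,
        retryAttempts: 3,
    }),
);
```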

Some potential approaches:

Putting an SQS queue between DDB and OpenSearch would run into the same problem: the Lambda would quickly spin up to consume the messages on the queue, so you would potentially need to combine it with one of the suggestions above.
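One way to throttle the consumer side, sketched with CDK (again illustrative assumptions, not this repo's code): cap the SQS-triggered Lambda's concurrency so it drains the queue at a rate the search cluster can absorb, while the queue retains the backlog.

```ts
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'SyncQueueStack');

// Queue retains unprocessed stream records while Elasticsearch catches up.
const syncQueue = new sqs.Queue(stack, 'SyncQueue', {
    retentionPeriod: cdk.Duration.days(14), // maximum retention, since the backlog could grow large
    visibilityTimeout: cdk.Duration.minutes(15), // comfortably above the consumer timeout
});

// Stand-in for the Lambda that replays queued events against Elasticsearch.
const esSyncConsumer = new lambda.Function(stack, 'EsSyncConsumer', {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: lambda.Code.fromAsset('lambda/es-sync-consumer'), // hypothetical path
    timeout: cdk.Duration.minutes(2),
    // Reserved concurrency bounds how many copies of this function can run at once.
    reservedConcurrentExecutions: 5,
});

esSyncConsumer.addEventSource(
    new SqsEventSource(syncQueue, {
        batchSize: 10,
        // Caps concurrent invocations driven by this queue (allowed range 2-1000).
        maxConcurrency: 5,
    }),
);
```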

maghirardelli commented 3 years ago

Closing due to no response.