Netflix / mantis

A platform that makes it easy for developers to build realtime, cost-effective, operations-focused applications
Apache License 2.0
1.42k stars 202 forks source link

Add Pagination to DynamoDB Query Calls #697

Closed kmg-stripe closed 3 months ago

kmg-stripe commented 3 months ago

Also setting explicit pagination limit.

This should fix a pretty big problem:

We have hit a case where the master persistently thinks most jobs are inactive, so every time a new leader is elected most jobs are resubmitted. Over time, new jobs have piled up (100s per cluster), which has made the problem worse, since Dynamo is only returning ~30 partition keys (jobIds from MantisWorkers).

Context

Explain context and other details for this pull request.

Checklist

github-actions[bot] commented 3 months ago

Test Results

537 tests   531 :white_check_mark:  7m 48s :stopwatch: 139 suites    6 :zzz: 139 files      0 :x:

Results for commit c491e86f.

:recycle: This comment has been updated with latest results.

crioux-stripe commented 3 months ago

@Andyz26 We're hoping to get this one and https://github.com/Netflix/mantis/pull/696 out today in order to meet a prod readiness deadline of tomorrow. We're close!

kmg-stripe commented 3 months ago

@crioux-stripe @Andyz26 is out-of-office. I have reviewed the PR and the change makes sense to me. Let me know if you need more help.

@hmitnflx thanks!

One minor comment, @kmg-stripe I found it useful to track total time spent getting all records in paginated results. Also, if there's a way to specify page-size to dynamodb, it'd be great.

The change looks good otherwise. 🍀 for prod readiness

ack! i'll add that in a follow-on PR, since we want to verify this fixes the issue we are seeing.