Closed nbulaj closed 4 months ago

Hey :wave:

I have 14k records in Dynamo. Loading them in batches of 1000 records takes 14 requests, which take a really long time. I've tried increasing the batch size to 5000, but I don't see any effect. Am I doing it incorrectly?

Produces:

Even batching with `.record_limit(5000).batch(5000)` performs more requests than I expect :thinking:
Yeah, it seems suspicious.
The only explanation that comes to mind is that a page of 5000 items exceeds the 1 MB limit, so the 5000-item limit just isn't applied:
> `Limit` - The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in `LastEvaluatedKey` to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed dataset size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in `LastEvaluatedKey` to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
Could you evaluate the size of each fetched item?
To approximate an item's size, the rules described at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/CapacityUnitCalculations.html should be used. I assume we can just serialize item attributes into a JSON document for simplicity:
```ruby
size = 0

# Walk every item in the group and sum the sizes of the serialized attributes
Model.where(group_id: group.id).batch(5000).each do |model|
  size += JSON.dump(model.attributes).size
end
```
Another way is to rely on `ConsumedCapacity` in the response to get how many units were used, so we can see whether the 1 MB limit is reached (for every page/`Query` request).
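Something like this minimal sketch with the plain aws-sdk-dynamodb client could show it (the table name and key expression below are assumptions, not your actual schema):

```ruby
require 'aws-sdk-dynamodb'

client   = Aws::DynamoDB::Client.new
last_key = nil
page     = 0

loop do
  params = {
    table_name: 'models',                              # hypothetical table name
    key_condition_expression: 'group_id = :g',         # hypothetical key schema
    expression_attribute_values: { ':g' => group.id }, # `group` as in the snippet above
    limit: 5000,
    return_consumed_capacity: 'TOTAL'
  }
  params[:exclusive_start_key] = last_key if last_key

  resp  = client.query(params)
  page += 1

  # With eventually consistent reads DynamoDB charges ~0.5 RCU per 4 KB,
  # so a full 1 MB page should report roughly 128 capacity units.
  puts "page #{page}: #{resp.count} items, #{resp.consumed_capacity.capacity_units} units"

  last_key = resp.last_evaluated_key
  break if last_key.nil?
end
```

If every page reports ~128 units and fewer than 5000 items, it's the 1 MB cap that ends each page, not the `Limit`.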
Thanks @andrykonchin. Yeah, the overall size is 18969308 bytes (~19 MB). Let me look into that a bit more... 1000 items take 1-1.5 MB.
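A rough back-of-the-envelope check with those numbers (assuming DynamoDB's 1 MB page cap) seems to confirm that:

```ruby
total_bytes = 18_969_308               # overall size from above
item_count  = 14_000                   # records in the table
avg_item    = total_bytes / item_count # ≈ 1354 bytes per item

# Only ~774 items fit under the 1 MB page cap, far below a Limit of 5000,
# so the cap, not the batch size, decides how many items a page returns.
items_per_page = 1_048_576 / avg_item                    # ≈ 774
pages          = (item_count.to_f / items_per_page).ceil # ≈ 19 Query requests
```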
BTW, is it possible to perform a `BatchGetItem` somehow using Dynamoid? :crossed_fingers: I see it has a class with the same name, but I'm not sure whether it can be used at the model level. Also, I see it requires IDs :thinking:
Yeah, `BatchGetItem` requires primary ids to be specified (documentation). It's used in the `.find` method when several ids are passed. There is a limit of 100 items per call, so it's just a variation of `GetItem`.
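For example (the ids below are made up):

```ruby
# Passing an array of ids makes .find fetch the records
# via BatchGetItem instead of one GetItem call per record.
models = Model.find(['id-1', 'id-2', 'id-3'])

# Given the 100-item cap per call, one way to fetch a larger set
# is to slice the ids manually (if that isn't already done for you):
ids    = (1..250).map { |i| "id-#{i}" }
models = ids.each_slice(100).flat_map { |chunk| Model.find(chunk) }
```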
OK, in the end I really think we're just going beyond the limits, so it has nothing to do with Dynamoid. Thanks Andrii :bow: :ukraine: