cgwire / zou

Zou is the Kitsu API. It allows you to store and manage your production data
https://zou.cg-wire.com
GNU Affero General Public License v3.0
166 stars 103 forks source link

Non-Deterministic Ordering on Pagination #813

Closed concatenyuga closed 3 months ago

concatenyuga commented 3 months ago

Context

Studio name: Yuga Labs Zou version: 0.19.32 Zou installation type: hosted by CGWire,

Describe the bug When making requests to Zou CRUD endpoints and the data has identical updated_at entries some items may be duplicated or skipped across pages.

i.e.

The data response from /data/entities?page=1&limit=100&project_id={project_id} will contain 100 unique entries within the response. The data response from /data/entities?page=2&limit=100&project_id={project_id} will also contain 100 unique entries within the response.

However, between the two responses 3%-5% of the items will be duplicated, so instead of 200 unique items across both queries there is really only ~196 unique items across the queries.

Additionally the total number of items returns accurately matches the total number of expected items, however since there are duplicates some items may never be returned if using pagination.

Expected behavior The expected behavior is that iterating through relevant pages will return each item for the total items.

Additional context I don't know if we introduced a large number of items that have the same updated_at date or that was happened in a migration or something, but a high percentage of items share the same exact updated_at time which makes me think this might have to do with order by not being deterministic if the values are identical.

I don't think I will be able to allocate the resources to test this completely, but I wonder if adding the model id as an additional sort column would provide deterministic in cases where the updated_at column is identical.

current:

https://github.com/cgwire/zou/blob/cc1250b7e49832e0b328948a45f2a1334db831b6/zou/app/blueprints/crud/base.py#L38

possible fix:

query = query.order_by(self.model.updated_at.desc(), self.model.id)
EvanBldy commented 3 months ago

Hi @concatenyuga, Thanks for the suggestion ! I think it's a good idea, I will try to implement that today :) It will be released tomorrow. Have a nice day

EvanBldy commented 3 months ago

I have made a PR for that : https://github.com/cgwire/zou/pull/812

EvanBldy commented 3 months ago

I close this issue, it will be released tomorrow (and also deployed on your instance), if you have any problems you can reopen it. Thanks again for the suggestion :)