geddy / model

Datastore-agnostic ORM in JavaScript
265 stars 55 forks source link

nested-eager regression on id array + limit #212

Closed ben-ng closed 10 years ago

ben-ng commented 10 years ago

@flyswatter found a regression where if you did a query with an array of ids with a limit clause it would return duplicate models.

mde commented 10 years ago

Are you sure it's a regression? I think the current implementation has the same problem.

ben-ng commented 10 years ago

pretty sure it's a real regression because @flyswatter's application tests pass on the current implementation.

mde commented 10 years ago

@flyswatter, could we get a breaking test for this? I wouldn't expect limit to work with our current implementation in the first place.

danfinlay commented 10 years ago

I'll try to do this soon, but I'm in crunch-time-going-live mode, so I don't have much mind for anything at the moment.

mde commented 10 years ago

Okay, I have what I think is a breaking test in the branch: "test includes eager-fetch of hasMany with array of ids." It works in master -- basically when you do a limit/offset with 'includes', it does an initial query without the includes to determine the affected item ids, the re-runs with a simple query against that list with the includes. The implementation is pretty messy, but I wanted to verify that it works in master and not with the nested include.

ben-ng commented 10 years ago

@mde it's so awesome when you have time to hack on this stuff

mde commented 10 years ago

Hopefully there will be more time for this stuff overall going forward. :D

ben-ng commented 10 years ago

I think one problem with the solution used in master is that it only works with query conditions on the root table. Are we still planning to query and/or sort on nested attributes? If so, we might need a more robust subquery solution.

mde commented 10 years ago

Oh, hello, I implemented the pre-query wrong. Let me take another shot at this.

mde commented 10 years ago

Okay, I believe this should be correct https://github.com/geddy/model/commit/07abf7e21c00c8a69d926cfd5f96682f645f85ae DISTINCT ids + includes + filters + limit/offset should get the list of ids we want, then re-run the query with no filters and simple list of ids.

danfinlay commented 10 years ago

I've pulled the latest master and I've got a query/problem that I've described in this gist, will try to write a failing test now.

mde commented 10 years ago

Fix for this in master, d4422f3d71c07f90d03590dd7ad9514533fc72b0 -- can you verify it works?

ben-ng commented 10 years ago

@flyswatter post-mortem report: you got unlucky hitting that bug (and we were lucky that you did).

usually result rows are returned somewhat in order because of how databases are implemented, but as it turns out, unless you specify a top level sort, row order is up in the air. so you had consecutive row ids that for whatever reason weren't adjacent to each other on the disk, and that led to us sending duplicate events for the same top level model, since we were relying on that first column being in order.

Resolved!

mde commented 10 years ago

BOOM