MattBroach / DjangoRestMultipleModels

View (and mixin) for serializing multiple models or querysets in Django Rest Framework
MIT License
549 stars 67 forks source link

Paged results, but querying all items anyways? #9

Closed LookLikeAPro closed 8 years ago

LookLikeAPro commented 8 years ago

Longer lists of results take longer, despite pages being the same size (20 items).

MattBroach commented 8 years ago

It looks like your correct -- thank you for catching the problem. Unfortunately, as it stands now, the mixin/view can't rely on lazy queryset evaluate to do pagination, since you have to iterate through the entire set of data in order to construct the flat list . Otherwise, for example, you'd have no ability to sort across the different querysets in the queryList when the sort field is enabled.

I'll think about whether there's a refactor that can rely on lazy queryset evaulation for pagination, but for now I'd say that pagination is more a UX convenience than it is a processing saver.

LookLikeAPro commented 8 years ago

@MattBroach Thanks for looking into this. I don't know if it is helpful but I ended up using this approach.

MattBroach commented 8 years ago

I know this is from awhile back, but I've spent some time thinking about this, and I'm just not sure you can merge and order separate querysets, even with the isInstance trick shown in that approach, without evaluating the two querysets, which will hit the database for all objects. For now, I've put a note in the documentation that pagination with the MultipleModelAPIView is really more for convenience purposes than it is for database efficiency. I'll keep it in the back of my head, though, and if I think of a way of doing it I'll let you know.

JamshedVesuna commented 7 years ago

This should be possible with something like CursorPagination. You could add an additional opaque cursor parameter that encodes which queryset the client is on. This allows you to only fetch the objects you need from the database.

See DRF's CursorPagination.

MattBroach commented 7 years ago

@JamshedVesuna, I will definitely look into it and appreciate the suggestion. The problem has to do with ordering and how to evaluate when comes next -- it would be fairly trivial to pagination within a given queryset, but imagine someone wanted to create a list of Poem objects and Play objects, as in the docs. If there mixed together by say, title, how do we know whether the first 20 titles are all Plays, all Poems, or some admixture of the two, without evaluating the entire queryset? Even if we could answer that for the first twenty, we wouldn't necessarily know how to offset the next set of Poems and Play objects appropriately.

My understanding is that Cursor Pagination also works best when the field is indexed -- which is impossible across tables (which would be the case with different models). None of this is to say it can't be done: I'll definitely look into it and appreciate the suggestion.

JamshedVesuna commented 7 years ago

You're correct - interleaving objects from different models is more complicated. You can either evaluate the entire queryset (like you mentioned) or order each queryset and pick off the top of each queryset (kind of like mergesort) to create a combined response (this is fairly expensive, but does eliminate the need to evaluate the entire queryset).

Of course, this approach assumes that the ordering field(s) used by each model is comparable between models, which is not always the case. For example, if one model orders by a created_at timestamp, and another orders by a title charfield, this would be impossible. In this case, I think the only solution is to return all objects from one model, then the next, etc. like you mentioned above.

I'd imagine that for most users, this approach is sufficient if the caller just wants the complete response and doesn't really care about inter-page ordering. This also only evaluates a page's worth of objects rather than the entire queryset, just like DRF.

MattBroach commented 6 years ago

Hello everyone in this thread: temporarily resurrecting this closed issue, because pagination has now been properly implemented -- the pagination filtering now happens at the query level rather than after the querysets have been evaluated, so large datasets are handled correctly. There are a few caveats to be aware of:

  1. The only pagination that has been implemented so far is LIMIT/OFFSET pagination; others may follow in the future, but this is the only pagination style currently implemented.
  2. The new 2.0 version of DjangoRestMultipleModels went through a substantial change in code and interface: the original view class was split into two different classes (depending on output format), queryList has become querylist (removed the awkward camelCase), and querylist items are now dicts, rather than tuples, and more. I'd recommend reading the 1.x to 2.0 guide in the docs for a full description of what changed and how to update your code.

Let me know if any of you get a chance to play with the new library interface and pagination; would love feedback on the new design.