Open ahivert opened 6 years ago
I'd appreciate this as a PR! Even though the order_by is a performance hit by default, with a database index on the primary key it shouldn't be too bad, and I'd rather have a slow way of indexing all of my documents than no way (because otherwise I've been running out of memory).
@ahivert Thank you for the report and the fix. I understand the problem. I can't test the performance issue myself on really big tables, but if you have tested it, you are very welcome to make a PR :)
PR submitted! :) Happy to discuss it.
Note that "yield from" is py3.
I have implemented something similar for django-haystack, let me see if that can be dropped in here.
Problem
When we need to put a lot of documents in the index, we need to use the queryset_pagination meta option to paginate. Django pagination needs a queryset sorted with order_by (cf. doc), otherwise the same pk can be present more than once and others can be missing (like #71).
Putting order_by on the queryset makes the Django paginator call order_by for each page. Calling order_by on a huge queryset (like 10 million rows) leads to a huge performance issue.
Temporary solution:
We can override the _get_actions method (from django_elasticsearch_dsl.documents.DocType) so that it does not use the Django paginator when a queryset is passed. Moreover, because of the way a database index works, we should first fetch only the pks, and then do sub-requests based on them.
Available to make the PR if needed.
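The "fetch only pks first, then do sub-requests" idea could look roughly like the sketch below. It is not the code from the PR and not part of django_elasticsearch_dsl: the helper names iter_pk_chunks and iter_objects_by_pk and the chunk_size default are made up for illustration. It only assumes an unsliced Django queryset and the standard QuerySet methods values_list, iterator and filter. Since the pks come from a single query, no order_by is needed to avoid duplicates, and plain for loops are used instead of "yield from" so it also runs on Python 2.

```python
from itertools import islice


def iter_pk_chunks(queryset, chunk_size=1000):
    """Yield lists of primary keys, at most chunk_size per list.

    Hypothetical helper: a single query selects only the pk column, so
    every row is seen exactly once and no order_by is required.
    """
    pks = iter(queryset.values_list("pk", flat=True).iterator())
    while True:
        chunk = list(islice(pks, chunk_size))
        if not chunk:
            return
        yield chunk


def iter_objects_by_pk(queryset, chunk_size=1000):
    """Yield objects chunk by chunk without going through Django's Paginator.

    Hypothetical helper: each sub-request is restricted to the pks of the
    current chunk, so the database can resolve it through the pk index.
    """
    for chunk in iter_pk_chunks(queryset, chunk_size):
        for obj in queryset.filter(pk__in=chunk):
            yield obj
```

An overridden _get_actions could then loop over iter_objects_by_pk(object_list) instead of paginating. Rows inserted after the pk query has started may be missed, which is probably acceptable for a one-off bulk index but worth keeping in mind.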