Open iJohnMaged opened 3 years ago
Thanks for this, and for the tests.
I'm really surprised this works! Performing a count on a large dataset can be very slow, and chunking through a dataset gets slower each chunk due to how databases perform offsetting.
But, pragmatically, for databases that don't support server side cursors, it's better than nothing.
I would suggest the following changes:
(2) is really important. It means this will scale to larger datasets, and it means, for auto-increment pks, no models will be missed if the data is being edited.
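The suggested changes themselves are not reproduced in this thread, so the following is only a sketch of the general idea the comment points at: chunking by primary key rather than by offset. It uses the standard-library `sqlite3` module and a hypothetical `article` table so it can run on its own. In Django the equivalent query would look roughly like `queryset.filter(pk__gt=last_pk).order_by("pk")[:chunk_size]`.

```python
import sqlite3


def iter_by_pk(conn, chunk_size=1000):
    """Yield (id, title) rows in primary-key order, one chunk per query.

    Each query resumes from the last id seen instead of using OFFSET,
    so the cost of a chunk does not grow with how far in we are.
    Assumes positive, auto-incrementing integer ids.
    """
    last_pk = 0
    while True:
        rows = conn.execute(
            "SELECT id, title FROM article WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_pk = rows[-1][0]  # resume after the last id in this chunk


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
)
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [(f"title {i}",) for i in range(10)],
)

seen_ids = [pk for pk, _title in iter_by_pk(conn, chunk_size=3)]
```

Because every chunk starts strictly after the last id already processed, rows inserted with higher ids while the loop is running are picked up by a later chunk, and rows deleted behind the cursor do not shift the remaining rows the way they would with an offset.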
I'll fix the build check errors and implement those changes. You're completely right about them, and I ended up implementing that in the view anyway for search!
Running into a similar issue on 100k rows :/ Any chance of merging this John?
I can't merge this until the suggested changes are made and the broken builds are fixed. I'm happy to consider another PR, or updates to this one.
@etianen Did you publicize the library mentioned in https://github.com/etianen/django-watson/issues/26#issuecomment-26192741 :) ?
Nope, sorry! And it's likely lost to time now.
Came across your library and wanted to integrate it for a client, great work!
However, when I was deploying on a relatively big database (2M rows, big model with lots of text data), the process was always getting killed on PythonAnywhere while using all the CPU and RAM available, without creating a single index in `watson_searchentry`.
So I tinkered a bit and found that `.iterator()` is the issue in my case (limited resources, MySQL database too): `buildwatson` doesn't get to create any index. I eventually changed the code to slice instead of using `.iterator()` and it got through.
I added an argument to `buildwatson` called `--slice-queryset` to slice the queryset instead of iterating over it, in case that works for others too.
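The patch behind `--slice-queryset` is not shown in this thread, so the following is only a rough sketch of what slicing instead of calling `.iterator()` might look like. `iter_in_slices` is a hypothetical helper name. It relies only on slice indexing, so it works on a plain list as well as on a Django queryset, where each slice turns into a separate `LIMIT`/`OFFSET` query.

```python
def iter_in_slices(queryset, chunk_size=1000):
    """Yield items from a sliceable sequence one chunk at a time.

    Only one chunk is materialised in memory at once. With a Django
    queryset, give it an explicit, stable ordering (for example
    .order_by("pk")) so consecutive slices do not overlap or skip rows.
    """
    start = 0
    while True:
        chunk = list(queryset[start:start + chunk_size])
        if not chunk:
            return
        yield from chunk
        start += chunk_size


# A plain list stands in for the queryset here.
rows = list(range(25))
collected = list(iter_in_slices(rows, chunk_size=10))
```

This trades speed for memory: as noted earlier in the thread, offset-based slices get slower the further in they start, and rows can be missed or repeated if the table changes between slices.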