Bulk insert, bulk update or bulk delete won't trigger an index update

etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.

BSD 3-Clause "New" or "Revised" License

Bulk insert, bulk update or bulk delete won't trigger an index update #114

Closed: thedrow closed this issue 8 years ago

thedrow commented 9 years ago

We rely on the post_save and pre_delete signals (see https://github.com/etianen/django-watson/blob/master/src/watson/registration.py#L382), which are not sent during bulk operations, so a bulk operation will not trigger an index update. First, that should be mentioned in the documentation. Second, we need to figure out whether we can work around this limitation.

etianen commented 9 years ago

Yes, it should be mentioned in the docs.

The only way to work around the limitation at the moment is to manually run buildwatson on a cron job, but it's not very efficient.

carltongibson commented 9 years ago

Presumably you know the IDs of the new objects. You can fetch those and call the signal by hand...

...
post_save.send(MyModel, instance=instance, created=True)
...

thedrow commented 9 years ago

But since Django doesn't return the IDs of the newly created objects, you can't.

carltongibson commented 9 years ago

Ah. Perhaps not by ID then... but if you can't fetch them, then you can't do it by hand.

etianen commented 9 years ago

Exactly this. A downside of bulk create is that you don’t get the IDs back. It’s intended for efficient data-stuffing.

You could retrieve the IDs for a bulk update (albeit with a race condition), but that would break the performance gain of bulk update in the first place.

I’d generally recommend not doing bulk operations on watson-indexed tables. If you really need performance, and you don’t need to search across multiple tables, using a native PostgreSQL full-text index would be faster and easier in the long run.

thedrow commented 9 years ago

If I had the IDs, Django would have a post_bulk_create signal. I discussed that a while ago with some of the Django core contributors.

alorence commented 9 years ago

I have to import a large number of entries (about 8,000 lines) from a CSV file into a model. I used the Django ORM's bulk_create() to perform that task efficiently. To update the watson index afterwards, I wrote:

from django.core.management import call_command
[...]
# Pass the model label as a positional argument, as on the command line.
call_command('buildwatson', 'App.Model')

I didn't test it in the real world since I don't have the final CSV file yet.

Do you think calling the command programmatically could be a source of problems? If not, it may be a workaround for some cases.

etianen commented 9 years ago

Nope, I think your approach is a good one.
