aykut / django-bulk-update

Bulk update using one query over Django ORM
MIT License
432 stars, 59 forks

bulk_update_or_create(model_instances) or bulk_update(model_instances, upsert=True)? #49

Open candeira opened 8 years ago

candeira commented 8 years ago

For my current job we need bulk upsert of records, and I'm thinking of forking your package and implementing bulk_upsert myself. If/when I do that, I'd like to do it in the manner that's most likely to be accepted into your project, so as not to maintain an independent fork.

Which syntax do you prefer?

For now I'd only make my changes compatible with Postgres 9.5+, because that's what we're using and because I'm relatively new at this niche.

Any other advice/comment?

aykut commented 8 years ago

Hi,

I'm not sure it is a good idea to include bulk_create in this project. Django already has a built-in bulk_create method. Why not split the objects into those to create and those to update, then call bulk_create and bulk_update explicitly?

ckcollab commented 8 years ago

@candeira I'm way into that! I could use this on my project, for sure.

@aykut bulk_update_or_create is different from bulk_create?

phlax commented 7 years ago

@candeira @aykut @ckcollab this would be amazingly helpful

phlax commented 7 years ago

@aykut the problem with doing bulk_create is that you need to know in advance which ones already exist, so it requires an additional query, I think

mehdipourfar commented 7 years ago

I need this feature. Any news?

arnau126 commented 7 years ago

I think it's possible to add this feature. I would call it bulk_update_or_create because Django already has an update_or_create for single instances.

But even if we implement this function here, we will still need to know which instances already exist (performing an additional query). bulk_update_or_create will actually split the list of instances and call bulk_create and bulk_update separately. So each batch will perform 3 queries: one lookup, one bulk_create and one bulk_update.

Does that seem reasonable to you? Is there a better approach?
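A minimal sketch of the three-query split described above. It uses the standard-library sqlite3 module as a stand-in for the Django ORM so that it runs on its own; the `item` table and the `bulk_update_or_create` helper are hypothetical and are not part of django-bulk-update. The two `executemany` calls stand in for bulk_create and bulk_update.

```python
import sqlite3


def bulk_update_or_create(conn, rows):
    """Upsert (id, name) tuples using a lookup, an insert step and an update step.

    Hypothetical helper for illustration only.
    """
    if not rows:
        return 0, 0
    ids = [pk for pk, _ in rows]
    placeholders = ",".join("?" for _ in ids)
    # Query 1: find out which primary keys already exist.
    existing = {
        pk
        for (pk,) in conn.execute(
            f"SELECT id FROM item WHERE id IN ({placeholders})", ids
        )
    }
    to_create = [r for r in rows if r[0] not in existing]
    to_update = [r for r in rows if r[0] in existing]
    with conn:  # commit on success, roll back on error
        # Query 2: stand-in for bulk_create.
        conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)", to_create)
        # Query 3: stand-in for bulk_update.
        conn.executemany(
            "UPDATE item SET name = ? WHERE id = ?",
            [(name, pk) for pk, name in to_update],
        )
    return len(to_create), len(to_update)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item (id, name) VALUES (1, 'old')")
conn.commit()

created, updated = bulk_update_or_create(conn, [(1, "new"), (2, "two")])
stored = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
```

Nothing here prevents another writer from inserting or deleting a row between the lookup and the write step, so the sketch is only safe when a single process writes to the table.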

mehdipourfar commented 7 years ago

It seems reasonable, although both Postgres and MySQL now support bulk upsert:
https://stackoverflow.com/questions/34514457/bulk-insert-update-if-on-conflict-bulk-upsert-on-postgres
https://stackoverflow.com/questions/6286452/mysql-bulk-insert-or-update
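For reference, a rough sketch of what the native upsert statements look like in each database. The `build_upsert_sql` helper and the `item` table are hypothetical. The helper only builds the SQL string and does not quote identifiers, so it assumes trusted table and column names. The `%s` placeholders follow the parameter style used by the common Postgres and MySQL Python drivers.

```python
def build_upsert_sql(table, columns, key, dialect="postgresql"):
    """Build a single-row upsert statement (hypothetical helper, no identifier quoting)."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    non_key = [c for c in columns if c != key]
    if dialect == "postgresql":
        # Postgres 9.5+: EXCLUDED refers to the row proposed for insertion.
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in non_key)
        return (
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
        )
    if dialect == "mysql":
        # MySQL: VALUES(col) refers to the value that would have been inserted.
        updates = ", ".join(f"{c} = VALUES({c})" for c in non_key)
        return (
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
            f"ON DUPLICATE KEY UPDATE {updates}"
        )
    raise ValueError(f"unsupported dialect: {dialect}")


pg_sql = build_upsert_sql("item", ["id", "name", "price"], "id")
mysql_sql = build_upsert_sql("item", ["id", "name", "price"], "id", dialect="mysql")
```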

abdulwahid24 commented 7 years ago

I agree with @arnau126. Is there any update on this feature?

Bartvds commented 7 years ago

The 3-query approach has a race condition: unless you can be sure your program is the only one writing to that table, you'll have to add retry logic around the transaction (as records can get added and removed between your read and create steps).

SQL-level UPSERT is the way to go for an atomic, single-query update/create.
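A small runnable sketch of a single-statement upsert. It uses the standard-library sqlite3 module as a stand-in, since SQLite 3.24+ accepts an ON CONFLICT clause modeled on the Postgres one. The `item` table and its data are made up for illustration, and a Postgres version would need a Postgres driver and `%s` placeholders instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item (id, name) VALUES (1, 'old')")
conn.commit()

rows = [(1, "new"), (2, "two")]

# One multi-row INSERT: the database decides per row whether to insert or update,
# so there is no separate read step for another writer to slip in after.
values_clause = ", ".join(["(?, ?)"] * len(rows))
params = [value for row in rows for value in row]
with conn:  # commit on success, roll back on error
    conn.execute(
        f"INSERT INTO item (id, name) VALUES {values_clause} "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name",
        params,
    )

result = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
```

Row 1 already exists and is updated in place, while row 2 is inserted, all within the same statement.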