M1ha-Shvn / django-pg-bulk-update

Django manager for performing bulk update operations in PostgreSQL database
BSD 3-Clause "New" or "Revised" License

django-pg-bulk-update

A Django extension for efficiently updating multiple table records with similar (but not equal) conditions in PostgreSQL.
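As a rough illustration of the idea, PostgreSQL can update several rows with different values in a single statement using UPDATE ... FROM (VALUES ...). The sketch below builds such a statement in plain Python. The helper name build_bulk_update_sql and the aliases "t" and "upd" are hypothetical, and the SQL this library actually generates may differ.

```python
def build_bulk_update_sql(table, key_field, rows):
    """Build one UPDATE ... FROM (VALUES ...) statement for a list of row dicts.

    Hypothetical helper for illustration only. Table and column names are
    assumed to be trusted; values are passed separately as query parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    columns = list(rows[0])
    if key_field not in columns:
        raise ValueError("every row must contain the key field")

    # One "(%s, %s, ...)" group per row, one placeholder per column
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values_sql = ", ".join([row_placeholder] * len(rows))

    # Every non-key column is assigned from the matching VALUES column
    set_sql = ", ".join(
        '"%s" = "upd"."%s"' % (c, c) for c in columns if c != key_field
    )
    columns_sql = ", ".join('"%s"' % c for c in columns)

    sql = (
        'UPDATE "%s" AS "t" SET %s FROM (VALUES %s) AS "upd" (%s) '
        'WHERE "t"."%s" = "upd"."%s"'
    ) % (table, set_sql, values_sql, columns_sql, key_field, key_field)
    params = [row[c] for row in rows for c in columns]
    return sql, params


sql, params = build_bulk_update_sql(
    "test_model",
    "id",
    [{"id": 1, "name": "updated1"}, {"id": 2, "name": "updated2"}],
)
print(sql)
print(params)
```

Both rows are sent in one round trip, and each row carries its own new value, which is what separates this from a plain QuerySet.update() call.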

Requirements

Installation

Install via pip:
pip install django-pg-bulk-update
or via setup.py:
python setup.py install

Usage

You can make queries in 2 ways:

Query functions

There are 4 query helpers in this library. Their parameters are unified and described in the section below.

Function parameters

Examples

from django.db import models
from django.db.models import F
from django.db.models.functions import Upper
from django_pg_bulk_update import bulk_update, bulk_update_or_create, pdnf_clause
from django_pg_bulk_update.set_functions import BulkValue

# Test model
class TestModel(models.Model):
    name = models.CharField(max_length=50)
    int_field = models.IntegerField()

# Create test data
created = TestModel.objects.pg_bulk_create([
    {'id': i, 'name': "item%d" % i, 'int_field': 1} for i in range(1, 4)
])
print(created)
# Outputs 3

# Create test data returning
created = TestModel.objects.pg_bulk_create([
    {'id': i, 'name': "item%d" % i, 'int_field': 1} for i in range(4, 6)
], returning='*')
print(type(created), list(created.values_list('id', 'name', 'int_field')))
# Outputs: 
# <class 'django_pg_returning.queryset.ReturningQuerySet'>
# [
#    (4, "item4", 1),
#    (5, "item5", 1)
# ]

# Update by id field
updated = bulk_update(TestModel, [{
    "id": 1,
    "name": "updated1",
}, {
    "id": 2,
    "name": "updated2"
}])

print(updated)
# Outputs: 2

# Update returning
res = bulk_update(TestModel, [{
    "id": 1,
    "name": "updated1",
}, {
    "id": 2,
    "name": "updated2"
}], returning=('id', 'name', 'int_field'))

print(type(res), list(res.values_list('id', 'name', 'int_field')))
# Outputs: 
# <class 'django_pg_returning.queryset.ReturningQuerySet'>
# [
#    (1, "updated1", 1),
#    (2, "updated2", 1)
# ]

# Call update by name field
updated = bulk_update(TestModel, {
    "updated1": {
        "int_field": 2
    },
    "updated2": {
        "int_field": 3
    }
}, key_fields="name")

print(updated)
# Outputs: 2

print(list(TestModel.objects.all().order_by("id").values("id", "name", "int_field")))
# Outputs: [
#     {"id": 1, "name": "updated1", "int_field": 2},
#     {"id": 2, "name": "updated2", "int_field": 3},
#     {"id": 3, "name": "item3", "int_field": 1}
# ]

# Increment int_field by 3 and set name to 'incr' for records where id >= 2 and int_field < 3
updated = bulk_update(TestModel, {
        (2, 3): {
            "int_field": 3,
            "name": "incr"
        }
    }, key_fields=['id', 'int_field'], key_fields_ops={'int_field': '<', 'id': 'gte'},
    set_functions={'int_field': '+'})

print(updated)
# Outputs: 1

print(list(TestModel.objects.all().order_by("id").values("id", "name", "int_field")))
# Outputs: [
#     {"id": 1, "name": "updated1", "int_field": 2},
#     {"id": 2, "name": "updated2", "int_field": 3},
#     {"id": 3, "name": "incr", "int_field": 4}
# ]

res = bulk_update_or_create(TestModel, [{
    "id": 3,
    "name": "_concat1",
    "int_field": 3
}, {
    "id": 4,
    "name": "concat2",
    'int_field': 4
}], set_functions={'name': '||', 'int_field': F('int_field') + BulkValue()})

print(res)
# Outputs: 2

print(list(TestModel.objects.all().order_by("id").values("id", "name", "int_field")))
# Note: IntegerField defaults to 0 in create operations. So 0 + 4 = 4 for id 4.
# Outputs: [
#     {"id": 1, "name": "updated1", "int_field": 2},
#     {"id": 2, "name": "updated2", "int_field": 3},
#     {"id": 3, "name": "incr_concat1", "int_field": 7},
#     {"id": 4, "name": "concat2", "int_field": 4},
# ]

# Find records where 
# id IN [1, 2, 3] AND name = 'updated2' OR id IN [3, 4, 5] AND name = 'concat2' OR id IN [2, 3, 4] AND name = 'updated1'
cond = pdnf_clause(['id', 'name'], [([1, 2, 3], 'updated2'),
                                    ([3, 4, 5], 'concat2'),
                                    ([2, 3, 4], 'updated1')], key_fields_ops={'id': 'in'})
data = TestModel.objects.filter(cond).order_by('int_field').values_list('int_field', flat=True)
print(list(data))
# Outputs: [3, 4]
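The condition built by pdnf_clause is a disjunction of AND-groups: a record matches if it satisfies every field condition of at least one group. The following plain-Python sketch mimics that logic on a list of dicts so it can be checked without a database. The function matches_pdnf is hypothetical and only handles equality and the 'in' operator; it is not the library's implementation.

```python
def matches_pdnf(record, field_names, field_values, ops=None):
    """Return True if the record satisfies at least one AND-group.

    Hypothetical stand-in for the condition described above: each item of
    field_values holds one value per field name, and ops optionally maps a
    field name to 'in' (membership) instead of the default equality.
    """
    ops = ops or {}

    def field_ok(name, expected):
        if ops.get(name) == 'in':
            return record[name] in expected
        return record[name] == expected

    return any(
        all(field_ok(name, expected) for name, expected in zip(field_names, group))
        for group in field_values
    )


records = [
    {"id": 1, "name": "updated1", "int_field": 2},
    {"id": 2, "name": "updated2", "int_field": 3},
    {"id": 3, "name": "incr_concat1", "int_field": 7},
    {"id": 4, "name": "concat2", "int_field": 4},
]
groups = [([1, 2, 3], 'updated2'),
          ([3, 4, 5], 'concat2'),
          ([2, 3, 4], 'updated1')]

matched = sorted(
    r["int_field"]
    for r in records
    if matches_pdnf(r, ['id', 'name'], groups, ops={'id': 'in'})
)
print(matched)
# Outputs: [3, 4]
```

The record with name 'updated1' is not matched because its id (1) is not in [2, 3, 4], so only the records with id 2 and id 4 remain.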

Using custom manager and query set

In order to simplify using bulk_create, bulk_update and bulk_update_or_create functions, you can use a custom manager.
It automatically fills:

Note: As django 2.2 introduced bulk_update method, library methods were renamed to pg_bulk_create, pg_bulk_update and pg_bulk_update_or_create respectively.

Example:

from django.db import models
from django_pg_bulk_update.manager import BulkUpdateManager

# Test model
class TestModel(models.Model):
    objects = BulkUpdateManager()

    name = models.CharField(max_length=50)
    int_field = models.IntegerField()

# Now you can use functions like:
TestModel.objects.pg_bulk_create([
    # Any data here
], set_functions=None)

TestModel.objects.pg_bulk_update([
    # Any data here
], key_fields='id', set_functions=None, key_fields_ops=())

# Update only records with id greater than or equal to 5
TestModel.objects.filter(id__gte=5).pg_bulk_update([
    # Any data here
], key_fields='id', set_functions=None, key_fields_ops=())

TestModel.objects.pg_bulk_update_or_create([
    # Any data here
], key_fields='id', set_functions=None, update=True)           

If you already have a custom manager, you can replace QuerySet with BulkUpdateQuerySet:

from django.db import models
from django.db.models.manager import BaseManager
from django_pg_bulk_update.manager import BulkUpdateQuerySet

class CustomManager(BaseManager.from_queryset(BulkUpdateQuerySet)):
    pass

# Test model
class TestModel(models.Model):
    objects = CustomManager()

    name = models.CharField(max_length=50)
    int_field = models.IntegerField()

If you already have a custom QuerySet, you can inherit it from BulkUpdateMixin:

from django.db import models
from django.db.models.manager import BaseManager
from django_pg_bulk_update.manager import BulkUpdateMixin

class CustomQuerySet(BulkUpdateMixin, models.QuerySet):
    pass

class CustomManager(BaseManager.from_queryset(CustomQuerySet)):
    pass

# Test model
class TestModel(models.Model):
    objects = CustomManager()

    name = models.CharField(max_length=50)
    int_field = models.IntegerField()

Custom clause operator

You can define your own clause operator, creating AbstractClauseOperator subclass and implementing:

Optionally, you can change def format_field_value(self, field, val, connection, cast_type=True, **kwargs) method, which formats value according to field rules

Example:

from django_pg_bulk_update import bulk_update
from django_pg_bulk_update.clause_operators import AbstractClauseOperator

class LTClauseOperator(AbstractClauseOperator):
    names = {'lt', '<'}

    def get_django_filter(self, name):  # type: (str) -> str
        """
        This method should return the parameter name to use in django QuerySet.filter() kwargs
        :param name: Name of parameter
        :return: String with filter
        """
        return '%s__lt' % name

    def get_sql_operator(self):  # type: () -> str
        """
        If the operator is a simple binary operator like "field <op> val", this function returns the operator
        :return: str
        """
        return '<'

# Usage examples
# Import your operator class here before calling an update
bulk_update(TestModel, [], key_fields_ops={'int_field': 'lt'})
bulk_update(TestModel, [], key_fields_ops={'int_field': LTClauseOperator()})

You can pass a class instance directly in the key_fields_ops parameter or use one of its aliases from the names attribute.
When an update function is called, it searches all imported AbstractClauseOperator subclasses and takes the first class whose names attribute contains the alias.
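This kind of alias lookup can be sketched in plain Python by walking the subclass tree of a base class. The classes below (BaseOperator, LessThan, GreaterThan) and the method get_by_name are hypothetical stand-ins, not the library's code; they only illustrate why an operator class has to be imported before its alias can be resolved.

```python
class BaseOperator:
    # Aliases under which the operator can be requested
    names = set()

    @classmethod
    def _all_subclasses(cls):
        # __subclasses__() lists direct subclasses in definition order;
        # recurse to reach indirect subclasses as well
        for sub in cls.__subclasses__():
            yield sub
            yield from sub._all_subclasses()

    @classmethod
    def get_by_name(cls, name):
        # Take the first known subclass whose aliases contain the name
        for sub in cls._all_subclasses():
            if name in sub.names:
                return sub()
        raise ValueError("Operator with name '%s' doesn't exist" % name)


class LessThan(BaseOperator):
    names = {'lt', '<'}


class GreaterThan(BaseOperator):
    names = {'gt', '>'}


print(type(BaseOperator.get_by_name('<')).__name__)
# Outputs: LessThan
```

Only classes that have already been defined (that is, whose module has been imported) appear in __subclasses__(), so an alias for a class that was never imported cannot be found.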

Custom set function

You can define your own set function, creating AbstractSetFunction subclass and implementing:

Optionally, you can change:

Example:

from django_pg_bulk_update import bulk_update
from django_pg_bulk_update.set_functions import AbstractSetFunction

class CustomSetFunction(AbstractSetFunction):
    # Set function alias names
    names = {'func_alias_name'}

    # Names of django field classes, this function supports. You can set None (default) to support any field.
    supported_field_classes = {'IntegerField', 'FloatField', 'AutoField', 'BigAutoField'}

    def get_sql_value(self, field, val, connection, val_as_param=True, with_table=False, for_update=True, **kwargs):
        """
        Returns value sql to set into field and parameters for query execution
        This method is called from get_sql() by default.
        :param field: Django field to take format from
        :param val: Value to format
        :param connection: Connection used to update data
        :param val_as_param: If flag is not set, value should be converted to string and inserted into query directly.
            Otherwise a placeholder and query parameter will be used
        :param with_table: If flag is set, column name in sql is prefixed by table name
        :param for_update: If flag is set, returns update sql. Otherwise - insert SQL
        :param kwargs: Additional arguments, if needed
        :return: A tuple: sql, replacing value in update and a tuple of parameters to pass to cursor
        """
        # If operation is incremental, it should be ready to get NULL in database
        null_default, null_default_params = self._parse_null_default(field, connection, **kwargs)

        # Your function/operator should be defined here
        tpl = 'COALESCE("%s", %s) + %s'

        if val_as_param:
            sql, params = self.format_field_value(field, val, connection)
            return tpl % (field.column, null_default, sql), null_default_params + params
        else:
            return tpl % (field.column, null_default, str(val)), null_default_params

# Usage examples
# Import your set function class here before calling an update
bulk_update(TestModel, [], set_functions={'int_field': 'func_alias_name'})
bulk_update(TestModel, [], set_functions={'int_field': CustomSetFunction()})

You can pass a class instance directly in the set_functions parameter or use one of its aliases from the names attribute.
When an update function is called, it searches all imported AbstractSetFunction subclasses and takes the first class whose names attribute contains the alias.
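To see what the 'COALESCE("%s", %s) + %s' template from the example produces and why the COALESCE guard matters for incremental operations, here is a small standalone sketch. The function render_increment_sql is hypothetical, the NULL default of 0 is an assumption made for illustration, and the standard-library sqlite3 module stands in for PostgreSQL (it uses ? instead of %s placeholders).

```python
import sqlite3


def render_increment_sql(column, null_default_sql, value_sql):
    # Same template as in the example above, filled in with plain strings
    tpl = 'COALESCE("%s", %s) + %s'
    return tpl % (column, null_default_sql, value_sql)


rendered = render_increment_sql("int_field", "0", "%s")
print(rendered)
# Outputs: COALESCE("int_field", 0) + %s

conn = sqlite3.connect(":memory:")
# Without the guard, adding to NULL yields NULL
plain = conn.execute("SELECT NULL + ?", (3,)).fetchone()[0]
# With the guard, NULL is first replaced by the default value
guarded = conn.execute("SELECT COALESCE(NULL, 0) + ?", (3,)).fetchone()[0]
conn.close()

print(plain, guarded)
# Outputs: None 3
```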

Compatibility

Library supports django.contrib.postgres.fields:

Note that ArrayField and HStoreField are available since Django 1.8, and JSONField since Django 1.9.
RangeField support requires PostgreSQL 9.2+, psycopg2 2.5+ and Django 1.8+.
PostgreSQL before 9.4 doesn't support jsonb, and therefore doesn't support JSONField.
PostgreSQL 9.4 supports jsonb, but doesn't support the concatenation operator (||) for it. To support this set function, a special function for PostgreSQL 9.4 was written. Add a migration to create it:

from django.db import migrations
from django_pg_bulk_update.compatibility import Postgres94MergeJSONBMigration

class Migration(migrations.Migration):
    dependencies = []

    operations = [
        Postgres94MergeJSONBMigration()
    ]

PostgreSQL before 9.5 doesn't support the INSERT ... ON CONFLICT statement, so a 3-query transactional update is used instead.
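A fallback of this kind can be pictured as three statements grouped into one transaction: find the keys that already exist, update those rows, and insert the rest. The sketch below is one plausible split, written against the standard-library sqlite3 module as a stand-in for PostgreSQL; the table item and the function update_or_create are hypothetical, and the queries the library actually runs may differ.

```python
import sqlite3


def update_or_create(conn, rows):
    """Upsert (id, name) tuples without INSERT ... ON CONFLICT.

    Hypothetical illustration. Returns (updated_count, inserted_count).
    """
    if not rows:
        return 0, 0
    ids = [pk for pk, _ in rows]
    placeholders = ", ".join("?" * len(ids))

    # The connection context manager commits on success and rolls back
    # if any statement raises
    with conn:
        # Query 1: which keys are already present?
        existing = {
            r[0] for r in conn.execute(
                "SELECT id FROM item WHERE id IN (%s)" % placeholders, ids
            )
        }
        to_update = [(name, pk) for pk, name in rows if pk in existing]
        to_insert = [(pk, name) for pk, name in rows if pk not in existing]
        # Query 2: update the existing rows
        conn.executemany("UPDATE item SET name = ? WHERE id = ?", to_update)
        # Query 3: insert the missing rows
        conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)", to_insert)
    return len(to_update), len(to_insert)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item (id, name) VALUES (1, 'old')")
conn.commit()

counts = update_or_create(conn, [(1, "new"), (2, "created")])
result_rows = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
print(counts, result_rows)
# Outputs: (1, 1) [(1, 'new'), (2, 'created')]
```

Compared with a single INSERT ... ON CONFLICT, this needs more round trips, and under concurrent writers a real implementation would also have to guard against rows appearing between the first and third query.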

Performance

Test background:

Development

This is an open source project developed by M1ha-Shvn under the BSD 3-Clause license. Feel free to create issues and make pull requests.
The library's tests are based on django.test. You can find them in the tests directory.

Tests requirements

Running tests

Running in docker

  1. Install docker and docker-compose
  2. Run docker build . --tag django-pg-bulk-update in project directory
  3. Run docker-compose run run_tests in project directory

Running in virtual environment

  1. Install all requirements listed above
  2. Create virtual environment
  3. Create a superuser named 'test' on your local Postgres instance:
    CREATE ROLE test;
    ALTER ROLE test WITH SUPERUSER;
    ALTER ROLE test WITH LOGIN;
    ALTER ROLE test PASSWORD 'test';
    CREATE DATABASE test OWNER test;
    CREATE DATABASE test2 OWNER test;
  4. Install requirements
    pip3 install -U -r requirements-test.txt
  5. Start tests
    python3 runtests.py

Alternatives

django 2.2+ bulk_update difference

Pros:

Cons:

django-bulk-update difference

Pros:

Cons: