carltongibson / django-filter

A generic system for filtering Django QuerySets based on user selections
https://django-filter.readthedocs.io/en/main/
Other
4.46k stars 770 forks source link

[OrderingFilter] Question / Feature request: Allow to specify arbitrary Django ORM expression for ordering #1021

Open earshinov opened 5 years ago

earshinov commented 5 years ago

Hello! First of all, I'd like to thank you for this project, which proved to be extremely useful in combination with Django REST Framework in our application.

Currently I am struggling with a problem of serializing, filtering, and ordering by fields of related models. In this issue I would like to only focus on ordering.

Let me illustrate my intentions with the following model:

# models.py

from django.contrib.auth.models import User
from django.db import models

class Order(models.Model):
    user = models.ForeignKey(User, on_delete=models.PROTECT, null=False, blank=False)
    created = models.DateTimeField(null=False, blank=False, auto_now_add=True)
    submitted = models.DateTimeField(null=True, blank=True)

    def __str__(self):
        return f'Order #{self.id}'

class Product(models.Model):
    name = models.CharField(null=False, blank=False, max_length=256)
    price = models.DecimalField(null=False, blank=False, decimal_places=2, max_digits=12)

    def __str__(self):
        return self.name

class OrderLine(models.Model):
    product = models.ForeignKey(Product, on_delete=models.PROTECT, null=False, blank=False, related_name='order_lines')
    quantity = models.IntegerField(null=False, blank=False)
    product_price = models.DecimalField(null=False, blank=False, decimal_places=2, max_digits=12)
    total_price = models.DecimalField(null=False, blank=False, decimal_places=2, max_digits=12)
    order = models.ForeignKey(Order, on_delete=models.CASCADE, null=False, blank=False, related_name='order_lines')

    def __str__(self):
        return f'{self.order}: {self.product.name} x{self.quantity}'

Then, let's start with a DRF ViewSet without any filtering and ordering configured:

# views.py

from rest_framework import viewsets
from django_filters.rest_framework import DjangoFilterBackend, FilterSet, OrderingFilter
from .models import Order

class OrderFilterSet(FilterSet):
    pass

class OrderViewSet(viewsets.ModelViewSet):
    queryset = Order.objects.all()
    serializer_class = OrderSerializer  # the definition of OrderSerializer is irrelevant and not shown here
    filter_backends = (DjangoFilterBackend, OrderingFilter)
    filterset_class = OrderFilterSet
    ordering_fields = ()

    class Meta:
        model = Order

My goal in the scope of this issue is to allow the client to order the requested list of orders by these fields:

  1. ordering=[-]created - order by order creation date ([-] indicates an optional - to order desc.)
  2. ordering=[-]user - order by the full name of the user who placed the order
  3. ordering=[-]total_quantity - order by the total quantity of products in the order

The basic approach with ordering_fields configured on the ViewSet only allows for ordering based on model fields, p.1. Fortunately, we can use a more advanced method of subclassing the filters.OrderingFilter as well described in https://django-filter.readthedocs.io/en/master/ref/filters.html#orderingfilter:

# views.py

from rest_framework import viewsets
from django_filters.rest_framework import DjangoFilterBackend, FilterSet, OrderingFilter
from .models import Order

class OrderOrderingFilter(OrderingFilter):

    def __init__(self):
        super().__init__(fields={
            # example: model field
            # Order by order creation date
            'created': 'created',

            # example: expression on related model
            # Order by user full name
            'user': '',  # ???

            # example: aggregate expression
            'total_quantity': ''  # ???
        })

class OrderFilterSet(FilterSet):
    ordering = OrderOrderingFilter()

class OrderViewSet(viewsets.ModelViewSet):
    queryset = Order.objects.all()
    serializer_class = OrderSerializer  # the definition of OrderSerializer is irrelevant and not shown here
    filter_backends = (DjangoFilterBackend,)
    filterset_class = OrderFilterSet

    class Meta:
        model = Order

However, even when using this advanced method, django-filter seems to require a field name. The only way to circumvent this issue is by annotate'ing the filtered QuerySet:

# ... the rest is omitted to brevity ...
class OrderOrderingFilter(OrderingFilter):

    def __init__(self):
        super().__init__(fields={
            # ... the rest is omitted to brevity ...
            'user_order': 'user',
        })

    def filter(self, qs, value):
        if value:
            qs = self._annotate(qs)
        return super().filter(qs, value)

    def _annotate(self, qs, value):
        if 'user' in value or '-user' in value:
            qs = qs.annotate(user_order=functions.Concat(
                models.F('user__first_name'),
                models.Value(' '),
                models.F('user__last_name'),
            ))
        return qs

Some can say that using .annotate here is a terrible idea because one cannot reliably mix .annotate with .filter as described in https://docs.djangoproject.com/en/2.1/topics/db/aggregation/#order-of-annotate-and-filter-clauses. However, you are fine as long as you don't use aggregation in your .annotate calls, so we are good so far. (I will actually return to aggregation later)

Another issue with the latest code sample is that the code is repetitive and is better to be extracted into a base class which could then be conveniently inherited. For now, I have written my own OrderingFilter class, which is a complete alternative to the one provided by django-filter. Its complete source code is below. You may notice that it borrows much from django-filter:

import typing

from django.db import models
from django.db.models import expressions
from django.forms.utils import pretty_name
from django.utils.translation import ugettext_lazy as _

from django_filters import filters
from django_filters.constants import EMPTY_VALUES

__all__ = ('OrderingFilter',)

class OrderingFilter(filters.BaseCSVFilter, filters.ChoiceFilter):
    """
    An alternative to django_filters.filter.OrderingFilter that allows to specify any Django ORM "expression" for ordering.

    Usage examples:

      class MyOrderingFilter(ddf.OrderingFilter):
        def __init__(self):
          super().__init__(fields={

            # a model field
            'id': 'id'

            # an expression
            'published_by':
              functions.Concat(
                expressions.F(f'published_user__first_name'),
                expressions.Value(' '),
                expressions.F(f'published_user__last_name')
              ),

            # a complete field descriptor with custom field label
            'reported_by': {
              'label': 'Reporter',
              'desc_label': 'Reporter (descending)',  # optional, would be derived from 'label' anyway
              'expr': functions.Concat(
                expressions.F(f'reported_user__first_name'),
                expressions.Value(' '),
                expressions.F(f'reported_user__last_name')
              ),
            }
          })

    For more information about expressions, please see the official Django documentation at
    https://docs.djangoproject.com/en/2.1/ref/models/expressions/
    """

    _fields: typing.Mapping[str, 'FieldDescriptor']

    def __init__(self, fields: typing.Mapping[str, typing.Any]):
        self._fields = normalize_fields(fields)
        super().__init__(choices=build_choices(self._fields))

    # @override
    def filter(self, qs: models.QuerySet, value: typing.Union[typing.List[str], None]):
        return qs if value in EMPTY_VALUES else qs.order_by(*(expr for expr in map(self.get_ordering_expr, value)))

    def get_ordering_expr(self, param) -> expressions.Expression:
        descending = param.startswith('-')
        param = param[1:] if descending else param
        field_descriptor = self._fields.get(param)
        return None if field_descriptor is None else \
            field_descriptor.expr if not descending else field_descriptor.expr.desc()

def normalize_fields(fields: typing.Mapping[str, typing.Any]) -> typing.Mapping[str, 'FieldDescriptor']:
    return dict((
        param_name,
        FieldDescriptor(param_name, {'expr': normalize_expr(field)} if isinstance(field, (str, expressions.Expression)) else field)
    ) for param_name, field in fields.items())

def normalize_expr(expr: typing.Union[str, expressions.Expression]):
    return models.F(expr) if isinstance(expr, str) else expr

descending_fmt = _('%s (descending)')

class FieldDescriptor:
    expr: models.Expression

    def __init__(self, param_name: str, data: typing.Mapping[str, typing.Any]):
        self.expr = normalize_expr(data['expr'])
        self.label = data.get('label', _(pretty_name(param_name)))
        self.desc_label = data.get('desc_label', descending_fmt.format(self.label))

def build_choices(fields: typing.Mapping[str, 'FieldDescriptor']):
    choices = []
    for param_name, field_descriptor in fields.items():
        choices.append((param_name, field_descriptor.label))
        choices.append((f'-{param_name}', field_descriptor.desc_label))
    return choices

Using this class, OrdersOrderingFilter becomes nice and concise:

# ... the rest is omitted to brevity ...
class OrderOrderingFilter(ddf.OrderingFilter):

    def __init__(self):
        super().__init__(fields={
            # ... the rest is omitted to brevity ...
            'user': functions.Concat(
                models.F('user__first_name'),
                models.Value(' '),
                models.F('user__last_name'),
            ),
        })

This brings up the question I came here with:

_What do you think of providing the ability to specify ordering with a Django ORM expression in django_filters.rest_framework.OrderingFilter?_

But before you answer, let me remind you that we have only pp. 1 and 2 solved. To solve p.3 we are going to need an aggregation expression:

aggregates.Sum(models.F('order_lines__quantity'))

We cannot use this expression in .annotate, because the results of mixing it with QuserySet filtering will be unpredictable. We cannot use this expression directly in a call to Query.order(...) either, because

django.core.exceptions.FieldError: Using an aggregate in order_by() without also including it in annotate() is not allowed: Sum(F(order_lines__quantity))

So p.3 is out of reach, requiring a solution that will pierce everything starting with the ViewSet. It has more to do with Django REST Framework than django-filters. Do we actually need expression support in django_filters.rest_framework.OrderingFilter without aggregation expression support? This might only confuse django-filter users without bringing much benefit.

I am looking forward to hearing your opinion. I know it is much information to digest. Hopefully, my test project might help: https://github.com/earshinov/django_sample/tree/master/django_sample/ordering_by_expression

carltongibson commented 5 years ago

HI @earshinov. Thanks for the report. Very interesting. Let me have a think about it. Ping me in March if I've not replied by then. 🙂

earshinov commented 5 years ago

Another idea: it is possible to go a step further and allow a filter to specify not one, but multiple expressions or field names.

Before (option 1):

class OrderOrderingFilter(ddf.OrderingFilter):

    def __init__(self):
        super().__init__(fields={
            'user': functions.Concat(
                models.F('user__first_name'),
                models.Value(' '),
                models.F('user__last_name'),
            ),
        })

After (option 2):

class OrderOrderingFilter(ddf.OrderingFilter):

    def __init__(self):
        super().__init__(fields={
            'user': ('user__first_name', 'user__last_name'),
        })

In both cases ordering=user orders data by user's first name first, user's second name second, but option 2 is easier to write, and perhaps more efficient when it comes to DB queries.

earshinov commented 5 years ago

Hello @carltongibson ,

You asked me to ping you in Match. It is May already :)

Returning back to my proposal on ordering by expressions (one or multiple), I really think it is a missing part of the puzzle.

I see four basic table operations: a. serialization b. ~pagination~ (irrelevant to this discussion) c. filtering d. sorting

Using DRF and django-filter it is possible to implement all of these with support for related models, except for sorting:

a. For serialization, we have got nested serializers. Also, if we need to return an aggregated field (think of count), we can .annotate() the queryset:

from django.db.models import aggregates, functions, F, Value

class ModelViewSet(...):

    def get_queryset(self):
        qs = Model.objects.all()

        if self.action == 'list':
            qs = qs.annotate(
                author_full_name=functions.Trim(functions.Concat(
                    F('author__first_name'),
                    Value(' '),
                    F('author__last_name'),
                )),
                submodel_count=aggregates.Count('submodel'))
            )

        return qs

c. For filtering with Django expressions, one can implement a custom filter (see an example below). Filtering by an aggregated field (submodel_count) is possible using the ordinary scalar filters (filters.NumberFilter).

from django_filters import filters
from django_filters.rest_framework import FilterSet

class ModelFilterSet(FilterSet):

    author = UserFilter(field_name='author', label='author', lookup_expr='icontains')

class UserFilter(filters.Filter):
    """A django_filters filter that implements filtering by user's full name.

    def filter(self, qs: QuerySet, value: str) -> QuerySet:
        # first_name <lookup_expr> <value> OR last_name <lookup_expr> <value>
        return qs if not value else qs.filter(
            Q(**{f'{self.field_name}__first_name__{self.lookup_expr}': value}) |
            Q(**{f'{self.field_name}__last_name__{self.lookup_expr}': value})
        )

d. There is no solution for sorting :-(

earshinov commented 5 years ago

Here is the complete implemention of our custom OrderingFilter, which we have been using so far:

class OrderingFilter(filters.BaseCSVFilter, filters.ChoiceFilter):
    """An alternative to :class:`django_filters.filters.OrderingFilter` that allows to specify any Django ORM expression for ordering.

    Usage example:

    .. code-block:: python

      from django.db import models
      from django.db.models import aggregates, expressions, fields

      import ddl

      class OrderOrderingFilter(ddl.OrderingFilter):
        def __init__(self):
          super().__init__(fields={

            # a model field
            'created': 'created'

            # an expression
            'submitted': expressions.ExpressionWrapper(
                models.Q(submitted_date__isnull=False),
                output_field=fields.BooleanField()
            ),

            # multiple fields or expressions
            'user': ('user__first_name', 'user__last_name'),

            # a complete field descriptor with custom field label
            'products': {
              'label': 'Total number of items in the order',
              # if not specified, `desc_label` would be derived from 'label' anyway
              'desc_label': 'Total number of items in the order (descending)',
              'expr': aggregates.Sum('order_lines__quantity'),
              # it is also possible to filter by multiple fields or expressions here
              #'exprs': (...)
            },
          })

    For more information about expressions, see the official Django documentation at
    https://docs.djangoproject.com/en/dev/ref/models/expressions/
    """

    _fields: typing.Mapping[str, 'FieldDescriptor']

    def __init__(self, fields: typing.Mapping[str, typing.Any]):
        self._fields = normalize_fields(fields)
        super().__init__(choices=build_choices(self._fields))

    # @override
    def filter(self, qs: models.QuerySet, value: typing.Union[typing.List[str], None]):
        return qs if value in EMPTY_VALUES else qs.order_by(*(itertools.chain(*(self.__get_ordering_exprs(param) for param in value))))

    def __get_ordering_exprs(self, param) -> typing.Union[None, typing.List[expressions.Expression]]:
        descending = param.startswith('-')
        param = param[1:] if descending else param
        field_descriptor = self._fields.get(param)
        return () if field_descriptor is None else \
            field_descriptor.exprs if not descending else \
            (expr.desc() for expr in field_descriptor.exprs)

def normalize_fields(fields: typing.Mapping[str, typing.Any]) -> typing.Mapping[str, 'FieldDescriptor']:
    return dict((
        param_name,
        FieldDescriptor(param_name, field if isinstance(field, collections.Mapping) else {'exprs': normalize_exprs(field)})
    ) for param_name, field in fields.items())

def normalize_exprs(exprs: typing.Union[
        typing.Union[str, expressions.Expression],
        typing.List[typing.Union[str, expressions.Expression]]
    ]) -> typing.List[expressions.Expression]:
    # `exprs` is either a single expression or a Sequence of expressions
    exprs = exprs if isinstance(exprs, collections.Sequence) and not isinstance(exprs, str) else (exprs,)
    return [normalize_expr(expr) for expr in exprs]

def normalize_expr(expr: typing.Union[str, expressions.Expression]) -> expressions.Expression:
    return models.F(expr) if isinstance(expr, str) else expr

descending_fmt = _('%s (descending)')

class FieldDescriptor:
    exprs: typing.List[models.Expression]

    def __init__(self, param_name: str, data: typing.Mapping[str, typing.Any]):
        exprs = data.get('exprs') or data.get('expr')
        if not exprs:
            raise ValueError("Expected 'exprs' or 'expr'")
        self.exprs = normalize_exprs(exprs)
        self.label = data.get('label', _(pretty_name(param_name)))
        self.desc_label = data.get('desc_label', descending_fmt.format(self.label))

def build_choices(fields: typing.Mapping[str, 'FieldDescriptor']):
    choices = []
    for param_name, field_descriptor in fields.items():
        choices.append((param_name, field_descriptor.label))
        choices.append((f'-{param_name}', field_descriptor.desc_label))
    return choices

Update: Included the implemention of normalize_fields and other helper methods.

carltongibson commented 5 years ago

Good. 😀

Thanks for the ping.

Tbh I haven’t had a moment to think about this.

In all your great thought here do you have a suggestion for a smallish change that we might make? (Maybe it’s just easier to push forward with a PR).

earshinov commented 5 years ago

Yeah, it is not immediately obvious how to make such changes backward compatible, I will have to give it a think. Do not expect a PR soon (will be on vacation in places where Internet is scarce).

carltongibson commented 5 years ago

That’s fine. 🙂

There’s no rush here. Better we’ll thought through, if at all.

earshinov commented 5 years ago

@carltongibson , Okay, we contemplated a little bit, but could not find an acceptable way of combining the old and new implementation of OrderingFilter.

If we are going to put everything into one class, the biggest problem is that in the old implementation the fields dict stores model_field-parameter_name pairs, whereas in the new implementation it is the opposite parameter_name-model_field (it needs to be, because instead of model_field an expression can be passed, which can only be stored in a value, but not in a key in a dictionary).

Technically, it is possible to overcome this problem by interpreting the dictionary "in the old way" if it contains only strings, and "in the new way" otherwise. In this case the user will have to "flip" dictionary entries when a need in ordering by expression arises. And be careful to "flip" the dictionary entries back, should the ordering expression be removed... This sounds like a terrible user experience to me. It will also make the implementation convoluted.

Do you think it is possible to introduce the new OrderingFilter along with the old one under a different name, like ExpressionOrderingFilter?

gabn88 commented 5 years ago

This is great, exactly at the moment I needed it! Thanks @earshinov!

How would I do something like:

MyModel.objects.all().order_by(F('price').desc(nulls_last=True))

With your ordering filter?

    o = ExpressionOrderingFilter(

        fields={
                'price': F('price').desc(nulls_last=True)
        }

    )

seems to work great! Now the only thing left is figuring out how to combine mutliple filters, ak. 'price' and 'stock' into one.

rpkilby commented 5 years ago

ach - started responding to this but my computer went kaput.

I think the proposed changes here are sensible. The model_field-parameter_name pair made sense at the time since we typically derive the exposed params/forms/etc from the model, but there's no reason why this is necessary. Swapping the mapping would make sense, and allow us to take advantage of more complex expressions.

Also, I don't think deprecation process would be insanely difficult, and I'd be happy to help throw in on it. Basically, fields and field_labels would transition to params and param_labels. It's easy enough to convert from one to the other, while retaining backwards compatibility, and raising a deprecation notice for users using the old arguments.

One thing to consider is automatic conversion of complex expressions for the descending case (param vs -param). e.g., if the ascending is .asc(nulls_last=True)., should the inverse of this be .desc(nulls_first=True), or should nulls remain last regardless of sort direction?

carltongibson commented 5 years ago

OK, super @rpkilby.

Happy to see us move forwards here. My main concern is that we document properly what we're doing. Users already find the df docs a little terse shall we say 🙂 — Happy to add API but we need to make sure it's clear.

Such a thing doesn't have to be a one-person effort.

earshinov commented 5 years ago

@rpkilby , @carltongibson , If you need my help in integrating my changes into the project, I think I can put in some time. But I need directions. Where and how do I start?

carltongibson commented 5 years ago

Hi @earshinov. I'd start by drafting the docs. What's the story we tell? From there the code changes. A PR, even if just a draft gives us something to talk about.

rpkilby commented 5 years ago

Also, test cases if you have them. They may need to be adjusted to match the final changes, but having the various cases thought through would be very helpful (e.g., converting .asc to .desc, handling nulls_fist / nulls_last arguments, etc..).