adamcharnock / django-hordak

Double entry accounting in Django
http://django-hordak.readthedocs.io
MIT License
231 stars 55 forks source link

How to fetch transaction amounts for a set of transactions without causing N+1 selects/queries #46

Closed jordanmkoncz closed 4 years ago

jordanmkoncz commented 4 years ago

I'm using django-hordak in a project where I have a REST API that needs to return a list of Transactions. I'm using DRF and have a view which determines the list of Transactions to return, and then uses a TransactionSerializer that I've created to serialize the transactions. In my TransactionSerializer, I need to include the amount and currency for that Transaction.

I have implemented something that works (based on what's done in hordak.admin.TransactionAdmin.total_amount), but it also causes an N+1 selects problem, where for every Transaction being serialized, it has to perform a separate query to determine the total amount and currency for that Transaction.

Does anyone have any suggestions for how I could resolve this? I'm not sure whether this is just an inherent problem due to the fact that django-hordak does not store in the database anything at the level of a Transaction regarding the total amount for that Transaction, therefore it has to be dynamically calculated based on the Legs.

Below is the serializer I'm currently using:

from django.db.models import Sum
from hordak.models import Transaction
from rest_framework import serializers

class TransactionSerializer(serializers.ModelSerializer):
    amount = serializers.SerializerMethodField()

    amount_currency = serializers.SerializerMethodField()

    class Meta:
        model = Transaction

        fields = [
            "uuid",
            "timestamp",
            "amount",
            "amount_currency",
        ]

    def get_amount(self, obj):
        """
        Copied from hordak.admin.TransactionAdmin.total_amount.
        """
        return str(obj.legs.debits().aggregate(Sum("amount"))["amount__sum"])

    def get_amount_currency(self, obj):
        debit_amount = obj.legs.debits().first().amount

        return str(debit_amount.currency)
adamcharnock commented 4 years ago

Hi @jordanmkoncz,

It's about my bedtime here, but thought I'd just see if I could clarify this in my own head first. Can I summarise the problem as:

  1. I have a list/queryset of transactions
  2. I want to get the total value of each transaction

In your current implementation this requires one query for no. 1, and N queries for no.2 (where N is the number of transactions). Ideally we'd be able to do no. 2 in a single query.


My gut feeling is that this sounds like the kind of time I'd drop down into raw SQL. Something like:

SELECT 
    SUM('amount') as total,
    transaction_id,
FROM 
    legs
WHERE 
   transaction_id IN (...)
   -- Use 'amount > 0' to just get the debits
   AND amount > 0
GROUP BY transaction_id

If you're doing this for less than 100 (maybe 1000?) transactions, I'd just replace ... with the list of IDs you've already fetched from the Django ORM. Otherwise you probably need to do it as a sub-select, which probably means you'll need to translate the above into an ORM query (which now I've written it, looks doable).

Bear in mind it's 2am here, I may have missed the point or done something weird without noticing :-)

jordanmkoncz commented 4 years ago

@adamcharnock thanks for your help. You're exactly right in your summary, and that I'm trying to get this down to 1 or 2 queries instead of N queries (where N is the number of transactions).

This is a paginated API endpoint so it should always be dealing with around 100 transactions at a time. Based on the SQL you suggested I've managed to create a working query, I'll include the code below in case anyone else is wanting to implement similar functionality in their project.

query = """
SELECT
    transaction_id,
    SUM("hordak_leg"."amount") as total
FROM
    "hordak_leg"
WHERE
   "hordak_leg"."transaction_id" IN %(transaction_ids)s
   -- Use 'amount > 0' to just get the debits.
   AND "hordak_leg"."amount" > 0
GROUP BY transaction_id
"""

transaction_ids = [1, 2, 3]

with connection.cursor() as cursor:
    cursor.execute(query, {"transaction_ids": tuple(transaction_ids)})
    results = cursor.fetchall()

I'm aware that this was not exactly a django-hordak specific issue and that it was more about how to make the right query here, but I appreciate you helping to find a solution. I think if we could figure out a way to perform a query that achieves a similar result using the ORM instead of performing a raw SQL query that would be a nice improvement, but I'm not quite sure how the ORM query would work.

adamcharnock commented 4 years ago

Hi @jordanmkoncz,

Just having a look at this again now. I think the following will do it, if you prefer to use the ORM:

Leg.objects.filter(amount__gt=0).values('transaction_id').annotate(Sum('amount')).order_by()

This generates the following SQL:

SELECT 
"hordak_leg"."transaction_id", 
SUM("hordak_leg"."amount") AS "amount__sum" 
FROM "hordak_leg" 
WHERE "hordak_leg"."amount" > 0
GROUP BY "hordak_leg"."transaction_id"  
LIMIT 21

Which I think is essentially the same thing. I haven't actually tested it with real data though.

jordanmkoncz commented 4 years ago

@adamcharnock nice, here's the ORM query I ended up using, which produces essentially the same result as the raw SQL one above:

transaction_ids = [1, 2, 3]

transaction_totals = (
    Leg.objects.filter(amount__gt=0, transaction_id__in=transaction_ids)
    .values("transaction_id", "amount_currency")
    .annotate(Sum("amount"))
    .order_by()
)

Thanks again for your help!