beancount / beangulp

Importers framework for Beancount
GNU General Public License v2.0
61 stars 24 forks source link

csvbase calculates balance wrong with multiple sameday transactions #102

Open floriskruisselbrink opened 2 years ago

floriskruisselbrink commented 2 years ago

The new csvbase importer has the option to automatically insert a balance assertion based on a balance column in the input data.

Right now it adds a balance assertion one day after the last transaction, but if there are multiple transactions on that day it incorrectly picks the wrong transaction to take the balance from.

With this input (let’s call it test.csv)

date,payee,narration,amount,balance
01-01-2022,Shop,This is just an expense,-24.85,124.85
01-01-2022,Shop,Some other expense,-15.00,109.85
01-01-2022,Employer,Finally got my paycheck,450.00,409.85

And this configuration (config.py)

import beangulp
from beangulp.importers import csvbase

class Importer(csvbase.Importer):
    date = csvbase.Date('date', '%d-%m-%Y')
    payee = csvbase.Column('payee')
    narration = csvbase.Column('narration')
    amount = csvbase.Amount('amount')
    balance = csvbase.Amount('balance')

    def __init__(self, account, currency, flag='*'):
        super().__init__(account, currency, flag)

    def identify(self, filepath: str) -> bool:
        return True

CONFIG = [
    Importer(account='Assets:Current:SNS', currency='EUR'),
]

if __name__ == '__main__':
    ingest = beangulp.Ingest(CONFIG)
    ingest()

The result of executing python3 config.py extract test.csv is:

;; -*- mode: beancount -*-

**** /Users/floris/Documents/Boekhouding/beancount/csv-balance/test.csv

2022-01-01 * "Shop" "This is just an expense"
  Assets:Current:SNS  -24.85 EUR

2022-01-01 * "Shop" "Some other expense"
  Assets:Current:SNS  -15.00 EUR

2022-01-01 * "Employer" "Finally got my paycheck"
  Assets:Current:SNS  450.00 EUR

2022-01-02 balance Assets:Current:SNS                              124.85 EUR

The last line is incorrect and should read

2022-01-02 balance Assets:Current:SNS                              409.85 EUR

(There is another issue, my bank actually adds the balance before the current transaction in the csv, but I can work around that using a custom column definition that combines the amount and the balance columns to come up with the balance after. For the bug here this is not relevant)

floriskruisselbrink commented 2 years ago

I have a solution that works for me (find transaction used for balance not only by highest date, but also by highest lineno), but this obviously woudn’t work if the csv file is sorted backwards.

diff --git a/beangulp/importers/csvbase.py b/beangulp/importers/csvbase.py
index 38213af..b02e27a 100644
--- a/beangulp/importers/csvbase.py
+++ b/beangulp/importers/csvbase.py
@@ -329,7 +329,7 @@ class Importer(beangulp.Importer, CSVReader):

         # Append balances.
         for currency, balances in balances.items():
-            entries.append(max(balances, key=lambda x: x.date))
+            entries.append(max(balances, key=lambda x: (x.date, x.meta['lineno'])))

         return entries
dnicolodi commented 2 years ago

Thank you for the bug report. It is a known limitation I haven't found the time to address. As you write, looking at the line number does not work without taking into account also the sorting (ascending vs descending) of the entries in the CSV. Also I plan to get rid of the "lineno" metadata entry, see #101. Auto detecting the sorting breaks down where there are transactions spanning only one day. The robust solution involves adding one class parameter to configure whether the sorting is ascending or descending and looking at the last or first entry for the balance.