beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License

example use case for beancount standard csv.importer #43

Closed mondjef closed 6 years ago

mondjef commented 6 years ago

I am trying to get this to work with beancount's standard csv.Importer, without much success. To be honest, I am fairly green with Python, let alone decorators, so I am sure it is something I am doing (or not doing)...

import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))

from beancount.ingest import extract
from beancount.ingest.importers import csv

from smart_importer.predict_postings import PredictPostings

Col = csv.Col

csv.Importer = PredictPostings(suggest_accounts=False)(csv.Importer)

CONFIG = [
    csv.Importer({Col.DATE: 'Date',
                  Col.PAYEE: 'Transaction Details',
                  Col.AMOUNT_DEBIT: 'Funds Out',
                  Col.AMOUNT_CREDIT: 'Funds In'},
                 'Assets:Simplii:Chequing-9875',
                 'CAD',
                 [r'Filename: .*SIMPLII_.*\.csv',
                  'Contents:\n.*Date, Transaction Details, Funds Out, Funds In']
                 ),
]

Could somebody please point me in the right direction on what I am doing wrong? I am very interested in this, have some experience with ML, and hope to contribute to this project where I can once I get everything up and running.

tarioch commented 6 years ago

Hi, this is tied to #33. As soon as the PR for this is merged into beancount, it should work exactly as you wrote it.

johannesjh commented 6 years ago

The following code is equivalent, but has these advantages:

So in your import config file, you can write...

import sys
from os import path

from beancount.ingest.importers import csv
from beancount.ingest.importers.csv import Col

from smart_importer.predict_postings import PredictPostings

sys.path.insert(0, path.join(path.dirname(__file__)))

class SimpliiImporter(csv.Importer):
    '''
    Importer for the Simplii bank.
    Note: This undecorated class can be regression-tested with
    beancount.ingest.regression.compare_sample_files
    '''

    def __init__(self):
        super().__init__(
            {Col.DATE: 'Date',
             Col.PAYEE: 'Transaction Details',
             Col.AMOUNT_DEBIT: 'Funds Out',
             Col.AMOUNT_CREDIT: 'Funds In'},
            'Assets:Simplii:Chequing-9875',
            'CAD',
            [
                r'Filename: .*SIMPLII_.*\.csv',
                'Contents:\n.*Date, Transaction Details, Funds Out, Funds In'
            ]
        )

CONFIG = [
    PredictPostings(suggest_accounts=False, training_data='myfile.beancount')(SimpliiImporter)(),
]

EDIT: On a further note, as long as pull request #33 is not merged into beancount, the training data must be specified as an argument to the decorator.
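Writing PredictPostings(...)(SimpliiImporter)() inside CONFIG and putting @PredictPostings(...) above a class definition are two spellings of the same operation: calling the factory with options returns a class decorator, and calling that decorator on a class returns the enhanced class. The sketch below shows this with a hypothetical stand-in, predict_postings_stub, which is not smart_importer's implementation. It assumes only that such a decorator wraps the class's extract() method, and here it merely tags entries so the effect is visible.

```python
import functools


def predict_postings_stub(suggest_accounts=True, training_data=None):
    """Hypothetical decorator factory shaped like PredictPostings(...).

    Calling it with options returns a class decorator; the decorator wraps
    the class's extract() method and returns the same class.
    """
    def decorate(cls):
        original_extract = cls.extract

        @functools.wraps(original_extract)
        def extract(self, file, *args, **kwargs):
            entries = original_extract(self, file, *args, **kwargs)
            # A real smart importer would learn from training_data here;
            # this stub only tags each entry so the effect is visible.
            return [entry + ' [enhanced]' for entry in entries]

        cls.extract = extract
        return cls
    return decorate


class SimpliiImporter:
    """Undecorated base importer (hypothetical toy version)."""
    def extract(self, file):
        return ['txn from ' + file]


# Form 1: decorator syntax on a subclass.
@predict_postings_stub(suggest_accounts=False, training_data='myfile.beancount')
class SmartSimpliiImporter(SimpliiImporter):
    pass


# Form 2: the same thing as plain function calls, as in the CONFIG example.
class PlainSubclass(SimpliiImporter):
    pass


smart_instance = predict_postings_stub(
    suggest_accounts=False, training_data='myfile.beancount')(PlainSubclass)()

print(SmartSimpliiImporter().extract('a.csv'))  # ['txn from a.csv [enhanced]']
print(smart_instance.extract('a.csv'))          # ['txn from a.csv [enhanced]']
print(SimpliiImporter().extract('a.csv'))       # ['txn from a.csv']
```

Because this stub patches the class it receives, decorating a subclass leaves the undecorated base class untouched, which fits the docstring's point that the plain class can still be regression-tested on its own.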

mondjef commented 6 years ago

Perfect! Thanks, @tarioch and @johannesjh.

mondjef commented 6 years ago

OK, I got this to work, somewhat...

I used @johannesjh's reply as a template, changing only the path to the training_data. Fava loads my config and identifies the file properly; however, it displays the importer as '.SimpliiImporter: "Assets:Simplii:Chequing-9875"', and when I click Extract, Fava encounters errors. I suspect the extract errors are related to the way the importer is identified. I poked around a bit without luck. Any suggestions?

johannesjh commented 6 years ago

I just pulled the latest versions of beancount, Fava and smart_importer, but I could not reproduce the issue. My personal config looks like this. (I only replaced the real bank names; otherwise the code is copied straight out of my actual bookkeeping folder.)

./2018.beancount is my main beancount file

./import.config.py is my beancount.ingest import configuration:

#!/usr/bin/env python3
"""Import configuration."""

# Insert our custom importers path here.
# (In practice you might just change your PYTHONPATH environment.)
import sys
from os import path
sys.path.insert(0, path.join(path.dirname(__file__)))

from importers import bank1
from importers import bank2

from beancount.ingest import extract

# Setting this variable provides a list of importer instances.
CONFIG = [
    bank1.SmartBank1Importer(),
    bank2.SmartBank2Importer()
]

# Override the header on extracted text (if desired).
extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'

./importers is a Python package, and ./importers/bank1 and ./importers/bank2 are packages inside it (in my real setup they carry the actual bank names). I have defined my importers in the __init__.py files within these packages. For example:

./importers/bank2/__init__.py is structured as follows:

import os

from beancount.ingest import cache, importer
from smart_importer.predict_postings import PredictPostings


class Bank2Importer(importer.ImporterProtocol):
    pass  # custom implementation...


@PredictPostings(
    training_data=cache.get_file(
        os.path.abspath(os.path.join(
            os.path.dirname(__file__), '../../2018.beancount'))
    ),
    account='Liabilities:Bank2:Creditcard'
)
class SmartBank2Importer(Bank2Importer):
    '''A smart version of the importer.'''
    pass

Fava identifies the importer as follows:

File            Importer                            Account
downloaded.csv  importers.bank2.SmartBank2Importer  Liabilities:Bank2:Creditcard

mondjef commented 6 years ago

I think I have narrowed it down to the runpy module that Fava uses to run the import config. Following the example in your first post, I was defining my custom importer class directly in my import config file instead of in a separate module. I am now moving it out, as you did in your most recent example, but I am currently tracking down some errors.
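A sketch of why the class's location can matter: runpy.run_path executes a file under the module name '<run_path>' unless told otherwise, so a class defined inside the executed config gets that as its __module__, while a class imported from a package gets a stable dotted name such as importers.simplii. The sketch assumes the displayed importer name is built as module.ClassName; the SimpliiImporter class and its name() method are hypothetical, and the exact string Fava showed ('.SimpliiImporter') may come from a different code path.

```python
import importlib
import os
import runpy
import tempfile

# Hypothetical importer; name() builds "<module>.<ClassName>" (assumed naming scheme).
IMPORTER_SOURCE = '''
class SimpliiImporter:
    def name(self):
        cls = self.__class__
        return '{}.{}'.format(cls.__module__, cls.__name__)
'''

# Config that defines the importer class inline, as in the first example.
INLINE_CONFIG = IMPORTER_SOURCE + 'CONFIG = [SimpliiImporter()]\n'

# Config that imports the class from a package next to it, as in the later example.
PACKAGE_CONFIG = '''
import sys
from os import path
sys.path.insert(0, path.dirname(__file__))
from importers import simplii
CONFIG = [simplii.SimpliiImporter()]
'''


def importer_names(config_source):
    """Write a config plus an importers/simplii package, run it, return names."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = os.path.join(tmp, 'importers', 'simplii')
        os.makedirs(package_dir)
        open(os.path.join(tmp, 'importers', '__init__.py'), 'w').close()
        with open(os.path.join(package_dir, '__init__.py'), 'w') as f:
            f.write(IMPORTER_SOURCE)
        config_path = os.path.join(tmp, 'example.import')
        with open(config_path, 'w') as f:
            f.write(config_source)
        importlib.invalidate_caches()  # make the freshly written package importable
        namespace = runpy.run_path(config_path)  # no run_name, so '<run_path>'
        return [importer.name() for importer in namespace['CONFIG']]


inline_names = importer_names(INLINE_CONFIG)
package_names = importer_names(PACKAGE_CONFIG)
print(inline_names)   # ['<run_path>.SimpliiImporter']
print(package_names)  # ['importers.simplii.SimpliiImporter']
```

Moving the class into its own module, as suggested above, makes its name independent of how the config file itself is executed.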

johannesjh commented 6 years ago

@tarioch: I think you said you also apply the decorator as a function call directly in your beancount import config file, right? Haven't you run into @mondjef's problem with how Fava identifies the importer class?

tarioch commented 6 years ago

Nope, I'm not getting this. My config looks like this:

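import sys
from os import path

from beancount.ingest import extract
from smart_importer.predict_postings import PredictPostings

sys.path.insert(0, path.join(path.dirname(__file__)))
# mt940importer is my own importer module; its import line is not shown here.

FooImporter = PredictPostings(suggest_accounts=False)(mt940importer.Importer)

CONFIG = [
    FooImporter(),
]

extract.HEADER = ''

mondjef commented 6 years ago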

OK, I am definitely getting warmer...

Here is what I have now. It works fine with bean-identify and bean-extract for the non-smart importer version; with the smart version, however, only bean-identify works. bean-extract fails with the following errors.

In addition, I implemented the file_account method, which works, but I had trouble getting the 'file' variable and so resorted to an 'if file:' check. I think this is due to my limited Python skills and my understanding of variable scope.

bean-extract /beancount/office/example.import /beancount/Downloads/SIMPLII_9875_2018-04-22.csv -e /beancount/personal.beancount
DEBUG:smart_importer.predict_postings:The Decorator was applied to a class.
;; -*- mode: org; mode: beancount; coding: utf-8; -*-
** /beancount/Downloads/SIMPLII_9875_2018-04-22.csv
DEBUG:smart_importer.predict_postings:About to call the importer's extract function to receive entries to be imported...
DEBUG:smart_importer.predict_postings:Trying to read the importer's file_account, to be used as default value for the decorator's account argument...
DEBUG:smart_importer.predict_postings:Read file_account Assets:Simplii:Chequing-9875 from the importer; using it as known account in the decorator.
DEBUG:smart_importer.machinelearning_helpers:Reading training data from _FileMemo "/beancount/personal.beancount"...
ERROR:root:Importer importers.simplii.SmartSimpliiImporter: "Assets:Simplii".extract() raised an unexpected error:
ERROR:root:Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/beancount/ingest/extract.py", line 176, in extract
    allow_none_for_tags_and_links=allow_none_for_tags_and_links)
  File "/usr/local/lib/python3.6/site-packages/beancount/ingest/extract.py", line 70, in extract_from_file
    new_entries = importer.extract(file, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/smart_importer/predict_postings.py", line 102, in wrapper
    return decorator.enhance_transactions()
  File "/usr/local/lib/python3.6/site-packages/smart_importer/predict_postings.py", line 110, in enhance_transactions
    existing_entries=self.existing_entries)
  File "/usr/local/lib/python3.6/site-packages/smart_importer/machinelearning_helpers.py", line 37, in load_training_data
    assert not errors
AssertionError

config file

#!/usr/bin/env python3
"""Example import configuration."""

# Insert our custom importers path here.
# (In practice you might just change your PYTHONPATH environment.)
import sys
from os import path

from beancount.ingest import extract

sys.path.insert(0, path.join(path.dirname(__file__)))
from importers import simplii

CONFIG = [
    simplii.SmartSimpliiImporter()
]

# Override the header on extracted text (if desired).
extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'

simplii.Importer __init__.py file

#!/usr/bin/env python3

from beancount.ingest import extract
from beancount.ingest.importers import csv
from beancount.ingest import cache
from beancount.ingest import regression
import re
from os import path

from smart_importer.predict_postings import PredictPostings

class SimpliiImporter(csv.Importer):
    '''
    Importer for the Simplii bank.
    Note: This undecorated class can be regression-tested with
    beancount.ingest.regression.compare_sample_files
    '''

    config = {csv.Col.DATE: 'Date',
              csv.Col.PAYEE: 'Transaction Details',
              csv.Col.AMOUNT_DEBIT: 'Funds Out',
              csv.Col.AMOUNT_CREDIT: 'Funds In'}

    # Maps the 4-digit number in the file name to a sub-account.
    account_map = {'7655': 'Chequing-9875'}

    def __init__(self, account_map=account_map, base_account='Assets:Simplii'):
        super().__init__(
            self.config,
            None,
            'CAD',
            [r'Filename: .*SIMPLII_\d{4}_.*\.csv',
             'Contents:\n.*Date, Transaction Details, Funds Out, Funds In'],
            institution='Simplii'
        )
        self.account_map = account_map
        self.base_account = base_account

    def file_account(self, file):
        # Only inspect the file name when a file was actually passed in.
        if file:
            # Keep the match object; indexing None would raise TypeError.
            m = re.match(r'.+SIMPLII_(\d{4})_.*', file.name)
            if m:
                sub_account = self.account_map.get(m.group(1))
                if sub_account:
                    return self.base_account + ':' + sub_account
        return self.base_account

@PredictPostings(training_data=cache.get_file('/beancount/personal.beancount'))
class SmartSimpliiImporter(SimpliiImporter):
    '''
    A smart version of the Simplii importer.
    '''
    pass
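The file_account lookup above can also be written as a standalone function that never indexes a failed match and tolerates being called without a file, which is what the 'if file:' workaround hints at. This is a sketch under assumptions: the account map, base account and file-name pattern are copied from the example above, and the file argument is assumed to be either None or an object with a .name attribute (SimpleNamespace stands in for beancount's cached file object).

```python
import re
from types import SimpleNamespace

# Values taken from the example importer above.
ACCOUNT_MAP = {'7655': 'Chequing-9875'}
BASE_ACCOUNT = 'Assets:Simplii'
FILENAME_RE = re.compile(r'.*SIMPLII_(\d{4})_.*\.csv$')


def file_account(file, account_map=ACCOUNT_MAP, base_account=BASE_ACCOUNT):
    """Map a downloaded file to its account; fall back to the base account.

    `file` may be None or any object with a `.name` attribute.
    """
    if file is None:
        return base_account
    match = FILENAME_RE.match(file.name)
    if match is None:          # file name does not follow the SIMPLII_<nnnn>_ pattern
        return base_account
    sub_account = account_map.get(match.group(1))
    if sub_account is None:    # number not listed in the account map
        return base_account
    return base_account + ':' + sub_account


print(file_account(SimpleNamespace(name='/dl/SIMPLII_7655_2018-04-22.csv')))
print(file_account(None))
```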

mondjef commented 6 years ago

I have ironed out a few other issues with my modified version of beancount's built-in CSV importer. It now works without issue for the non-smart version; however, I cannot get the smart version past loading the training data. I keep getting the AssertionError shown above. Is this a bug?

mondjef commented 6 years ago

Never mind, I finally got it to work. There was an unused pad entry in my beancount file, so loading the training data reported an error and the assertion failed. The error thrown by smart_importer would be more intuitive if it passed along beancount's own error message.
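The suggestion above could look roughly like this: instead of a bare assert not errors, raise an exception that carries the loader's own messages. check_training_data_errors and TrainingDataError are hypothetical names, not part of smart_importer, and the sketch assumes each error is a namedtuple with source (a metadata dict holding filename and lineno), message and entry fields, the shape beancount uses for its error tuples. The error below is illustrative, not real beancount output.

```python
from collections import namedtuple

# Stand-in with the (source, message, entry) fields of beancount's error tuples.
LedgerError = namedtuple('LedgerError', 'source message entry')


class TrainingDataError(Exception):
    """Hypothetical exception used instead of a bare AssertionError."""


def check_training_data_errors(errors, filename):
    """Raise TrainingDataError listing each loader error, or return None."""
    if not errors:
        return None
    lines = []
    for error in errors:
        source = error.source or {}
        location = '{}:{}'.format(source.get('filename', filename),
                                  source.get('lineno', '?'))
        lines.append('{}: {}'.format(location, error.message))
    raise TrainingDataError('Could not load training data from {}:\n{}'.format(
        filename, '\n'.join(lines)))


# Illustrative error only.
errors = [LedgerError({'filename': '/beancount/personal.beancount', 'lineno': 42},
                      'Unused Pad entry', None)]
try:
    check_training_data_errors(errors, '/beancount/personal.beancount')
except TrainingDataError as exc:
    print(exc)
```

In a real loading path, the errors list would be the second value returned by beancount.loader.load_file, so the user would see the offending line instead of an empty AssertionError.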