Closed tarioch closed 6 years ago
I agree. For me, that's the central question for this whole undertaking, namely: how to integrate with beancount and fava. What I mean by that is, your considerations, how I understand them, go very much in the same direction as #7 and beancount/fava#579, i.e., now that we have the functionality for enhancing imports through machine learning, how should this functionality be integrated with beancount and fava, and how should it be accessible to end users? Some goals for this integration:
In summary, I think we need to go talk with beancount and fava about this. It seems to me that beancount/fava#579 would be a suitable place for such discussion, or we could open a ticket in beancount/beancount.
I'll still have to dig a bit deeper, but I think there are at least some classes around in beancount
for dealing with CSV and OFX. The way it looks is you can instantiate them in a my.import config the way you need it. The way beancount does my.beancount it's currently python as well, so shouldn't be too hard to allow to wrap the importer with the predictPostings/predictPayees. So given the following situation: I know how to program in python, I would like to instantiate beancount.ingest.importer.Csv and have it wrapped by the predictPostings. How would I do this without modifying beancount.ingest.importer.Csv?
As for general improvements to import configs. I could see a simple tool which might generate the config the way you need it. Similar to the import wizard of Gnucash, first you specify things like separator, date formats and then you pick the columns you're interested in and define which kind of data in each column is. Now add to that the option of "autocomplete opposite site of the transaction".
How would I do this without modifying beancount.ingest.importer.Csv?
well, something along those lines...
import os
from beancount.ingest import importer, cache
from beancount.ingest.cache import _FileMemo
from smart_importer.predict_postings import PredictPostings
class MyImporter(importer.ImporterProtocol):
'''
This is my importer, possibly inheriting from or using beancount.ingest.importer.Csv,
or otherwise written by hand.
Note: This importer purposely includes no machine learning,
in order to make it easier to do regression testing using `beancount.ingest.regression`.
'''
# whatever implementation you see fit, this is a regular beancount importer
pass
class MySmartImporter(MyImporter):
'''
A smart version of the importer.
'''
@PredictPostings(
training_data=cache.get_file(
os.path.abspath(os.path.join(
os.path.dirname(__file__), 'myfile.beancount'))
),
filter_training_data_by_account='Assets:My:Account'
)
def extract(self, file: _FileMemo):
return super().extract(file)
As for general improvements to import configs. I could see a simple tool...
yes, exactly!
What I was thinking is to not having to subclass the importer at all. I'm still pretty fresh with python so maybe I'm missing the obvious thing, right now the way I see to be able to do this would be to create a generic class "SmartImporter" which takes an instance of another importer and would look something like this:
class SmartImporter(importer.ImporterProtocol):
def __init(self, real_importer: importer.ImporterProtocol)
...
def extract(self, *args, **kwargs):
self.real_importer.extract(args, kwargs)
...
and then the usage in the import config would simply be
CONFIG = [
SmartImporter(importers.csv.Importer(...))
]
What feels a bit off is that @PredictPostings/@PredictPayees is already something which wraps around a real implementation so having to do this again manually feels strange. I'm wondering if there isn't a better way to instantiate @PredictPostings/@PredictPayees from the outside.
I am not a Python guru either, but what you are suggesting ("a generic class "SmartImporter" which takes an instance of another importer") sounds a bit like class decorators, compare e.g., https://www.codementor.io/sheena/advanced-use-python-decorators-class-function-du107nxsv#decorating-classes
Since decorators in pythons are simple Callable
s, it should be possible to call them directly instead of annotating them to a class. In other words, you are free to either decorate the class, as in the following example:
@PredictPostings
class MyCustomCSVImporter(CsvImporter):
pass
CONFIG = [
MyCustomCSVImporter(whatever, arguments)
]
...or to directly call the same decorator with an importer instance, inline in your config file, if you wish, as follows:
CONFIG = [
PredictPostings(CsvImporter(whatever, arguments))
]
Decorating the importer classes instead of the extract methods would also be promising for another reason: You brought up in issue #10 the fact that beancount's importers can now officially be instantiated with pre-existing entries. This information would of course be highly interesting for the machine learning functionality! - By decorating the importer class instead of just the extract
method, the decorator and thus the machine learning could maybe access this information? (I have not checked the beancount code yet to see if that would be possible).
oh right.
but still: if we would decorate importer classes instead of extract functions, this would allow you to write...
CONFIG = [
PredictPostings(CsvImporter(whatever, arguments))
]
...because that's what you originally wanted in this ticket, isn't it?
Yes exactly.
I spent a few hours (took me much longer than I anticipated) to refactor predict_postings
into a decorator for classes. New usage, with the decorator applied to the class:
@PredictPostings(training_data=self.training_data)
class DummyImporter(ImporterProtocol):
def extract(self, file, existing_entries=None):
return the.imported.transactions
The refactoring of the second decorator for predicting payees is still missing. But would you mind having a look at the new code in branch "decorator-for-importer-classes", I would welcome your feedback, thank you! Johannes
I'm probably doing something wrong but I'm getting the following exception with my importer
from dateutil.parser import parse
from beancount.ingest import importer
from beancount.core import data
from beancount.core import amount
from beancount.core.number import D
from beancount.ingest.importers import regexp
from smart_importer.predict_postings import PredictPostings
@PredictPostings()
class Importer(regexp.RegexpImporterMixin, importer.ImporterProtocol):
"""An importer for account statements."""
def __init__(self, regexps):
if isinstance(regexps, str):
regexps = [regexps]
regexp.RegexpImporterMixin.__init__(self, regexps)
def file_account(self, file):
return 'Assets:Foo'
def extract(self, file, existing_entries):
entries = []
meta = data.new_metadata(file.name, 0)
txn = data.Transaction(
meta,
parse('2017-11-20').date(),
'*',
None,
'Two Postings',
data.EMPTY_SET,
data.EMPTY_SET,
[
data.Posting('Assets:Patrick:CHF', amount.Amount(D('12'), 'CHF'), None, None, None, None),
data.Posting('Assets:Patrick:USD', amount.Amount(D('12'), 'CHF'), None, None, None, None),
]
)
entries.append(txn)
txn = data.Transaction(
meta,
parse('2017-11-20').date(),
'*',
None,
'Single Posting',
data.EMPTY_SET,
data.EMPTY_SET,
[
data.Posting('Assets:Patrick:CHF', amount.Amount(D('12'), 'CHF'), None, None, None, None),
]
)
entries.append(txn)
return entries
Traceback (most recent call last):
File "/home/patrickr/.local/share/virtualenvs/ledger-yg7yJ6bm/bin/bean-extract", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/home/patrickr/projects/beancount/bin/bean-extract", line 4, in <module>
from beancount.ingest.extract import main; main()
File "/home/patrickr/projects/beancount/beancount/ingest/extract.py", line 192, in main
args, config, downloads_directories = scripts_utils.parse_arguments(parser)
File "/home/patrickr/projects/beancount/beancount/ingest/scripts_utils.py", line 56, in parse_arguments
mod = runpy.run_path(args.config)
File "/usr/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "main.import", line 8, in <module>
from importers.bitst import importer as bitstimp
File "/home/patrickr/ledger/importers/bitst/importer.py", line 18, in <module>
class Importer(importer.ImporterProtocol):
File "/home/patrickr/ledger/importers/bitst/importer.py", line 27, in Importer
@PredictPostings()
File "/home/patrickr/projects/smart_importer/smart_importer/predict_postings.py", line 60, in __call__
class PredictPostingsImporter(OriginalImporter):
TypeError: function() argument 1 must be code, not str
tried to but could not reproduce, see added file smart_importer/tests/predict_postings_example_test.py
.
btw, I suggest we rename this ticket and name it "apply decorator to importer class", since that came out to be the intended solution?
agreed to the rename
As for the issue, I'm wondering if there might be something with the interaction with bean-extract. I'm running the importer like this
bean-extract main.import downloads -f main.beancount
(using master version of beancount as the code for the existing_entries logic is not yet in the released build)
tried to reproduce by fleshing out the example. I am getting errors, but different than yours. I pushed broken code (only in the branch) so you can make the example more realistic, more similar to how you intend to use it. I have to interrupt now, can continue on this later. thx Johannes
I'll do that as soon as I have some time, Thanks for preparing it that far
ok thanks
btw, I am considering to use a python decorator library instead of the DIY solution to simplify the code. apparently, how a decorator is called depends on how it is applied... PredictPostings
is different from PredictPostings()
, and these differences need to be handled in the decorator implemetation (or by using a library).
My mistake, I had another importer which was picked up where I forgot to adjust the the annotation, will look at it some more.
Did a small change to the example and now it's producing the same error I have locally.
Basically I made existing_entries non optional and I think the error it produced before was caused by the same thing, namely that existing_entries was None instead of the expected list.
I made existing_entries non optional
that breaks with the API defined in the ImporterProtocol because existing_entries should be a keyword argument, not a positional argument.
I think I've narrowed it down some more. extract.py in beancount checks the signature and only passes the existing_entries if they exist in the signature, as the interceptor now doesn't have this in the signature it does not get passed and then things fail afterwards.
Maybe using a library would allow to keep the signature intact?
not sure how a library could help in this case. - why not use a simple assertion that existing_entries have been passed? (and document the restriction that existing_entries have to passed as argument in the documentation)
def extract(self, file, existing_entries=None):
'''
:param existing_entries: must not be null since this importer relies on existing entries
'''
assert existing_entries
# ...
# return the_imported_entries
What I meant is the following: beancount checks the signature and only passes existing_entries if it's in the signature
from beancount/ingest/extract.py
kwargs = {}
if 'existing_entries' in inspect.signature(importer.extract).parameters:
kwargs['existing_entries'] = existing_entries
new_entries = importer.extract(file, **kwargs)
As the wrapper class does not have existing_entries in the signature, this will not work properly. So I think we'll have to add the existing_entries to our signature and maybe do something similar to the beancount code to pass or not pass it depending on the actual signature.
I have it now working with the fixed (full) signature which includes existing_entries. I think to be backward compatible we probably need to do the same as beancount and check the signature and depending on that add or not add existing_entries
Implemented :)
Hello
I'm busy writing an importer for my csv bank statements and smart importer looks like a very good and elegant way to predict postings and payees. My importer runs fine but as soon as I add the decorator unfortunately I get the same error than @tarioch above.
TypeError: function() argument 1 must be code, not str
Then I tried running the example from the branch "decorator-for-importer-classes" (launching ./run-bean-extract.sh) but then I get another error:
;; -*- mode: beancount -*- **** /Users/julien/Documents/4_Code/smart_importer/smart_importer/examples/downloads/downloaded.csv ERROR:root:Importer smart_importer.predict_postings.PredictPostingsImporter.extract() raised an unexpected error: 'NoneType' object is not iterable ERROR:root:Traceback: Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/beancount/ingest/extract.py", line 168, in extract allow_none_for_tags_and_links=allow_none_for_tags_and_links) File "/usr/local/lib/python3.6/site-packages/beancount/ingest/extract.py", line 64, in extract_from_file new_entries = importer.extract(file) File "/Users/julien/Documents/4_Code/smart_importer/smart_importer/predict_postings.py", line 73, in extract existing_entries File "/Users/julien/Documents/4_Code/smart_importer/smart_importer/predict_postings.py", line 86, in _extract self.converted_training_data = [TxnPosting(t, p) for t in self.training_data for p in t.postings TypeError: 'NoneType' object is not iterable
I'm new to beancount and python so that probably doesn't help ;-)
Thanks in advance for any help.
If you use the master branch, then it should work fine with the decorator on the extract function.
Thanks for your help :-) I was able to go a bit further after adding opening statements for each assets/expenses in my training_data. Otherwise the file would not get validated by Beancount loader. Now stuck as it seems to be unable to recognize the transactions from my importer extract function. Anyway I will keep looking :-)
DEBUG:smart_importer.machinelearning_helpers:Reading training data from file "/Users/julien/Documents/3_Finance/Beancount/trainingdata.beancount"... DEBUG:smart_importer.machinelearning_helpers:Finished reading training data. DEBUG:smart_importer.predict_postings:About to train the machine learning model... INFO:smart_importer.predict_postings:Finished training the machine learning model. DEBUG:smart_importer.predict_postings:About to call the importer's extract function in order to receive entries to be imported... Traceback (most recent call last): File "/Users/julien/Documents/4_Code/blais-beancount-9c9de54513ab/bin/bean-extract", line 4, in <module> from beancount.ingest.extract import main; main() File "/Users/julien/.pyenv/versions/3.6.4/lib/python3.6/site-packages/beancount/ingest/extract.py", line 195, in main args, config, downloads_directories = scripts_utils.parse_arguments(parser) File "/Users/julien/.pyenv/versions/3.6.4/lib/python3.6/site-packages/beancount/ingest/scripts_utils.py", line 56, in parse_arguments mod = runpy.run_path(args.config) File "/Users/julien/.pyenv/versions/3.6.4/lib/python3.6/runpy.py", line 263, in run_path pkg_name=pkg_name, script_name=fname) File "/Users/julien/.pyenv/versions/3.6.4/lib/python3.6/runpy.py", line 96, in _run_module_code mod_name, mod_spec, pkg_name, script_name) File "/Users/julien/.pyenv/versions/3.6.4/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "config.py", line 11, in <module> ing.IngImporter('Assets:Ing:Checking'), File "/Users/julien/Documents/4_Code/smart_importer/smart_importer/predict_postings.py", line 142, in _extract logger.debug(f"Received {len(transactions)} entries by calling the importer's extract function.") TypeError: object of type 'NoneType' has no len()
The branch decorator-for-importer-classes
had merge conflicts that were difficult to resolve (I could not get the testcases to run successfully). So I branched a fresh, new branch feature/apply-decorator-to-class
from master and re-created the necessary changes so that decorators can (and must) now be applied to importer classes, as opposed to extract methods.
I will merge soon to prevent the new branch from ageing again.
Cleanup: I will delete the old branch decorator-for-importer-classes
since it is no longer needed now.
@tarioch: I hope that is ok for you, since you also committed into the branch? I still have a local copy of the branch in case you need it.
If I understood it right, beancount comes with a csv parser which can be configured and you don't need to actually subclass it. So I think this would be simply a configuration in your foo.import file. It would be nice if we could then configure it somehow to add smart importer functionality without having to extend the existing importer to add the interceptor.