beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License
246 stars 28 forks source link

Using `file_account()` to filter training data doesn't work for certain use cases #122

Open redstreet opened 1 year ago

redstreet commented 1 year ago

smart_importerfilters training data from a single account, obtained by calling the importer'sfile_account()`, as discussed in #30. This presents two questions:

1) (For my understanding): shouldn't it filter out for accounts in each transaction instead of assuming the importer will always have one single account (that file_account() returns) as a part of every transaction?

2) My question: There are cases where file_account() is not the account we want to train by. The use case at least a couple of users have run into (including myself) is here. In short, when importing using a commodity-leaf structure, postings look like this:

2020-02-20 * "Transfer"
  Assets:Investments:Fidelity:USD -20 USD
  <to_be_predicted>

file_account() must return Assets:Investments:Fidelity (without the :USD), so bean-file has the correct directory to file this against. However, this means that smart_importer fails to find training data, since that account doesn't exist.

Is there a config option or alternative I'm missing here? If this is a valid use case, I'm happy to help figure out a solution + patch.

johannesjh commented 1 year ago

Hi, the use case sounds somewhat exotic but legitimate... thank you for bringing this up.

As for a potential solution, what you want to achieve is to not filter training data by account, correct? Training data is filtered in the EntryPredictor.training_data_filter method. You can overwrite this method in inheriting classes. For example:

class PredictPostingsCustom(PredictPostings):
    def training_data_filter(self, txn):
        return True

You can, of course, implement any other filtering logic instead of simply returning True. Is this what you are looking for?

redstreet commented 1 year ago

Awesome, that helps! I do have a suggestion though: would smart_importer consider making self.account a list, and also expose it via the constructor?

This way, a) we can have more than one interesting account, and b) can pass in the list without needing to write code.

Here's my sense of why this might becoming a fairly often used case in the future: anyone using commodity-leaf accounts (eg: Assets:Fidelity:FSSIX) would run into this sooner or later, because the filing account (Assets:Fidelity) is not the same as the account smart_importer should filter by. However, I imagine nobody had run into it until now because smart_importer is (understandably) not used with investment accounts.

Cash management accounts, which are becoming popular (anecdotal), change all that: they mix investments with expenses. So a few of us ran into it, including myself. Here's the ugly hack we've been using, just FYI.

johannesjh commented 1 year ago

Glad it helped!

would smart_importer consider making self.account a list

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

redstreet commented 1 year ago

Not sure. This needs careful consideration. If I remember well, the upstream beancount importer classes have a singular account field where importers file the imported CSV or PDF. This may or may not be in conflict with what you are proposing.

It wouldn't be in conflict because what I'm proposing is to broaden the set of transactions that smart_importer uses as training data. This makes no changes to the singular filing account that beancount upstream classes might expect. Does that make sense or am I missing something here? Thanks!