beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License
248 stars 29 forks source link

Predictions might lead to original posting being switched to 2nd place after prediction #128

Open awtimmering opened 7 months ago

awtimmering commented 7 months ago

I've had a few instances where predicted postings lead to an unnatural posting order (and I have at least 2 people confirming having seen the same). For example, the output of:

2017-01-07 * "Groceries"
  Assets:US:BofA:Checking  -10.20 USD

in certain circumstances can lead to a predicted posting of

2017-01-07 * "Groceries"
  Expenses:Groceries
  Assets:US:BofA:Checking  -10.20 USD

with the filler in the first slot, and the original posting account being pushed to 2nd slot.

Having stepped through the code as this happens, it does seem to be just a prediction fluke that can occur, where accounts can end up in this order as result of predictor.py#L234.

This seems it can be easily addressed with a somewhat simplified update_postings(transaction, accounts) (entries.py#L6) to honor the original order, i.e. always start with posting[0] and augment with any predicted accounts, but am not 100% confident I understand the intent of the original logic in update_postings() and whether this change would upset existing work flows. PR attached with unit tests.

yagebu commented 1 month ago

I believe the current behavior is somewhat intentional. I for example follow the convention to have the "main" account of a transaction last (and for example in Fava, file insertion will look at the accounts of the transaction in reverse order to determine where to insert a transaction).

One thing that might have an impact here as well is https://github.com/beancount/smart_importer/pull/123, which kind of dropped the posting order and always takes the sorted accounts as training data - I think an improvement there would be to still do that but somehow remember which of the possible orders was most common...

Edit: I've added some clearer documentation and tests for this function in #134