beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License

Predict postings with amounts #107

Closed sullivan-sean closed 2 years ago

sullivan-sean commented 3 years ago

I'm asking mostly out of curiosity about whether this is possible. I am toying with the idea of using ML to predict the amounts on postings in addition to their accounts. I appreciate all the amazing work you all have done on smart_importer so far; while looking through the discussions I saw past threads about predicting multiple accounts. My common use case is splitting a transaction with somebody, usually 50/50, and I would like the output postings to be:

Assets:Cash          -100
Liabilities:Friend     50
Expenses:Food          50

For me, a first-pass solution is to naively split the units evenly among the other postings:

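from beancount.core.amount import Amount
from beancount.core.data import Posting
from smart_importer.predictor import EntryPredictor  # module path assumed


class PredictPostings(EntryPredictor):
    """Predicts posting accounts."""

    weights = {"narration": 0.8, "payee": 0.5, "date.day": 0.1}

    @property
    def targets(self):
        # One target string per training transaction: its accounts joined by spaces.
        return [
            " ".join(posting.account for posting in txn.postings)
            for txn in self.training_data
        ]

    def apply_prediction(self, transaction, prediction):
        # Only complete transactions that have exactly one posting so far.
        if len(transaction.postings) != 1:
            return transaction

        original_posting = transaction.postings[0]
        accounts = [
            account
            for account in prediction.split(" ")
            if account != original_posting.account
        ]
        if not accounts:
            # Nothing to add; this also avoids dividing by zero below.
            return transaction

        # Divide the Decimal number rather than the Amount tuple itself.
        units = original_posting.units
        new_units = Amount(-units.number / len(accounts), units.currency)
        new_postings = [original_posting] + [
            Posting(account, new_units, None, None, None, None)
            for account in accounts
        ]

        return transaction._replace(postings=new_postings)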

I took this approach because, as far as I can tell, smart_importer can currently predict the accounts of the postings but not their amounts.
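One detail the naive even split glosses over is rounding: 100 split three ways does not divide cleanly, and the resulting postings would no longer balance. The following is only a sketch of one way to handle that in plain Python, independent of smart_importer and Beancount. The function name `split_evenly` and the `places` parameter are hypothetical, and pushing the leftover onto the last share is an arbitrary choice.

```python
from decimal import Decimal, ROUND_DOWN


def split_evenly(number, parts, places=2):
    """Split `number` into `parts` shares that sum exactly to `number`.

    Hypothetical helper: each share is truncated to `places` decimals and
    the leftover is added to the last share. `parts` must be positive.
    """
    quantum = Decimal(10) ** -places  # e.g. Decimal("0.01") for places=2
    share = (number / parts).quantize(quantum, rounding=ROUND_DOWN)
    shares = [share] * parts
    # Whatever truncation dropped goes onto the last share.
    shares[-1] += number - share * parts
    return shares


# [Decimal('33.33'), Decimal('33.33'), Decimal('33.34')]
print(split_evenly(Decimal("100"), 3))
```

The shares would then be negated and attached to the predicted accounts, so that they still cancel out the original posting.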

I'm wondering what the best way to also predict the amounts would be. A possible ML-based solution is to predict a ratio in addition to an account name for each output posting: the training target could be something like "Assets:Cash_1.0 Liabilities:Friend_0.5 Expenses:Food_0.5". I've also been looking into scikit-learn's MultiOutputRegressor, because it feels wrong to just attach the amount ratio to the account string.
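To make the "ratio attached to the account string" idea concrete, here is a minimal sketch in plain Python of how such a target could be built from a training transaction and turned back into amounts. It is not part of smart_importer. The names `encode_target` and `decode_target` are hypothetical, postings are simplified to `(account, number)` pairs, and the sketch assumes every predicted posting has the opposite sign of the source posting.

```python
from decimal import Decimal


def encode_target(postings):
    """Build an "Account_ratio" target string from (account, number) pairs.

    Ratios are taken relative to the absolute value of the first posting.
    """
    total = abs(postings[0][1])
    return " ".join(
        f"{account}_{abs(number) / total}" for account, number in postings
    )


def decode_target(target, source_account, source_number):
    """Turn a predicted target string back into (account, number) pairs.

    The source posting itself is skipped; every other posting gets the
    opposite sign of the source, scaled by its predicted ratio.
    """
    postings = []
    for token in target.split(" "):
        account, _, ratio = token.rpartition("_")
        if account == source_account:
            continue
        postings.append((account, -source_number * Decimal(ratio)))
    return postings


target = encode_target([
    ("Assets:Cash", Decimal("-100")),
    ("Liabilities:Friend", Decimal("50")),
    ("Expenses:Food", Decimal("50")),
])
print(target)
print(decode_target(target, "Assets:Cash", Decimal("-100")))
```

One drawback of this encoding is that every distinct ratio becomes its own class label, so a 50/50 split and a 60/40 split between the same accounts are unrelated targets as far as a classifier is concerned.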

I'm not sure if anyone else has thought about/tried to tackle this, but if so I would love to hear your thoughts. This might be too complicated to add to the library/not worth pursuing, but I wanted to open a thread to discuss in case others share this use case and want to brainstorm.

johannesjh commented 3 years ago

Regarding the shared-expenses use case: I doubt that predicted amounts make sense here. I'll share some considerations regarding your idea and ask some follow-up questions.

Follow-up questions: Do you have a use case where you would really benefit from predicted amounts? How would you deal with the uncertainty that the prediction could be wrong?

johannesjh commented 2 years ago

long time no hear... shall we close this issue?

johannesjh commented 2 years ago

(I closed this issue due to inactivity... feel free to re-open, contributions welcome!)