beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License
246 stars 28 forks source link

Package working with python 3.6.12 but not with python 3.9 #110

Closed vperrin59 closed 2 years ago

vperrin59 commented 2 years ago

When running smart imported with python 3.9 I always get:

Cannot train the machine learning model because the training data is empty.

tarioch commented 2 years ago

Hmm, interesting, I've been running with python 3.9 for a while. Would you mind testing out the version from master? That one should have some more detailed error messages to pinpoint where the issue could be

vperrin59 commented 2 years ago

I did pip freeze to check the version I was using: smart-importer==0.1

Maybe there is a new version I can try

tarioch commented 2 years ago

would you mind quickly checking out this repo and then doing a pip install from it?

tarioch commented 2 years ago

Actually even from pypi you should get the newer version (0.3) https://pypi.org/project/smart-importer/ maybe just try this and see if this fixes it.

tarioch commented 2 years ago

Something like

pip install --upgrade smart-importer
vperrin59 commented 2 years ago

Yes I just did, not yet used with all install / update commands.

I checked and I have 0.3 version now but I still have

Cannot train the machine learning model because the training data is empty.

with Python 3.9.4

tarioch commented 2 years ago

Can you try something like

git clone https://github.com/beancount/smart_importer.git
pip install -e smart_importer

That will install the latest development version, that should have the better error messages to specify in detail what is missing. I'm using Python 3.9.7 but I don't think this should make the difference.

vperrin59 commented 2 years ago

I used your commands, I didn't get errors when installing

pip freeze | grep smart -e git+https://github.com/beancount/smart_importer.git@3119575958ecad37d02f05d174c327f95a3fd4a8#egg=smart_importer

Then when extracting:

Cannot train the machine learning modelNone of the training data matches the accounts Cannot train the machine learning model because there is only one target.

vperrin59 commented 2 years ago

Ok I think I found out the root cause. I think my issue is related to smart_importer version.

In my 3.6 setup, smart_import was using 0.1 and in 3.9 setup it was using > 0.1.

There seems to make some difference. If I revert my 3.9 env with smart_importer == 0.1 it's working

tarioch commented 2 years ago

How are you setting up your importers? Could be that there have been some changes for the setup between versions.

vperrin59 commented 2 years ago

Cembra importer from one of your repo :)

`

from tariochbctools.importers.cembrastatement import importer as cembrastatementimp from smart_importer import apply_hooks, PredictPostings

CONFIG = [ apply_hooks(cembrastatementimp.Importer("\d+.pdf", "Liabilities:CreditCard:Cumulus"), [PredictPostings()]) ]

`

Then I call

bean-extract ./Importers/cembra.py test.pdf -f ./Cumulus/cumulus.beancount

I did some print debug, and I think the smart_importer expect to see account open posting in ./Cumulus/cumulus.beancount. I did not put them in there.

tarioch commented 2 years ago

Ah that makes sense, so it can't train the model because there isn't any data. Looks correct then, once you have some entries it will be able to train the model.

vperrin59 commented 2 years ago

When I added the open posting now I have

Cannot train the machine learning model because there is only one target.

and that I don't understand. For me it's ok that I have only the account Liabilities:CreditCard:Cumulus

tarioch commented 2 years ago

Just to confirm, you currently have something like this in the file

2020-01-01 * "Buy Milk"
  Liabilities:CreditCard:Cumulus                     -4 CHF
  Expenses:Milk
vperrin59 commented 2 years ago

./Cumulus/cumulus.beancount:

2020-07-01 open Liabilities:CreditCard:Cumulus

include "Beancount/cumulus_2020_07.beancount"

Beancount/cumulus_2020_07.beancount:

2020-07-15 * "Bar"
  Liabilities:CreditCard:Cumulus  -9.70 CHF
  Expenses:CH:Drinks
tarioch commented 2 years ago

Ok, I'm guessing that right now it only starts to predict if there are multiple targets involved. So basically if you can add one more entry with a different expense type, then it should start predicting. With only this training data, the result would always be "Expenses:CH:Drinks".

vperrin59 commented 2 years ago

Well I've just written one posting to show one example, but I have plenty of those in the file and for different months. The predictions were working fine with version 0.1. I think something else changed.

tarioch commented 2 years ago

Can you add some printing to https://github.com/beancount/smart_importer/blob/3119575958ecad37d02f05d174c327f95a3fd4a8/smart_importer/predictor.py#L86

Especially the training data looks suspicious

tarioch commented 2 years ago

I'm adding a minimum test case for what is required in regards to training data. See #112
The key is that (currently) smart_importer requires at least 2 different possible accounts in the data for the prediction, otherwise it would give you exactly the error you're getting. Are you sure that you have entries like this

2020-07-15 * "Bar"
  Liabilities:CreditCard:Cumulus  -9.70 CHF
  Expenses:CH:Drinks

2020-07-15 * "Foo"
  Liabilities:CreditCard:Cumulus  -20.00 CHF
  Expenses:CH:SomethingElse

So basically for Liabilities:CreditCard:Cumulus two different "counter accounts"

tarioch commented 2 years ago

Created some changes that would also allow "predicting" if all your training data always has the same counter account. Can you checkout bugfix/predict_single_account and try if this fixes your issue?

vperrin59 commented 2 years ago

I tried your new version and the warning doesn't pop up but now it predicts with all the same counter account. So it seems when doing the analysis it only detects one counter account even though there are many

johannesjh commented 2 years ago

can you confirm @tarioch 's suspicion that you are using training data with only one counter account? (because in this case, you are observing correct predictions... if the machine learning model is trained with just one counter account, it will always predict this one counter account).

EDIT: can you post a minimal example of the training data you are using?

vperrin59 commented 2 years ago

I have many counter accounts, and they seem well detected with version 0.1 if smart imported. I will try debugging how the counter accounts are detected with the new version.

Here is an example of one my bean count used for training

2020-08-10 * "Restaurant"
  Liabilities:CreditCard:Cumulus  -43 CHF
  Expenses:CH:Restaurant

2020-08-11 * "Coop"
  Liabilities:CreditCard:Cumulus  -47.8 CHF
  Expenses:CH:Food:Coop

2020-08-14 * "Migros"
  Liabilities:CreditCard:Cumulus  -132.7 CHF
  Expenses:CH:Food:Migros

2020-08-15 * "Outlet"
  Liabilities:CreditCard:Cumulus  -42 CHF
  Expenses:CH:Clothes

2020-08-16 * "Shell Montbenon Lausanne CHE"
  Liabilities:CreditCard:Cumulus  -25.29 CHF
  Expenses:CH:Car:Petrol
vperrin59 commented 2 years ago

Ok I root caused my issue. It was again related to open account directive.

Originally I had all my open account directives in a separate beancount file that was not provided to smart_importer and that worked in version 0.1

After switching to version 0.3 I only provided CreditCard account directives to smart_importer but that was enough. I have to provide all the target account open directives as well

johannesjh commented 2 years ago

ok, so it is working fine now?

vperrin59 commented 2 years ago

Yes the predictions are working, do you think it's worth adding warning message when finding target account posting without the open directive