Closed vperrin59 closed 2 years ago
Hmm, interesting, I've been running with python 3.9 for a while. Would you mind testing out the version from master? That one should have some more detailed error messages to pinpoint where the issue could be
I did pip freeze to check the version I was using: smart-importer==0.1
Maybe there is a new version I can try
would you mind quickly checking out this repo and then doing a pip install from it?
Actually even from pypi you should get the newer version (0.3) https://pypi.org/project/smart-importer/ maybe just try this and see if this fixes it.
Something like
pip install --upgrade smart-importer
Yes I just did, not yet used with all install / update commands.
I checked and I have 0.3 version now but I still have
Cannot train the machine learning model because the training data is empty.
with Python 3.9.4
Can you try something like
git clone https://github.com/beancount/smart_importer.git
pip install -e smart_importer
That will install the latest development version, that should have the better error messages to specify in detail what is missing. I'm using Python 3.9.7 but I don't think this should make the difference.
I used your commands, I didn't get errors when installing
pip freeze | grep smart -e git+https://github.com/beancount/smart_importer.git@3119575958ecad37d02f05d174c327f95a3fd4a8#egg=smart_importer
Then when extracting:
Cannot train the machine learning modelNone of the training data matches the accounts Cannot train the machine learning model because there is only one target.
Ok I think I found out the root cause. I think my issue is related to smart_importer version.
In my 3.6 setup, smart_import was using 0.1 and in 3.9 setup it was using > 0.1.
There seems to make some difference. If I revert my 3.9 env with smart_importer == 0.1 it's working
How are you setting up your importers? Could be that there have been some changes for the setup between versions.
Cembra importer from one of your repo :)
`
from tariochbctools.importers.cembrastatement import importer as cembrastatementimp from smart_importer import apply_hooks, PredictPostings
CONFIG = [ apply_hooks(cembrastatementimp.Importer("\d+.pdf", "Liabilities:CreditCard:Cumulus"), [PredictPostings()]) ]
`
Then I call
bean-extract ./Importers/cembra.py test.pdf -f ./Cumulus/cumulus.beancount
I did some print debug, and I think the smart_importer expect to see account open posting in ./Cumulus/cumulus.beancount. I did not put them in there.
Ah that makes sense, so it can't train the model because there isn't any data. Looks correct then, once you have some entries it will be able to train the model.
When I added the open posting now I have
Cannot train the machine learning model because there is only one target.
and that I don't understand. For me it's ok that I have only the account Liabilities:CreditCard:Cumulus
Just to confirm, you currently have something like this in the file
2020-01-01 * "Buy Milk"
Liabilities:CreditCard:Cumulus -4 CHF
Expenses:Milk
./Cumulus/cumulus.beancount:
2020-07-01 open Liabilities:CreditCard:Cumulus
include "Beancount/cumulus_2020_07.beancount"
Beancount/cumulus_2020_07.beancount:
2020-07-15 * "Bar"
Liabilities:CreditCard:Cumulus -9.70 CHF
Expenses:CH:Drinks
Ok, I'm guessing that right now it only starts to predict if there are multiple targets involved. So basically if you can add one more entry with a different expense type, then it should start predicting. With only this training data, the result would always be "Expenses:CH:Drinks".
Well I've just written one posting to show one example, but I have plenty of those in the file and for different months. The predictions were working fine with version 0.1. I think something else changed.
Can you add some printing to https://github.com/beancount/smart_importer/blob/3119575958ecad37d02f05d174c327f95a3fd4a8/smart_importer/predictor.py#L86
Especially the training data looks suspicious
I'm adding a minimum test case for what is required in regards to training data. See #112
The key is that (currently) smart_importer requires at least 2 different possible accounts in the data for the prediction, otherwise it would give you exactly the error you're getting. Are you sure that you have entries like this
2020-07-15 * "Bar"
Liabilities:CreditCard:Cumulus -9.70 CHF
Expenses:CH:Drinks
2020-07-15 * "Foo"
Liabilities:CreditCard:Cumulus -20.00 CHF
Expenses:CH:SomethingElse
So basically for Liabilities:CreditCard:Cumulus two different "counter accounts"
Created some changes that would also allow "predicting" if all your training data always has the same counter account. Can you checkout bugfix/predict_single_account and try if this fixes your issue?
I tried your new version and the warning doesn't pop up but now it predicts with all the same counter account. So it seems when doing the analysis it only detects one counter account even though there are many
can you confirm @tarioch 's suspicion that you are using training data with only one counter account? (because in this case, you are observing correct predictions... if the machine learning model is trained with just one counter account, it will always predict this one counter account).
EDIT: can you post a minimal example of the training data you are using?
I have many counter accounts, and they seem well detected with version 0.1 if smart imported. I will try debugging how the counter accounts are detected with the new version.
Here is an example of one my bean count used for training
2020-08-10 * "Restaurant"
Liabilities:CreditCard:Cumulus -43 CHF
Expenses:CH:Restaurant
2020-08-11 * "Coop"
Liabilities:CreditCard:Cumulus -47.8 CHF
Expenses:CH:Food:Coop
2020-08-14 * "Migros"
Liabilities:CreditCard:Cumulus -132.7 CHF
Expenses:CH:Food:Migros
2020-08-15 * "Outlet"
Liabilities:CreditCard:Cumulus -42 CHF
Expenses:CH:Clothes
2020-08-16 * "Shell Montbenon Lausanne CHE"
Liabilities:CreditCard:Cumulus -25.29 CHF
Expenses:CH:Car:Petrol
Ok I root caused my issue. It was again related to open account directive.
Originally I had all my open account directives in a separate beancount file that was not provided to smart_importer and that worked in version 0.1
After switching to version 0.3 I only provided CreditCard account directives to smart_importer but that was enough. I have to provide all the target account open directives as well
ok, so it is working fine now?
Yes the predictions are working, do you think it's worth adding warning message when finding target account posting without the open directive
When running smart imported with python 3.9 I always get:
Cannot train the machine learning model because the training data is empty.