beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License

TAGS #125

Open jpduyx opened 1 year ago

jpduyx commented 1 year ago

I am happy with the smart-importer ... it helps a lot.

I'm trying to figure out why smart-importer doesn't learn the tags from previous transactions and apply them to new ones. Is there a setting I have to change, or where should I best start to figure this out?

johannesjh commented 1 year ago

Hi, thank you for your feedback, glad to hear you find it useful. smart-importer currently only predicts payees and accounts. It does not (yet) predict tags; there is no setting for it. But smart-importer could certainly be made to predict tags as well. I like the idea. Some directions in case you would like to get started:

You could start by adding a class PredictTags to __init__.py. I think it will be easiest for your new PredictTags class to derive from EntryPredictor, similar to the PredictPayees and PredictPostings classes. Your class can then override the attribute and weights member variables to specify which attribute shall be predicted (i.e., tags) based on which other weighted attributes.
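To make the idea concrete, here is a minimal sketch of what such a subclass might look like. It is not taken from the smart_importer code base: the EntryPredictor class below is a simplified stand-in so that the snippet runs on its own, and the weight values are purely illustrative assumptions.

```python
class EntryPredictor:
    """Simplified stand-in for smart_importer's EntryPredictor (assumption).

    The real class contains the training and prediction logic; only the two
    member variables mentioned above are modelled here.
    """

    attribute = None  # name of the entry attribute to predict
    weights = {}      # input attributes and their relative weights


class PredictTags(EntryPredictor):
    """Hypothetical predictor that targets the tags of an entry."""

    attribute = "tags"
    # Illustrative weights only; sensible values would need experimentation.
    weights = {"narration": 0.8, "payee": 0.5, "date.day": 0.1}
```

In the real project, the stand-in base class would be dropped and PredictTags would derive from the existing EntryPredictor directly.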

The existing EntryPredictor class can predict attributes, but I am not sure it can predict tags just yet. Tags may be handled differently from standard attributes in beancount entries. As a consequence, you will quite likely have to modify some code to get it to work. Some hints in this direction:

The EntryPredictor.__call__ method is where the overall control flow starts. It consists of four basic steps:

  1. load_training_data loads and filters the training data. I don't think you'll have to modify this.
  2. define_pipeline creates the scikit-learn machine learning pipeline. Amongst other things, it calls the EntryPredictor.targets method, which reads target attribute values from the training data. In your situation, the targets method needs to read tags. The existing implementation can read attributes of beancount entries, but I am not sure it can read tags. This may require some code changes.
  3. train_pipeline does what its name says. I don't expect big changes here.
  4. process_entries writes predicted values into the list of imported entries. In your situation, the method needs to write predicted tags. The existing implementation can write attributes of beancount entries, but I am not sure it can write tags. This may require some code changes.
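For steps 2 and 4, the main difference between tags and an attribute like payee is that tags form a set of strings rather than a single string. The sketch below shows one possible way to handle that: encode the tag set as a single label string when reading targets, and decode it again when writing predictions. The Transaction namedtuple is a simplified stand-in for beancount's own, and the helper names tags_target and apply_predicted_tags are hypothetical; neither exists in smart_importer.

```python
import datetime
from collections import namedtuple

# Simplified stand-in for beancount's Transaction namedtuple (assumption:
# only the fields needed for this illustration are included).
Transaction = namedtuple("Transaction", "date payee narration tags")


def tags_target(entry):
    """Encode an entry's tag set as one label string for a classifier.

    Tags cannot contain spaces, so a space-joined, sorted string
    is an unambiguous encoding of the set.
    """
    return " ".join(sorted(entry.tags or ()))


def apply_predicted_tags(entry, label):
    """Return a copy of entry with the tags decoded from label added.

    namedtuples are immutable, so _replace creates a new entry instead of
    modifying the imported one in place.
    """
    predicted = frozenset(label.split())
    return entry._replace(tags=(entry.tags or frozenset()) | predicted)


# A previously tagged entry (training data) and a freshly imported one.
training = Transaction(
    datetime.date(2020, 1, 5), "Cafe", "Coffee", frozenset({"trip", "food"})
)
imported = Transaction(datetime.date(2020, 2, 1), "Cafe", "Coffee", frozenset())

label = tags_target(training)                   # what targets would return
tagged = apply_predicted_tags(imported, label)  # what process_entries would write
```

Treating each distinct tag combination as one class keeps the problem single-label, which is the simplest fit for a classifier that predicts one value per entry. Predicting each tag independently (multi-label classification) would be an alternative, but it would probably require larger changes to the pipeline.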

Are you interested in working on this?

jpduyx commented 1 year ago

Thank you for the tips and the challenge ... I'm really curious and interested in trying this.