beancount / smart_importer

Augment Beancount importers with machine learning functionality.
MIT License

[Feature] Add Chinese support via Jieba tokenizer #114

Closed ghost closed 2 years ago

ghost commented 2 years ago

Hi Contributors,

I am a Chinese user who loves beancount, and I would like to use smart_importer to improve the experience of importing bank statements. I tried the tool, but Chinese text has no spaces or other breaks between words, so the SVM cannot analyze Chinese at the moment. Here is an example Chinese sentence:

e.g. "我和小明一起吃晚饭。" ("Xiao Ming and I are having dinner together.")

Chinese text has to be split into words by a tokenizer, so I used jieba, the most popular Chinese tokenizer, to add this capability.
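As a rough illustration of the idea (not the code from this proposal), the sketch below splits a description into word tokens and counts them, which is the kind of feature a text classifier could consume. It uses `jieba.lcut` when jieba is installed and falls back to one token per character otherwise. The helper names `tokenize` and `bag_of_words` are hypothetical, and how the tokens would be wired into smart_importer is not shown here.

```python
from collections import Counter

try:
    import jieba  # third-party Chinese word segmenter
except ImportError:  # assumption: degrade gracefully when jieba is missing
    jieba = None


def tokenize(text):
    """Split text into tokens, dropping whitespace-only tokens.

    Hypothetical helper: uses jieba word segmentation when available,
    otherwise falls back to one token per non-space character.
    """
    if jieba is not None:
        tokens = jieba.lcut(text)
    else:
        tokens = list(text)
    return [tok for tok in tokens if tok.strip()]


def bag_of_words(text):
    """Count token occurrences (hypothetical helper for illustration)."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    sentence = "我和小明一起吃晚饭。"
    print(tokenize(sentence))
    print(bag_of_words(sentence))
```

With jieba installed the tokens are words rather than single characters, which should give the classifier more meaningful features than treating the whole unspaced sentence as one token.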

I am confident this change can make smart_importer smarter. I have written and tested some code, though the implementation is still rough.

Any suggestions or feedback are very welcome here.

Thanks.

EINDEX commented 2 years ago

This issue was created from the wrong account. Please delete it and see #115 instead. Thanks.