beancount / beangulp

Importers framework for Beancount
GNU General Public License v2.0

Option to manually set encoding for file cache #19

Closed blais closed 3 years ago

blais commented 4 years ago

Original report by Chenxing Luo (Bitbucket: chazeon, GitHub: chazeon).


In an example UTF-8 CSV that contains Chinese characters and emoji (a typical Venmo statement), chardet does not detect the charset correctly (it is recognized as Windows-1252, Turkish), and it is difficult to set the charset manually. It would be nice to allow setting the charset manually, since the user normally knows it.
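To illustrate the failure mode, here is a minimal sketch using only the Python standard library. The sample row is made up for illustration and is not taken from the attached statement, and `decode_statement` is a hypothetical helper, not part of beancount or chardet. It shows that decoding UTF-8 bytes with a wrongly guessed single-byte codec garbles the text, while the encoding the user already knows recovers it.

```python
# Hypothetical sample row with Chinese characters and an emoji.
SAMPLE = "2019-01-05,午餐 🍜,-12.50\n"
RAW = SAMPLE.encode("utf-8")


def decode_statement(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes with an explicitly chosen encoding.

    errors="replace" keeps the sketch from raising on bytes that the
    chosen codec does not define; a real importer might prefer to fail.
    """
    return raw.decode(encoding, errors="replace")


# A wrongly guessed single-byte codec yields garbled text.
garbled = decode_statement(RAW, "cp1252")

# The encoding known to the user recovers the original text.
correct = decode_statement(RAW, "utf-8")

print(repr(garbled))
print(repr(correct))
```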

Example as attached.

blais commented 4 years ago

@chazeon I was able to reproduce that.

blais commented 4 years ago

There is now an "encoding" option to the CSV importer. https://github.com/beancount/beancount/blob/master/beancount/ingest/importers/csv.py#L119
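As a rough illustration of what an explicit encoding setting does, the sketch below reads a CSV file with the standard library and a caller-supplied encoding instead of a detected one. It is not the importer's code: `read_rows` and its `encoding` parameter are hypothetical names chosen for this example, and the linked source is the place to check for the importer's actual signature.

```python
import csv
import os
import tempfile


def read_rows(path: str, encoding: str = "utf-8") -> list:
    """Read all CSV rows from path, decoding with the given encoding."""
    # newline="" is what the csv module documentation recommends for files.
    with open(path, newline="", encoding=encoding) as infile:
        return list(csv.reader(infile))


def demo() -> list:
    """Write a small UTF-8 CSV to a temporary file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as outfile:
            csv.writer(outfile).writerow(["2019-01-05", "午餐 🍜", "-12.50"])
        return read_rows(path, encoding="utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```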