Open hollyhudson opened 5 years ago
Thanks for this! It's supposed to strip the BOM, see https://github.com/egh/ledger-autosync/blob/master/ledgerautosync/sync.py#L186. Maybe the BOM is appearing somewhere besides the absolute beginning of the file, or perhaps my code is wrong? Can you send me a 2 line paypal.csv with the data anonymized so I can replicate? Thanks.
For a one-off or alternative method, running dos2unix
on the PayPal export suffices to remove the BOM.
A bit late to the party, but this breaks on Python 3.10 because csv.DictReader
expects f
to be opened as text while the f.read(3) == codecs.BOM_UTF8
comparison only works if the file is opened as bytes. This means that the BOM is not stripped, while csv.reader
(which is used in csv.DictReader
) seems to read the f
in bytes form (it's implemented in C, so haven't looked at the code of this), which then causes the BOM to be included in the fieldnames :confused:
I'm seeing this in both the pip-downloaded version and when I install from a clone of the github repo. Here's what the error looks like:
And here's what a friend added to my Makefile to fix it:
(Sorry, that's a bit messy, the extra stuff is in there for completion, so there's context if anyone wants to duplicate it.) The sed statement is the important part -- it searches for the BOM on line one of the file and replaces it with nothing. INPUTDIR is the path to my downloaded, unprocessed files, and we're making a cleaned up paypal-nobom.csv file that is dependent on the downloaded paypal.csv file. The grep line is to deal with another problem where ledger-autosync would choke whenever it encountered one of the numerous lines with no value in the amount field (why, paypal??), so we've added that as part of the BOM-removal process.