beancount / beangulp

Importers framework for Beancount
GNU General Public License v2.0

Make duplicate detection configurable #2

Closed blais closed 1 year ago

blais commented 7 years ago

Original report by Jakob Schnitzer (Bitbucket: yagebu, GitHub: yagebu).


I'm using Beancount's import mechanism to import the transactions of my main accounts (which my bank provides as CSV files). This works quite well in general and is really a step up from doing it all by hand.

Most of the time duplicates aren't a problem, but sometimes I have already typed out by hand some of the transactions that I'm about to import. Since my importer doesn't automatically assign accounts (I do that by hand on Fava's import page), these aren't recognized as duplicates, because the duplicate check seems to be quite strict.

In my case, the "perfect" duplicate check would only check the amount posted to the checking account that I'm importing the transactions for. So it would be nice if the duplicate check could be configured (and if a "check only amount posted to a single account" would be shipped with Beancount).

I could implement this, if you agree that it would be a useful addition. I would probably add a duplicate_check(entry1, entry2) to importer.ImporterProtocol.
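A minimal sketch of what such a check could look like. The `Posting` and `Transaction` tuples are simplified stand-ins for Beancount's types so that the snippet runs on its own, the `duplicate_check` name follows the proposal above, and the `account` parameter is hypothetical. It assumes the caller already limits candidate pairs to nearby dates.

```python
import datetime
from collections import namedtuple
from decimal import Decimal

# Simplified stand-ins for Beancount's Transaction and Posting types,
# so the sketch runs without Beancount installed.
Posting = namedtuple("Posting", "account number currency")
Transaction = namedtuple("Transaction", "date narration postings")


def amounts_for(entry, account):
    """Return the sorted (number, currency) pairs posted to `account`."""
    return sorted(
        (p.number, p.currency) for p in entry.postings if p.account == account
    )


def duplicate_check(entry1, entry2, account="Assets:Checking"):
    """Treat two entries as duplicates if they post the same amounts to
    `account`; every other posting, payee and narration is ignored."""
    amounts = amounts_for(entry1, account)
    return bool(amounts) and amounts == amounts_for(entry2, account)


day = datetime.date(2017, 5, 2)
# Freshly imported: no counter-account assigned yet.
imported = Transaction(day, "CARD PAYMENT", [
    Posting("Assets:Checking", Decimal("-12.50"), "EUR"),
])
# Typed by hand earlier, with the expense account filled in.
manual = Transaction(day, "Lunch", [
    Posting("Assets:Checking", Decimal("-12.50"), "EUR"),
    Posting("Expenses:Food", Decimal("12.50"), "EUR"),
])
other = Transaction(day, "Groceries", [
    Posting("Assets:Checking", Decimal("-40.00"), "EUR"),
    Posting("Expenses:Food", Decimal("40.00"), "EUR"),
])
```

Here `imported` and `manual` count as duplicates even though only one of them has an expense posting, while `other` does not match.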

blais commented 6 years ago

Original comment by Adam Gibbins (Bitbucket: adamgibbins, GitHub: adamgibbins).


I'd certainly gain value from this. The duplicate detection generates a large number of false positives for me. I already de-duplicate as part of my import process, so at a minimum I'd just like to be able to turn it off. I attach a lot of metadata to my transactions: my bank is very good and gives me lots of information, including its own unique transaction ID. So I have an absolutely authoritative source for detecting duplicates, and it would be nice to utilise that.
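As a rough illustration of ID-based de-duplication, here is a sketch that drops new entries whose bank ID is already in the ledger. The `Entry` tuple is a stand-in for a Beancount directive with a `meta` dict, and the `transaction_id` metadata key and the `drop_known` helper are hypothetical names chosen for this example.

```python
from collections import namedtuple

# Stand-in for a Beancount directive; only the metadata matters here.
Entry = namedtuple("Entry", "meta narration")


def drop_known(new_entries, existing_entries, key="transaction_id"):
    """Return the new entries whose bank-provided ID is not in the ledger.

    Entries without an ID are kept, since nothing can be said about them.
    """
    known = {
        e.meta[key] for e in existing_entries if e.meta and key in e.meta
    }
    return [e for e in new_entries if (e.meta or {}).get(key) not in known]


ledger = [Entry({"transaction_id": "TX-001"}, "Rent")]
incoming = [
    Entry({"transaction_id": "TX-001"}, "RENT PAYMENT MAY"),
    Entry({"transaction_id": "TX-002"}, "Coffee"),
    Entry({}, "Cash withdrawal"),
]
fresh = drop_known(incoming, ledger)
```

With these inputs, `fresh` keeps the "Coffee" and "Cash withdrawal" entries and drops the one whose ID is already in the ledger, regardless of how the narration differs.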

blais commented 5 years ago

Original comment by Johannes Harms (Bitbucket: johannesjh, GitHub: johannesjh).


As a side note and an update, since I happened to stumble over this issue: the smart importer project (in which Jakob has been actively involved) already provides one implementation on which you could base custom duplicate checkers. See https://github.com/beancount/smart_importer/blob/master/smart_importer/detector.py

dnicolodi commented 1 year ago

I think that the current transaction comparison function used in deduplication covers this use case already, but it is difficult to tell without a concrete example. However, with the new Importer interface, importers can define a cmp static method that is used to compare transactions for deduplication, or they can define a deduplicate() method to completely redefine the deduplication behavior.
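To make the two hooks concrete, here is a self-contained sketch of their shape. `ImporterSketch` is a stand-in class, not the beangulp base class. The `deduplicate(entries, existing)` signature, the two-day window, the `transaction_id` key and the `duplicate` marker are all assumptions made for illustration, so check the beangulp source for the actual interface before relying on any of them.

```python
import datetime
from collections import namedtuple

# Stand-in for a Beancount directive with a date and a metadata dict.
Entry = namedtuple("Entry", "date meta")


class ImporterSketch:
    """Illustrative stand-in; a real importer would subclass the
    library's Importer class instead."""

    @staticmethod
    def cmp(a, b):
        # Hook 1: replace only the pairwise comparison. Here two entries
        # match when they carry the same (hypothetical) bank ID.
        tid = a.meta.get("transaction_id")
        return tid is not None and tid == b.meta.get("transaction_id")

    def deduplicate(self, entries, existing):
        # Hook 2: replace the whole pass. This version flags new entries
        # in place when a matching entry exists within a small date
        # window. The "duplicate" key is a placeholder marker.
        window = datetime.timedelta(days=2)
        for entry in entries:
            for other in existing:
                close = abs(entry.date - other.date) <= window
                if close and self.cmp(entry, other):
                    entry.meta["duplicate"] = True
                    break


existing = [Entry(datetime.date(2024, 1, 2), {"transaction_id": "A1"})]
new = [
    Entry(datetime.date(2024, 1, 3), {"transaction_id": "A1"}),
    Entry(datetime.date(2024, 1, 3), {"transaction_id": "B2"}),
]
ImporterSketch().deduplicate(new, existing)
```

Overriding only `cmp` keeps the surrounding machinery and changes what "same transaction" means, while overriding `deduplicate()` also allows turning the check off entirely, for example with a method body that does nothing.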