gaynetdinov / ex_money

An [abandoned] self-hosted personal finance app
ISC License
167 stars 12 forks source link

New way to detect duplicate transaction #61

Closed gaynetdinov closed 8 years ago

gaynetdinov commented 8 years ago

If there are 2 transaction with exact same amount made on the same day from the same payee — they are duplicates.

idanci commented 8 years ago

What about these two?

dupes

MTB INSTITUTION is payee in this case. There a lot of other examples so the logic can be erroneous.

It's better to report duplicates, the issue should be fixed on SaltEdge's side. There is a special route for that and a note in guides

gaynetdinov commented 8 years ago

I'm glad you're still following ex_money :)

As I can see these two transactions have the same day, the same amount and the same payee, so the match with the new logic. Am I missing something?

I guess instead of discarding a duplicate transaction detected by the new logic I should store it with a special flag and then should show to a user, so a user can confirm that this is a duplicate or not. When a user confirms that detected duplicate is a real duplicate, I can send a request to saltedge to mark this transaction as duplicate.

idanci commented 8 years ago

Emm, I was on my thoughts, didn't get the PR's point. I was thinking that your are going to not fetch the transaction from SaltEdge. Marking it as a duplicate and allowing user to decide is a perfect thing to do. :)

Just one more advice, do not include payee in your algorithm, most of the providers don't show payee at all. Relying on date, amount and description is more than enough.

gaynetdinov commented 8 years ago

Thanks for the advice. I had to rely on payee exactly because description of the duplicate transactions are different. They might be different in a couple of symbols or description of the duplicate transaction might include some more info(additional digits or codes).

Recently I synced transactions with fetch_all: true to finally import transactions from the last year and got tons of duplicated transactions and all duplicates I checked had payee field and description field was a bit different.

Probably in case of empty payee I should come up with a more sophisticated way of detecting duplicates. Like calculating difference between description using Levenstein algorithm :)

idanci commented 8 years ago

Don't think it would work for automatic logins. Transactions are taken from the banks, it creates space only for 2 types of duplicates:

  1. Our engine doesn't work as expected after bank interface change or it encounters previously unseen data and tries to work with it as usual.
  2. Bank changes a transaction's description for the next day, appends additional data in description or applies some fees.

A couple of different symbols sounds strange 💭

gaynetdinov commented 8 years ago

Actually, this way of detecting duplicate transactions is too naive. It's better to allow users to mark duplicate transactions, send a request to Saltedge and exclude such transactions from anywhere.

gaynetdinov commented 8 years ago

And I'm already doing that https://github.com/gaynetdinov/ex_money/blob/master/web/controllers/mobile/transaction_controller.ex#L136-L147 :)

idanci commented 8 years ago

Wise decision :)

gaynetdinov commented 8 years ago

@idanci check it out :) pay attention to inserted_at column

ex-money::DATABASE=> select saltedge_transaction_id, inserted_at, description, amount from transactions where made_on = '2016-09-01' order by saltedge_transaction_id;
saltedge_transaction_id |     inserted_at     |                                           description                                            | amount
-------------------------+---------------------+--------------------------------------------------------------------------------------------------+---------
                15376552 | 2016-09-01 06:02:28 | ABSCHLAG Strom 09/16 V K: 122683582gruenstrom easy 90                                            |   -40.0
                15384239 | 2016-09-01 13:03:48 | NORDRING APOTHEKE//BER LIN/DE 2016-08-31T08:49:31 Folgenr.002 Verfalld.2019-1 2                  |  -23.94
                15384240 | 2016-09-01 13:03:48 | CA//Berlin Prenzlauer Berg/DE 2016-08-31T18:35:56 Folgenr.002 Verfalld.2019- 12                  |   -19.0
                15584845 | 2016-09-06 10:02:50 | ABSCHLAG Strom 09/16 V K. 122683582gruenstrom easy 90                                            |   -40.0
                15584846 | 2016-09-06 10:02:50 | NORDRING APOTHEKE//BER LIN/DE 2016-08-31T08.49.31 Folgenr.002 Verfalld.2019-1 2                  |  -23.94
                15584847 | 2016-09-06 10:02:50 | CA//Berlin Prenzlauer Berg/DE 2016-08-31T18.35.56 Folgenr.002 Verfalld.2019- 12                  |   -19.0

So today I got 3 duplicates: 3 transactions were synced after 5 days they were actually made with different saltedge_transaction_id. The only difference is in descriptions: some colons got replaced with dots. Do you have any idea why it happens?

idanci commented 8 years ago

Emm, looking on the HBCI responses, it just returned different data on different days:

    {
      "transaction_id": "",
      "local_bankcode": "filtered",
      "local_account": "filtered",
      "local_name": "filtered",
      "remote_bankcode": "filtered",
      "remote_account": "filtered",
      "remote_name": "NORDRING APOTHEKE",
      "date": "2016-09-01",
      "value_date": "2016-09-01",
      "amount": -23.94,
      "currency": "EUR",
      "purpose": "NORDRING APOTHEKE//BER LIN/DE 2016-08-31T08:49:31  Folgenr.002 Verfalld.2019-1 2",
      "category": ""
    },
    {
      "transaction_id": "",
      "local_bankcode": "filtered",
      "local_account": "filtered",
      "local_name": "filtered",
      "remote_bankcode": "filtered",
      "remote_account": "filtered",
      "remote_name": "NORDRING APOTHEKE",
      "date": "2016-09-01",
      "value_date": "2016-09-01",
      "amount": -23.94,
      "currency": "EUR",
      "purpose": "NORDRING APOTHEKE//BER LIN/DE 2016-08-31T08.49.31  Folgenr.002 Verfalld.2019-1 2",
      "category": ""
    },

You are right, the difference is in colons/dots. I think we should find a common standard and replace colons with dots or vice-versa in transactions with similar templates. I'll create an issue in our tracker.

Thanks for reporting it!