firefly-iii / firefly-iii

Firefly III: a personal finances manager
https://firefly-iii.org/
GNU Affero General Public License v3.0

Duplicate transactions during import, if the potential duplicates were imported before 4.7.11 #2116

Closed mikhail5555 closed 5 years ago

mikhail5555 commented 5 years ago

Bug description I am running Firefly III version 4.7.12, and my problem is: transactions that were already imported get imported again as duplicates instead of being skipped.

Steps to reproduce Import two different CSVs with a few overlapping entries, using the following configuration (ING Bank, Dutch):

{
    "file-type": "csv",
    "date-format": "Ymd",
    "has-headers": true,
    "delimiter": ",",
    "apply-rules": true,
    "specifics": {
        "IngDescription": 1
    },
    "import-account": 1,
    "column-count": 9,
    "column-roles": [
        "date-transaction",
        "opposing-name",
        "account-iban",
        "opposing-iban",
        "_ignore",
        "ing-debit-credit",
        "amount",
        "tags-comma",
        "description"
    ],
    "column-do-mapping": [
        false,
        true,
        true,
        true,
        false,
        false,
        false,
        false,
        false
    ],
    "column-mapping-config": {
        "2": [],
        "3": [],
        "1": []
    }
}

Expected behavior Duplicate entries get ignored and I get a warning instead of them being added to the transaction list.

Extra info `Debug information generated at 2019-02-23 17:53:46 Europe/Berlin for Firefly III version 4.7.12.

Variable Content
FF version 4.7.12
FF API version 0.9.2
App environment production
App debug mode ''
App cache driver file
App logging , stdout
PHP version 7.2.15
Display errors Off
Session start 2019-02-01 00:00:00
Session end 2019-02-28 23:59:59
Session first 2019-01-01 00:00:00
Error reporting ALL errors
Host Linux
Interface apache2handler
UserID 1
Attempt at "en" false
Attempt at "English" false
Attempt at "en_US.utf8" 'en_US.utf8'
Attempt at "en_US.UTF-8" 'en_US.UTF-8'
DB drivers mysql, pgsql, sqlite
Current driver mysql
Login provider
Storage disks local-upload
Using Sandstorm? no
Is Sandstorm (.env) false
Is Docker (.env) true
bunq uses sandbox false
Trusted proxies (.env) **
User agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
Loaded extensions Core, date, libxml, openssl, pcre, sqlite3, zlib, ctype, curl, dom, fileinfo, filter, ftp, hash, iconv, json, mbstring, SPL, PDO, session, posix, Reflection, standard, SimpleXML, pdo_sqlite, Phar, tokenizer, xml, xmlreader, xmlwriter, mysqlnd, apache2handler, bcmath, gd, intl, ldap, memcached, pdo_mysql, pdo_pgsql, sodium, zip, Zend OPcache `

Bonus points I can share the CSVs with you if that makes your life easier, but I verified that the data is 100% identical.

JC5 commented 5 years ago

Mmm. If the data is 100% identical it shouldn’t have been imported. If you could share both files (or the lines) that would be great. I am curious what happened.

mikhail5555 commented 5 years ago

> Mmm. If the data is 100% identical it shouldn’t have been imported. If you could share both files (or the lines) that would be great. I am curious what happened.

Datum,Naam / Omschrijving,Rekening,Tegenrekening,Code,Af Bij,Bedrag (EUR),MutatieSoort,Mededelingen
20190125,K. den Boon,NL92INGB0000000000,NL10RABO0000000001,OV,Bij,"58,00",Overschrijving,Naam: K. den Boon IBAN: NL10RABO0000000001 Valutadatum: 25-01-2019

There were multiple transactions, but this was one of the lines (IBANs edited for obvious reasons).

Entries from transaction_journal
181 2019-01-26 23:58:38 2019-01-27 00:16:02     1   2       1   Naam: K. den Boon Valutadatum: 25-01-2019   2019-01-25 00:00:00             4   0   1   1
371 2019-02-23 17:44:26 2019-02-23 17:48:22 2019-02-23 17:48:22 1   2       1   Naam: K. den Boon Valutadatum: 25-01-2019   2019-01-25 00:00:00             0   0   1   1

Entries from journal_meta
357 2019-01-26 23:58:38 2019-01-26 23:58:38 181 importHashV2    "777cebdf61be26c3a0c952aff8448e0fae94c4c4e247132a3a98bb5904133bd1"  61abb920364ff6aab835af79e698bf3e92c30d169981175c78d7247b3abab5d7    
358 2019-01-26 23:58:38 2019-01-26 23:58:38 181 original-source "csv-import-v4.7.9" 1e93a4d575c6d23267bc8c113c9d3921b3ef1f32fbb4bcd3286c7954e13cd5cc    

724 2019-02-23 17:44:26 2019-02-23 17:48:22 371 importHashV2    "84b59c8496aa45b83246a49275f494aad544ddc8fdbbc8be1ac5d27a5ad62363"  cb75d184429ddc11f07efb360fa5bff29b959a5545700915e1862aa74cf3e1e8    2019-02-23 17:48:22
725 2019-02-23 17:44:26 2019-02-23 17:48:22 371 original-source "csv-import-v4.7.12"    350800f09e3a5a23de31aedb6ffe9c05fe6c9ca6e7a39984d707bd0e1a69bbc3    2019-02-23 17:48:22

Could the row's position in the file be affecting the hash? Or could the version change from 4.7.9 to 4.7.12 have caused it?

JC5 commented 5 years ago

Like I said, if you could share both lines that would be great. An uncensored version over email would be perfect, especially when this results in the same problem.

If I can't replicate it there’s nothing I can do for you.

mikhail5555 commented 5 years ago

Where can I find your email/what email should I use? Is it the one on the contact page?

JC5 commented 5 years ago

Yes, thegrumpydictator@gmail.com

mikhail5555 commented 5 years ago

I sent you both uncensored files. Let me know if there is any other information I can provide.

JC5 commented 5 years ago

I found the cause, and it's a one-time change I can't do anything about I'm afraid. Firefly III moved from date-specific transactions to datetime-specific transactions, meaning you can now set the time as well as the date for a transaction (at least, in the API, not in the user interface yet).

As a result, new transactions are imported with this datetime value: 2018-02-24 00:00:00 while older versions would use 2018-02-24. This causes the difference that Firefly III sees and makes it incapable of weeding out duplicates.

So this is a one-time thing that will happen when you re-import transactions that were imported in versions before 4.7.11. Sorry about that, there's not much I can do.
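To illustrate why the date-versus-datetime change defeats duplicate detection, here is a minimal sketch. It assumes the import hash is a SHA-256 digest over a serialized form of the transaction fields; the `import_hash` helper and the field names are hypothetical and are not Firefly III's actual implementation. Any hash over the serialized date behaves the same way: an extra ` 00:00:00` changes the input, so the digest no longer matches.

```python
import hashlib
import json


def import_hash(fields: dict) -> str:
    """Hypothetical stand-in for an import hash: SHA-256 over canonical JSON."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative row; field names are assumptions, not the real schema.
row = {
    "description": "Naam: K. den Boon Valutadatum: 25-01-2019",
    "amount": "58.00",
}

# Older versions: date only. Newer versions: date plus time.
old_style = import_hash({**row, "date": "2019-01-25"})
new_style = import_hash({**row, "date": "2019-01-25 00:00:00"})

print(old_style == new_style)  # False: same transaction, different hash
```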

mikhail5555 commented 5 years ago

Ah, that does make sense. Could you maybe add a legacy support option that compares both hashes (one with only the date and one with the datetime) against the old transactions? A lot of people probably rely on the duplicate detection. (You could maybe even go as far as updating the old hash.) But I guess I won't import anything from before 24-01 anymore. Thanks for taking the time to figure it out!
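The suggestion of comparing both hashes could look roughly like the sketch below. This is only an illustration under assumptions: the `import_hash`, `candidate_hashes`, and `is_duplicate` helpers, the field names, and the SHA-256-over-JSON hashing are all hypothetical and do not reflect how Firefly III actually stores or computes its import hashes.

```python
import hashlib
import json
from datetime import datetime
from typing import List, Set


def import_hash(fields: dict) -> str:
    """Hypothetical import hash: SHA-256 over canonical JSON of the fields."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def candidate_hashes(fields: dict, when: datetime) -> List[str]:
    """Return the datetime-based hash, plus a legacy date-only hash at midnight."""
    hashes = [import_hash({**fields, "date": when.strftime("%Y-%m-%d %H:%M:%S")})]
    # Only a midnight timestamp could have come from a date-only import.
    if when.time() == datetime.min.time():
        hashes.append(import_hash({**fields, "date": when.strftime("%Y-%m-%d")}))
    return hashes


def is_duplicate(fields: dict, when: datetime, known_hashes: Set[str]) -> bool:
    """True if any candidate hash was already stored by an earlier import."""
    return any(h in known_hashes for h in candidate_hashes(fields, when))


row = {"description": "Naam: K. den Boon Valutadatum: 25-01-2019", "amount": "58.00"}
when = datetime(2019, 1, 25)

# Hash as an older, date-only import might have stored it (assumption).
stored = {import_hash({**row, "date": "2019-01-25"})}

print(is_duplicate(row, when, stored))  # True: the legacy hash matches
```

The fallback is limited to midnight timestamps, so transactions that carry a real time of day are never matched against date-only hashes.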

JC5 commented 5 years ago

These problems usually solve themselves without adding more buttons. I’ve deleted your files, thanks for sharing them with me. :+1:

mikhail5555 commented 5 years ago

Also a fair point, could you maybe change the title to "... during import after update to 4.7.11". Thanks once again for quickly finding the bug :)
