arsenetar / dupeguru

Find duplicate files
https://dupeguru.voltaicideas.net
GNU General Public License v3.0
5.43k stars, 414 forks

Not finding all duplicates. #1204

Open pantropia opened 8 months ago

pantropia commented 8 months ago

Describe the bug Some exact duplicates not being found in Standard mode

To Reproduce
1) Use BeyondCompare to compare two folders and ensure that there are no files in folder A which do not exist in folder B and no content mismatches
2) Open DupeGuru and set folder A as Normal, folder B as Reference
3) Scan, remove all duplicates found
4) Clear cache, scan again - no duplicates found
5) Refresh BeyondCompare - find that many duplicates do still exist

Expected behavior All exact duplicates should be found. Thousands of others were, so what's up with these ones? They definitely are exact duplicates because I only just made them.
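The BeyondCompare check in the reproduction steps can also be approximated with a short script, which gives a second, independent count of exact duplicates. The following is a minimal sketch, not dupeGuru's own matching logic: it assumes "exact duplicate" means byte-identical content and compares SHA-256 digests. The function names and the folder arguments are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def index_by_content(root):
    """Map each content digest to the list of file paths found under root."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[file_digest(path)].append(path)
    return index


def duplicates_of_reference(normal_root, reference_root):
    """Return sorted paths under normal_root whose content also exists under reference_root."""
    reference = index_by_content(reference_root)
    matches = []
    for digest, paths in index_by_content(normal_root).items():
        if digest in reference:
            matches.extend(paths)
    return sorted(matches)
```

Calling `duplicates_of_reference(folder_a, folder_b)` with folder A as the Normal folder and folder B as the Reference folder returns the files in A that a contents-based scan would be expected to flag. Hashing every file is slow on large mail exports, so this is only suitable as a spot check.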

Screenshots Screenshot 2024-02-27 082632

Here's a screenshot of the duplicates as shown in BeyondCompare after the scan.

Additional context In this case I'm scanning .eml files. I noticed an old copy of DupeGuru (v3.something) lurking in a backup of an old desktop and, to my surprise, it ran when I tried it. Same result - though I note that it picked up my settings, so I don't know if perhaps it might have been giving me the v3 interface but using the backend from 4.3.1.

I thought maybe if there were duplicates in the reference folder it might cause an issue, so I cleared the cache, set it as Normal, deduped it, cleared the cache and retried to make sure there were no duplicates showing, then cleared the cache again, set it back to Reference and ran the deduplication against the other folder again - nothing found. I looked at BeyondCompare again - over 600 exact duplicates still.

I cleared the cache again, ran it once more with both folders set to Normal, and it found several duplicates - less than a screenful - all in the folder which was originally the reference. Only one had a check-box against it. So I closed the app, reopened it, ran the dedupe again on those two folders... and it found over 15000 duplicates, all in the folder which was originally the reference. I had not marked anything to be removed from results previously.

Now, it's finding no duplicates whatsoever. There are 613 exact duplicates according to BeyondCompare, which made the original copy.

pantropia commented 8 months ago

OK - Path/filename length may be relevant: I moved folders to the root of the drives, renamed folders to be shorter etc. It looks like the max path length is 260 characters.
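To see how many files are affected before renaming anything, the over-long paths can be listed with a short script. This is a minimal sketch under stated assumptions: the function name and the default limit of 259 characters (the classic Windows limit of 260 minus the terminating null) are illustrative, and the walk only reports files that Python itself can reach on the system in question.

```python
import os

# Assumed default: 259 usable characters, i.e. the classic 260-character
# Windows limit minus the terminating null character.
DEFAULT_LIMIT = 259


def paths_over_limit(root, limit=DEFAULT_LIMIT):
    """Return absolute file paths under root longer than limit, longest first."""
    root = os.path.abspath(root)
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                too_long.append(full)
    return sorted(too_long, key=len, reverse=True)
```

Running `paths_over_limit(folder)` on each export folder and checking the length of the returned list shows whether the number of over-long paths is in the same range as the number of duplicates that were missed.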

arsenetar commented 8 months ago

Based on the information you have provided, this Microsoft doc, in particular the instructions towards the end, should resolve this issue for you: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

pantropia commented 8 months ago

Aaaah of course. Windows 11 both supporting and not supporting something at the same time. Makes perfect sense. I was very not looking forward to trying to reduce all the paths over 259 characters as there are rather a lot of them. I made the registry change and it found 22 more results, but couldn't send them to the recycle bin. The error came up quite small, something about the data being sent being too small. I thought I'd screenshotted it, but I don't see it on my system. The link mentions changing the app manifest - is that relevant to dupeguru and if so, where do I find it?

arsenetar commented 8 months ago

That error for the recycle operation seems odd, if you get it again, can you capture it?

pantropia commented 8 months ago

If it comes up again, yes. At the moment I'm just running some scans on another folder, but I can redo the export to folder A (which I'd deleted as it was finally empty) to do some tests on.

pantropia commented 8 months ago

I haven't seen that error again yet but the last scan I ran found a couple of screenfuls of items, which I sent to the recycle bin. When I opened the bin to clear it out, none of those items were in there, just a couple of things I'd deleted manually yesterday. It does seem to be the case that after running a few scans in a row without closing and reopening the app (even if I clear the cache) it will do Something Weird - but it's been something different each time.

pantropia commented 8 months ago

Screenshot 2024-03-03 071939

I couldn't see anything obvious in the debug log. This was after I'd given up on deleting all of the results at once, saved the result set, restarted the app, reloaded, and tried deleting, I think, about 1/10th of the list (it's deduping two exports of all my mail from two different clients which use different folder structures).

pantropia commented 8 months ago

image

These are all the errors I ended up with after deleting in batches.

pantropia commented 8 months ago

I've had several out-of-memory crashes as well, mostly after several hours at the 'fiddling with results' stage, and when closed after such a crash, the app is still showing in task manager until I end it. It looks like what it's having a problem with is a folder which has a lot of zero-size files in it. I've been able to do the subfolders and each one has had more than 15000 of those zero-size files, so I can see why doing Many such folders would be an issue.
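The folders with large numbers of zero-size files can be identified up front with a short script. This is a minimal sketch; the function name is hypothetical and it makes no claim about how dupeGuru handles empty files internally. The only assumption is that every empty file has identical (empty) content, so a contents-based comparison would place all of them in one very large group.

```python
import os
from collections import Counter


def zero_size_counts(root):
    """Count zero-byte files per directory under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                if os.path.getsize(os.path.join(dirpath, name)) == 0:
                    counts[dirpath] += 1
            except OSError:
                # Skip files that cannot be inspected (e.g. removed mid-scan).
                continue
    return counts
```

`zero_size_counts(folder).most_common(10)` lists the ten directories holding the most empty files, which makes it easier to decide which subfolders to scan separately or exclude.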