arsenetar / dupeguru

Find duplicate files
https://dupeguru.voltaicideas.net
GNU General Public License v3.0

Problems purging the cache (purge_outdated)? #439

interplanetarychris closed this issue 5 years ago.

interplanetarychris commented 7 years ago

I'm doing a very large Picture/Content dupe check on repositories that include Lightroom and Aperture directories on local and AFP filesystems.

The file named at the end of the traceback was not included in the current search, but apparently there was a problem purging the cache.

Application Identifier: com.hardcoded-software.dupeguru
Application Version: 4.0.3
Mac OS X Version: Version 10.12.6 (Build 16G29)

Traceback (most recent call last):
  File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 111, in __getitem__
KeyError: 'path:/Volumes/storage/Aperture Libraries/Genealogy Library (active) 3.aplibrary/Masters/2010/01/06/20100106-121355/00774_n_9aek3kmar0118.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/dupeGuru.app/Contents/Resources/py/cocoa/inter.py", line 259, in pulse
  File "build/dupeGuru.app/Contents/Resources/py/hscommon/gui/progress_window.py", line 101, in pulse
  File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 323, in _job_error
  File "build/dupeGuru.app/Contents/Resources/py/hscommon/jobprogress/performer.py", line 43, in _async_run
  File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 780, in do
  File "build/dupeGuru.app/Contents/Resources/py/core/scanner.py", line 137, in get_dupe_groups
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/scanner.py", line 31, in _getmatches
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 167, in getmatches
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 65, in prepare_pictures
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 121, in purge_outdated
  File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 113, in __getitem__
KeyError: b'path:/Volumes/storage/Aperture Libraries/Genealogy Library (active) 3.aplibrary/Masters/2010/01/06/20100106-121355/00774_n_9aek3kmar0118.jpg'
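The two chained KeyErrors (first with a `str` key, then with a `bytes` key) match the lookup order of Python's standard-library `shelve.Shelf.__getitem__`, which the bundled `shelve.py` in this traceback appears to be: it first checks its in-memory cache with the `str` key and, on a miss, reads the backing dbm store with the key encoded to `bytes`. When both miss, the second KeyError is raised while the first is being handled. The sketch below reproduces that pattern in isolation. It is an assumption-laden illustration, not dupeGuru code: a plain dict stands in for the on-disk cache file, and `reproduce_missing_key` is a hypothetical helper.

```python
import shelve


def reproduce_missing_key(key: str) -> KeyError:
    """Look up a key absent from a shelf and return the KeyError that escapes."""
    # Hypothetical stand-in: a plain dict replaces the on-disk dbm file,
    # since Shelf only needs a mapping that accepts bytes keys.
    shelf = shelve.Shelf({})
    try:
        shelf[key]  # 1st miss: in-memory cache (str key); 2nd miss: store (bytes key)
    except KeyError as err:
        return err
    finally:
        shelf.close()
    raise AssertionError("lookup unexpectedly succeeded")


err = reproduce_missing_key("path:/Volumes/example/missing.jpg")
print(repr(err.__context__))  # KeyError('path:/Volumes/example/missing.jpg')
print(repr(err))              # KeyError(b'path:/Volumes/example/missing.jpg')
```

In other words, the traceback says that `purge_outdated` asked the shelf for a `path:` key that the underlying store could not return.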

silentnyte commented 7 years ago

I think you are on to the problem. I changed my search path to try to narrow the problem down. Originally I was searching /Volumes/Local01/Pictures/2008 as a test, which worked. Then I moved on to a larger directory, /Volumes/Local01/Pictures/2015. That is when I got this error, and it would not go away. Notice that the path in the error is from the first search path (2008), not the one being scanned.

Deleting ~/Library/Application Support/dupeGuru/cached_pictures.shelve.db fixed this.

Application Identifier: com.hardcoded-software.dupeguru
Application Version: 4.0.3
Mac OS X Version: Version 10.12.4 (Build 16E195)

Traceback (most recent call last):
  File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 111, in __getitem__
KeyError: 'path:/Volumes/Local01/Pictures/2008/07/20080706_124659-5.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/dupeGuru.app/Contents/Resources/py/cocoa/inter.py", line 259, in pulse
  File "build/dupeGuru.app/Contents/Resources/py/hscommon/gui/progress_window.py", line 101, in pulse
  File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 323, in _job_error
  File "build/dupeGuru.app/Contents/Resources/py/hscommon/jobprogress/performer.py", line 43, in _async_run
  File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 780, in do
  File "build/dupeGuru.app/Contents/Resources/py/core/scanner.py", line 137, in get_dupe_groups
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/scanner.py", line 31, in _getmatches
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 167, in getmatches
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 65, in prepare_pictures
  File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 121, in purge_outdated
  File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 113, in __getitem__
KeyError: b'path:/Volumes/Local01/Pictures/2008/07/20080706_124659-5.jpg'
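For anyone who wants to script the workaround from this comment (deleting the picture cache), a minimal sketch follows. It assumes the directory named in the comment (`~/Library/Application Support/dupeGuru`) and, as a further assumption, deletes every file whose name starts with `cached_pictures`, because the dbm backend behind a shelf may store it as one file or several. `delete_picture_cache` is a hypothetical helper, not part of dupeGuru; quit dupeGuru before calling it so the cache is not in use.

```python
from pathlib import Path

# Directory given in the thread for dupeGuru 4.x on macOS (assumption for other setups).
DEFAULT_CACHE_DIR = Path.home() / "Library" / "Application Support" / "dupeGuru"


def delete_picture_cache(cache_dir: Path = DEFAULT_CACHE_DIR) -> list[Path]:
    """Delete files named 'cached_pictures*' in cache_dir and return what was removed.

    Matching on the prefix is an assumption: depending on the dbm backend, one
    shelf may be stored as a single file (e.g. '.db') or as several files.
    """
    removed = []
    # glob() simply yields nothing if cache_dir does not exist.
    for path in sorted(cache_dir.glob("cached_pictures*")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling `delete_picture_cache()` with no argument targets the default directory; the next scan then rebuilds the cache from scratch, which is slower but avoids the stale entries.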

ghost commented 6 years ago

I tried again today to reproduce the error, to no avail. I'm thinking that the issue has to do with a hash collision in the cache, and that reproducing it requires a large number of photos. I don't have a large photo collection, so I can't reproduce it.

As I wrote in #402, I really don't like the idea of fixing a problem blindly, but then again, because many users have a large photo collection (why would you use dupeGuru otherwise?), I'm going to do it. @silentnyte @interplanetarychris would you be willing to confirm or refute the fix if I created a test build?

interplanetarychris commented 6 years ago

Certainly - happy to help debug and make the tool more useful. My image counts are often between 5,000 and 40,000.

-Chris

silentnyte commented 6 years ago

I don't mind helping out.

ghost commented 6 years ago

https://download.hardcoded.net/dupeguru_osx_4_0_3_shelvetest.dmg

This test version is the same as v4.0.3, but with the addition of the commit referenced above. @interplanetarychris @silentnyte Could you confirm that it works properly in situations where the vanilla v4.0.3 failed?
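The commit in the test build is not reproduced in this thread. As a purely hypothetical illustration of a purge that cannot crash on a dangling key (not dupeGuru's actual `purge_outdated`), the sketch below snapshots the `path:` keys first, treats any key that cannot be read back as stale, and ignores KeyError when deleting. The `path:` prefix comes from the tracebacks; the record layout (`{"mtime": ...}`) is an assumption.

```python
import os
from typing import MutableMapping


def purge_outdated(cache: MutableMapping[str, dict]) -> list[str]:
    """Delete stale 'path:<file>' records and return the keys that were removed.

    Hypothetical record layout: {"mtime": <int seconds>}. A record is stale if it
    cannot be read back, its file cannot be stat'ed, or the mtime differs.
    """
    todelete = []
    # Snapshot the keys so the mapping is never modified while being iterated.
    keys = [k for k in cache.keys() if k.startswith("path:")]
    for key in keys:
        try:
            record = cache[key]
        except KeyError:
            todelete.append(key)  # listed by iteration but not retrievable
            continue
        try:
            mtime = int(os.stat(key[len("path:"):]).st_mtime)
        except OSError:
            todelete.append(key)  # file deleted, moved, or volume not mounted
            continue
        if mtime != record.get("mtime"):
            todelete.append(key)
    for key in todelete:
        try:
            del cache[key]
        except KeyError:
            pass  # already gone; nothing to do
    return todelete
```

One consequence of this design: entries for files on a volume that is not mounted (such as the AFP shares mentioned at the top of the thread) fail `os.stat` and are purged, so they are recomputed on the next scan of that volume.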

interplanetarychris commented 6 years ago

I've since been able to try the shelve test version in scenarios similar to the prior failures. I have added and removed folders from the scan list to exercise additions to and removals from the cache. It has yet to crash with file/image repositories ranging from a few thousand to about 35K images. Thanks for the fix!

fuzzy76 commented 4 years ago

I got this error on 4.0.3, upgraded to 4.0.4, got the same error, deleted the cache manually, reran, and now it seems to be working. Does that mean the bug is gone from 4.0.4 or not?

fuzzy76 commented 4 years ago
Application Identifier: com.hardcoded-software.dupeguru
Application Version: 4.0.4
Mac OS X Version: Version 10.15.4 (Build 19E266)

Traceback (most recent call last):
  File "build/py/shelve.py", line 111, in __getitem__
KeyError: 'path:/Volumes/bertha/!Sorteres/39.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "build/py/cocoa/inter.py", line 259, in pulse
  File "build/py/hscommon/gui/progress_window.py", line 101, in pulse
  File "build/py/core/app.py", line 323, in _job_error
  File "build/py/hscommon/jobprogress/performer.py", line 43, in _async_run
  File "build/py/core/app.py", line 780, in do
  File "build/py/core/scanner.py", line 137, in get_dupe_groups
  File "build/py/core/pe/scanner.py", line 31, in _getmatches
  File "build/py/core/pe/matchblock.py", line 167, in getmatches
  File "build/py/core/pe/matchblock.py", line 65, in prepare_pictures
  File "build/py/core/pe/cache_shelve.py", line 121, in purge_outdated
  File "build/py/shelve.py", line 113, in __getitem__
KeyError: b'path:/Volumes/bertha/!Sorteres/39.jpg'

Parts of the path have been cut from the message. This happens when scanning a collection of 200,000 images.

fuzzy76 commented 4 years ago

Weird. I tried re-running dupeGuru on another folder and got the same crash, again referencing the old folder (which was not part of the scan).
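The tracebacks show `purge_outdated` being called from `prepare_pictures`, which suggests the purge walks every cached entry rather than only the folders in the current scan; that would explain why a folder outside the scan keeps appearing (the same pattern reported earlier in the thread). A small hypothetical diagnostic for listing cached `path:` keys that fall outside the scanned folders might look like this. The `path:` prefix is taken from the tracebacks; `entries_outside` and the second example file name are made up for illustration.

```python
from pathlib import PurePosixPath
from typing import Iterable


def entries_outside(keys: Iterable[str], scan_roots: Iterable[str]) -> list[str]:
    """Return cached file paths that are not inside any of the scanned folders."""
    roots = [PurePosixPath(r) for r in scan_roots]
    outside = []
    for key in keys:
        if not key.startswith("path:"):
            continue  # only 'path:' records name files
        path = PurePosixPath(key[len("path:"):])
        # is_relative_to (Python 3.9+) is True for the root itself and anything below it.
        if not any(path.is_relative_to(root) for root in roots):
            outside.append(str(path))
    return outside


cached = [
    "path:/Volumes/Local01/Pictures/2008/07/20080706_124659-5.jpg",
    "path:/Volumes/Local01/Pictures/2015/example.jpg",  # hypothetical file
]
print(entries_outside(cached, ["/Volumes/Local01/Pictures/2015"]))
# ['/Volumes/Local01/Pictures/2008/07/20080706_124659-5.jpg']
```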