gilesknap / gphotos-sync

Google Photos and Albums backup with Google Photos Library API
Apache License 2.0
1.97k stars 161 forks source link

Deadlock in photo download flow. #480

Open peterjdolan opened 1 month ago

peterjdolan commented 1 month ago

When I run gphotos-sync, it reaches the "Done" state, but then hangs until I interrupt the process. At that point, the stack trace seems to indicate a threading deadlock, probably related to a future's exception not being handled appropriately. I may have time to take a look at the code to try to debug it, but not for some time.

Example console output:

05-17 15:02:05 WARNING  gphotos-sync 3.2.1 2024-05-17 15:02:03.747257 
05-17 15:02:07 WARNING  Indexing Google Photos Files ... 
05-17 15:02:08 WARNING  indexed 0 items 
05-17 15:02:34 WARNING  Downloading Photos ... 
05-17 15:11:55 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 510, in main
    self.start(args)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 450, in start
    self.do_sync(args)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 418, in do_sync
    files_downloaded = self.google_photos_down.download_photo_media()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/GooglePhotosDownload.py", line 178, in download_photo_media
    self.do_download_complete(futures_left)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/GooglePhotosDownload.py", line 326, in do_download_complete
    e = future.exception(timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 494, in exception
    raise TimeoutError()
TimeoutError
05-17 15:11:55 WARNING  Done. 

^CException ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1553, in _shutdown
    atexit_call()
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:

Another example console output:

05-17 15:16:58 WARNING  gphotos-sync 3.2.1 2024-05-17 15:16:58.223474 
05-17 15:16:58 WARNING  Indexing Google Photos Files ... 
05-17 15:16:59 WARNING  indexed 0 items 
05-17 15:17:02 WARNING  Downloading Photos ... 
05-17 15:18:02 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 510, in main
    self.start(args)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 450, in start
    self.do_sync(args)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/__main__.py", line 418, in do_sync
    files_downloaded = self.google_photos_down.download_photo_media()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/GooglePhotosDownload.py", line 178, in download_photo_media
    self.do_download_complete(futures_left)
  File "/home/peter/.local/share/virtualenvs/gphotos-sync-QkASRHkU/lib/python3.11/site-packages/gphotos_sync/GooglePhotosDownload.py", line 326, in do_download_complete
    e = future.exception(timeout=timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 494, in exception
    raise TimeoutError()
TimeoutError
05-17 15:18:02 WARNING  Done. 
^CException ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1553, in _shutdown
    atexit_call()
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt:
gilesknap commented 1 month ago

Hmmm this is an interesting one that I have not seen.

May be a little challenging for me to debug without being able to reproduce it myself, so if you can investigate that would be great.

Thanks for the report.

peterjdolan commented 1 week ago

FYI I've narrowed this down to how an exception in the file download flow is handled, which I think will be a terribly rare situation.

Specifically, the call to os.utime at https://github.com/gilesknap/gphotos-sync/blob/2c70af93aaadc815bee4ab1cf5dc848ebf4f4dd3/src/gphotos_sync/GooglePhotosDownload.py#L286 fails on my system because I'm running with a mounted SMB share, and the server doesn't allow the file attributes to be modified.

When an exception is thrown at that line, the download task times out (TimeoutException at https://github.com/gilesknap/gphotos-sync/blob/2c70af93aaadc815bee4ab1cf5dc848ebf4f4dd3/src/gphotos_sync/GooglePhotosDownload.py#L332), at which point the database connection is already closed, and the program reaches a deadlock state and fails to exit.

I haven't been able to write a unit test to expose this because my local development environment is a bit messy, but I'll keep working on it. In the meantime, I'll just change the mounted filesystem to make it work.

gilesknap commented 3 days ago

Thanks @peterjdolan I figure I just need to trap that exception - can't do much but ignore it I guess.

peterjdolan commented 2 days ago

Agreed. I'm honestly not sure why it isn't already being caught, but it's not.

When I added a second general "except Exception" clause, it worked as expected.

I'm also not sure why the existing unit tests don't cover this. On my reading, they should.