freedomofpress / dangerzone

Take potentially dangerous PDFs, office documents, or images and convert them to safe PDFs
https://dangerzone.rocks/
GNU Affero General Public License v3.0
3.36k stars 153 forks source link

OSError: [Errno 39] Directory not empty: 'pixels' when aborting during doc to pixels stage #759

Open naglis opened 3 months ago

naglis commented 3 months ago

Steps to reproduce:

  1. Select an EPUB file in dangerzone 0.6.0 GUI (I've used one with ~1000 pages when in GUI mode to have more time to cancel it).
  2. Click "Convert to Safe Document".
  3. Select "Exit" from the menu while it is still in doc to pixels stage, click "Abort conversions" button in the dialog.
  4. A traceback is printed in the terminal:
    Traceback (most recent call last):
    File "/usr/lib/python3.11/weakref.py", line 666, in _exitfunc
    f()
    File "/usr/lib/python3.11/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3.11/tempfile.py", line 933, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
    File "/usr/lib/python3.11/tempfile.py", line 929, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
    File "/usr/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
    File "/usr/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
    File "/usr/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
    OSError: [Errno 39] Directory not empty: 'pixels'

    I am able to reproduce it pretty much every time both via installed package (Arch Linux) and when running from source via dev_scripts. I can also reproduce it if running via dangerzone-cli and then interrupting (Ctrl+C) the process while it is converting pages to pixels.

After dangerzone exits, the pixels subdirectory in the temporary directory created on the host by dangerzone will contain *.{rgb,width,height} files of the pages created after the "Abort conversions" button was pressed.

My speculation on what happens is that tempfile.TemporaryDirectory cleanup is invoked while pixel files are still being produced. The cleanup function uses shutil.rmtree which first deletes the files that exist in the pixels subdirectory and then tries to delete the subdirectory itself, however, since the doc-to-pixel conversion is still running on another thread, it creates new files in the pixels subdirectory in the meantime since rmtree last checked it is empty.

Perhaps some synchronization is needed between the threads after it is clear that dangerzone is exiting to ensure that new files are not being created () in the temporary directory and the TemporaryDirectory cleanup can run cleanly? :thinking:

apyrgio commented 3 months ago

Huh, very interesting. We are very close to ditching the temporary directory (see https://github.com/freedomofpress/dangerzone/pull/748) altogether, so this issue will be implicitly solved. Let's keep it open till then.