BelgianBiodiversityPlatform / python-dwca-reader

🐍 A Python package to read Darwin Core Archive (DwC-A) files.
BSD 3-Clause "New" or "Revised" License
45 stars 21 forks source link

`.close()` errors do not work on non-MS operating system #102

Closed csbrown closed 10 months ago

csbrown commented 10 months ago

Description

I got an error about an error when I attempted to .close(). I'm on a Mac.

Possibly this could just be any sort of Exception at all.

File ~/USGS/edrr/edrr-data-scraping/src/dwca/venv/lib/python3.11/site-packages/dwca/helpers.py:9, in remove_tree(path, retries, sleep)
      7 try:
      8     rmtree(path, False)
----> 9 except WindowsError:
     10     time.sleep(sleep)
     11 else:

NameError: name 'WindowsError' is not defined
niconoe commented 10 months ago

Thanks. Weird, I'm on a Mac myself and never got the error (but I never close manually, I use the with statement). Do you have a minimal example of code that triggers the issue so I can properly investigate? Thanks!

csbrown commented 10 months ago

I've been experimenting with this stuff in a jupyter notebook. The overhead of unzipping and cleaning up a 3GB Taxon file is heavy, so I was using open/close to keep the files around. Stereotypically, I was bad about .close()ing when I needed to, so I had lots of giant tmp folders lying around. I decided to clean them up manually, so I went and deleted them all. One of them was still being used, so when I .close()ed it, this code branch was called. Seems like a good idea to have error-handling code here, but it didn't handle the error. :)

To reproduce:

dwca = DwCAReader("my_dwca.zip")
# find out where your temp dirs are:
print(tempfile.mkdtemp())
# now, go delete the tmpXXXXX directories you've created.  One should have DwCA stuff in it.
dwca.close()
# The except clause itself throws an error.

More-or-less.

csbrown commented 10 months ago

This error probably won't occur during any usual workflow we adopt. So, it's not a big deal. Figured I'd document it at least though.

niconoe commented 10 months ago

So if I understand it correctly, you get the error only because the temporary files were cleaned up manually before calling close?

If that's the case, I am a bit reluctant to add code for such a specific use-case. I am also wondering how pertinent it would be to hide a "reasonable error" in the cleanup routine (I'd like the users to be informed if the cleanup failed). But on the other side, I agree the exact error is not the correct one because we try to import a non available exception on the platform.

I am therefore thinking of catching OSError instead (parent of WindowsError), but importable everywhere. It's a bit less specific obviously, but I think I can live with that :).

Any opinion?

csbrown commented 10 months ago

Yeah, I think swapping WindowsError with OSError is a fine idea.

niconoe commented 10 months ago

Done, thanks for the feedback!