Closed by yanokwa 2 years ago
For detecting ZIP files only, there's a standard-library function: zipfile.is_zipfile.
@lindsay-stevens Thanks for the tip!
I just tried for a bit and I'm having trouble getting it to work cleanly. It seems I'd have to write the incoming file to disk, then rename it based on the result of is_zipfile.
My current approach works and doesn't feel horrible, so I'd rather not spend more time getting is_zipfile working. I'll give it one more go after I have some caffeine.
is_zipfile is too clever in how it detects ZIP files, so it won't work for us.
zipfile.is_zipfile('example.xls') # False
zipfile.is_zipfile('example.xlsx') # False
zipfile.is_zipfile('example.xlsx.zip') # True
zipfile.is_zipfile('example.xlsx.zip.foo') # True
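For reference, a minimal sketch of calling is_zipfile on in-memory data rather than a path. The Python documentation states that is_zipfile also accepts a file or file-like object. The helper names and payloads below are hypothetical and only for illustration; this is not the code used in this project.

```python
import io
import zipfile


def build_zip_bytes() -> bytes:
    """Create a tiny ZIP archive entirely in memory (illustrative data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello")
    return buffer.getvalue()


def is_zip_payload(payload: bytes) -> bool:
    """Wrap the bytes in a file-like object so nothing is written to disk."""
    return zipfile.is_zipfile(io.BytesIO(payload))


zip_result = is_zip_payload(build_zip_bytes())
text_result = is_zip_payload(b"plain text, definitely not an archive" * 2)
print(zip_result, text_result)
```

Whether this is any cleaner than the current approach depends on how the incoming file is exposed to the handler.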
Fixes #29
There seem to be two popular ways to detect file types from content in Python: python-magic and filetype. The former depends on the libmagic C library and the latter is pure Python.
I didn't want to add a dependency, and it seemed (I only confirmed this with filetype) that neither library could identify an XLS file; they could only identify a ZIP. Given that it's straightforward to detect a ZIP, I pulled the relevant code from filetype and made it into a small method.
I added pyxform-clean.xls to the test suite and confirmed that it works. I also tried the various test forms.
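As a rough illustration of content-based ZIP detection, here is a minimal sketch that compares the first four bytes against the well-known ZIP signatures (local file header, empty archive, spanned archive). The function name looks_like_zip and the exact set of signatures checked are assumptions made for this example; the method actually added in this change may differ.

```python
import io
import zipfile

# Well-known ZIP signatures: local file header, end of central
# directory (empty archive), and spanned-archive marker.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")

# Signature of the OLE2 compound file format used by legacy .xls files.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_zip(header: bytes) -> bool:
    """Return True if the leading bytes match a ZIP signature.

    Hypothetical helper: only the first four bytes are inspected.
    """
    return header[:4] in ZIP_SIGNATURES


# Build a small archive in memory to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sheet.xml", "<sheet/>")
zip_bytes = buffer.getvalue()

print(looks_like_zip(zip_bytes))       # leading bytes are PK\x03\x04
print(looks_like_zip(OLE2_SIGNATURE))  # OLE2 header is not a ZIP signature
```

Because an XLSX file is a ZIP container, a check like this can separate XLSX uploads from legacy XLS uploads without adding a dependency, although it cannot tell an XLSX from any other ZIP.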