Closed eumiro closed 3 years ago
This is interesting! macos-latest fails at exactly the same place as an Arch-based Linux distro (more details below the error message):
=================================== FAILURES ===================================
_____________________ GuesserTest.test_30_zip_single_file ______________________
self = <tests.test_30_decompressors.GuesserTest testMethod=test_30_zip_single_file>
    def test_30_zip_single_file(self):
        uncompressed = BytesIO(b"Hello World\n")
        uncompressed.name = 'test_file'
        raw = BytesIO()
        raw.name = "test_file.zip"
        zip = zipfile.ZipFile(raw, 'w')
        try:
            zip.writestr("test_file", uncompressed.getvalue())
        finally:
            zip.close()
        raw.seek(0)
        self._check_decompressor(
            destream.decompressors.Unzip,
>           raw, uncompressed)
tests/test_30_decompressors.py:301:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_30_decompressors.py:31: in _check_decompressor
decompressor._guess(mime, str(archive.realname), compressed_fileobj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'destream.decompressors.zip.Unzip'>
mime = 'application/octet-stream', name = 'test_file.zip'
fileobj = <_io.BytesIO object at 0x110e6c350>
    @classmethod
    def _guess(cls, mime, name, fileobj):
        if getattr(cls, '_unique_instance', False):
            if cls in fileobj._decompressors:
                raise ValueError("class %s already in the decompressor list")
        realname = name
        if hasattr(cls, '_mimes'):
            match = RE_EXTENSION.search(name)
            if hasattr(cls, '_extensions') and \
                    match.group(2) and \
                    os.path.normcase(match.group(3)) in cls._extensions:
                realname = match.group(1)
            if mime not in cls._mimes:
                raise ValueError(
                    (cls, mime, name, fileobj),
>                   "can not decompress fileobj using class %s" % cls.__name__)
E ValueError: ((<class 'destream.decompressors.zip.Unzip'>, 'application/octet-stream', 'test_file.zip', <_io.BytesIO object at 0x110e6c350>), 'can not decompress fileobj using class Unzip')
destream/archive.py:65: ValueError
It says it cannot find application/octet-stream in the list ['application/zip'] provided by the Unzip class. On Ubuntu this does not fail, so there might be some difference in libmagic.
The difference between Ubuntu/Debian and Arch/macOS is in the packaging of the libmagic.so file:
/usr/lib/x86_64-linux-gnu/libmagic.so.1
usr/lib/libmagic.so
Does anyone have an idea?
I have:
lrwxrwxrwx 1 root root 17 Jun 16 2020 /usr/lib/libmagic.so -> libmagic.so.1.0.0
lrwxrwxrwx 1 root root 17 Jun 16 2020 /usr/lib/libmagic.so.1 -> libmagic.so.1.0.0
-rwxr-xr-x 1 root root 162K Jun 16 2020 /usr/lib/libmagic.so.1.0.0
So it seems equivalent.
file works, so it's not the library.
I don't see anything wrong in the test or the library. ArchiveFile.peek() (line 30) returns the 128 B that are the entire file.
From python-magic: https://github.com/ahupp/python-magic/issues/166
But I could not reproduce on Arch 2021-01-11.
@jruere what do you get with the following code?
❯ python
>>> import zipfile, magic
>>> z = zipfile.ZipFile('a.zip', 'w')
>>> z.writestr("a.txt", "hello world\n")
>>> z.close()
>>> magic.from_buffer(open("a.zip", 'rb').read(), mime=True)
'application/octet-stream'
>>>
❯ file --mime-type a.zip
a.zip: application/octet-stream
But:
❯ echo "hello world" > b.txt
❯ zip b.zip b.txt
adding: b.txt (stored 0%)
❯ file --mime-type b.zip
b.zip: application/zip
and then:
❯ file a.zip b.zip
a.zip: Zip archive data, made by v2.0 UNIX, extract using at least v2.0, last modified Sun Sep 8 17:24:03 2013, uncompressed size 12, method=store
b.zip: Zip archive data, at least v1.0 to extract
I can reproduce the problem with the procedure you gave.
Finally,
$ file a.zip b.zip
a.zip: Zip archive data, made by v2.0 UNIX, extract using at least v2.0, last modified Sun Sep 8 15:13:14 2013, uncompressed size 12, method=store
b.zip: Zip archive data, at least v1.0 to extract
This looks like a bug in libmagic...
I don't have a machine running OSX, so I can't really help, but I remember having numerous issues with differences between versions of libmagic. Some detected MIME types properly, some didn't; this is why I was updating file and libmagic on the CI.
It is very likely that file is a UNIX command and therefore the OSX version of it has a completely different implementation than the GNU version for Linux. This is the case for other commands like sed and tar, which are not 100% compatible between OSX and Linux.
I don't really have a solution for this. You might want to tell the user to install the GNU version on OSX and say "this is the only version we support officially".
Another alternative would be to implement your own MIME detection mechanism, but this might bring other problems. For example, ZIP files always start with the bytes "PK", so that one is easy to identify.
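The "PK" check mentioned above could look roughly like this. It is only a minimal sketch, not destream's implementation: the helper names `looks_like_zip` and `build_sample_zip` are hypothetical, and the signatures listed are the standard ZIP record markers.

```python
import io
import zipfile

# Standard ZIP record signatures; all of them start with b"PK".
ZIP_SIGNATURES = (
    b"PK\x03\x04",  # local file header (typical non-empty archive)
    b"PK\x05\x06",  # end of central directory (empty archive)
    b"PK\x07\x08",  # spanned archive marker
)


def looks_like_zip(head: bytes) -> bool:
    """Return True if the leading bytes match a known ZIP signature."""
    return head.startswith(ZIP_SIGNATURES)


def build_sample_zip() -> bytes:
    """Build the same kind of archive as a.zip, entirely in memory."""
    raw = io.BytesIO()
    with zipfile.ZipFile(raw, "w") as zf:
        zf.writestr("a.txt", "hello world\n")
    return raw.getvalue()


if __name__ == "__main__":
    print(looks_like_zip(build_sample_zip()[:4]))
    print(looks_like_zip(b"hello world\n"))
```

Such a check would probably only make sense as a fallback when libmagic reports application/octet-stream, since it says nothing about other formats.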
Can you share a.zip? What it looks like is that there's a magic database entry for this second format, but the mimetype wasn't set up properly.
> Can you share a.zip? What it looks like is that there's a magic database entry for this second format, but the mimetype wasn't setup properly.
With the following script in Python 3.9:
import zipfile
with zipfile.ZipFile('a.zip', 'w') as zf:
zf.writestr('a.txt', 'hello world\n')
I get a 120-byte file a.zip:
❯ file --mime-type a.zip
a.zip: application/octet-stream
And this is its base64 version:
UEsDBBQAAAAAAE2dLVItOwivDAAAAAwAAAAFAAAAYS50eHRoZWxsbyB3b3JsZApQSwECFAMUAAAA
AABNnS1SLTsIrwwAAAAMAAAABQAAAAAAAAAAAAAAgAEAAAAAYS50eHRQSwUGAAAAAAEAAQAzAAAA
LwAAAAAA
The md5sum of the file is 64561ffd00255a30ffaa38acc9867eed
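To rebuild a.zip from the base64 text above without the original script, something like the following should work. It is a sketch: the function names are hypothetical, and the MD5 comparison is only computed against the value reported in this thread, not guaranteed to match if the text was altered in transit.

```python
import base64
import hashlib
import io
import zipfile

# Base64 text copied from the comment above.
A_ZIP_BASE64 = (
    "UEsDBBQAAAAAAE2dLVItOwivDAAAAAwAAAAFAAAAYS50eHRoZWxsbyB3b3JsZApQSwECFAMUAAAA"
    "AABNnS1SLTsIrwwAAAAMAAAABQAAAAAAAAAAAAAAgAEAAAAAYS50eHRQSwUGAAAAAAEAAQAzAAAA"
    "LwAAAAAA"
)
# MD5 as reported in the thread.
REPORTED_MD5 = "64561ffd00255a30ffaa38acc9867eed"


def decode_a_zip() -> bytes:
    """Decode the base64 text back into the raw archive bytes."""
    return base64.b64decode(A_ZIP_BASE64)


def list_members(data: bytes) -> dict:
    """Map each member name to its content, read from an in-memory archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def matches_reported_md5(data: bytes) -> bool:
    """Compare the MD5 of the decoded bytes with the reported one."""
    return hashlib.md5(data).hexdigest() == REPORTED_MD5


if __name__ == "__main__":
    data = decode_a_zip()
    print(len(data), list_members(data), matches_reported_md5(data))
    with open("a.zip", "wb") as fh:  # then run: file --mime-type a.zip
        fh.write(data)
```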
Thank you for looking at the problem!
Looks like a regression in libmagic 5.39:
% docker run -it archlinux:latest /bin/bash
[root@a6dd08f21d72 /]# file --version
file-5.39
magic file from /usr/share/file/misc/magic
seccomp support included
[root@a6dd08f21d72 /]# file --mime-type a.zip
a.zip: application/octet-stream
vs
% docker run -it archlinux:20200505 /bin/bash
[root@e8f497cea4ca /]# file --version
file-5.38
magic file from /usr/share/file/misc/magic
seccomp support included
[root@e8f497cea4ca /]# file --mime-type a.zip
a.zip: application/zip
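The version comparison above could be automated along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the set of affected versions contains only 5.39 because that is the only version identified in this thread.

```python
import re

# Only 5.39 is known to be affected from this thread (assumption: nothing
# is claimed here about earlier or later releases).
KNOWN_BAD_VERSIONS = {(5, 39)}


def parse_file_version(output: str):
    """Extract (major, minor) from `file --version` output, or None."""
    match = re.search(r"^file-(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_known_bad(output: str) -> bool:
    """Return True if the reported version is in KNOWN_BAD_VERSIONS."""
    return parse_file_version(output) in KNOWN_BAD_VERSIONS


if __name__ == "__main__":
    sample = "file-5.39\nmagic file from /usr/share/file/misc/magic\n"
    print(parse_file_version(sample), is_known_bad(sample))
```

On a real system the input could come from `subprocess.run(["file", "--version"], capture_output=True, text=True).stdout`.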
Thank you, @ahupp! That explains why it works on Ubuntu 20.04 (libmagic1 5.38) and not on Arch-based distros (file 5.39). What can we do about it?
I'm reporting a bug upstream, but for now, I don't know if there's anything you can do about it.
@eumiro you can add a note to the troubleshooting section 😁 https://github.com/destream-py/destream#troubleshooting
Now we'll need to find a way to brew install a specific version (5.38) of libmagic. I cannot test it and searching online points me to some git checkout hacks. Any idea?
I think we can introduce the macos CI because #27 will then correctly xfail the test.
> Now we'll need to find a way to brew install a specific version (5.38) of libmagic. I cannot test it and searching online points me to some git checkout hacks. Any idea?
Did you see this? https://stackoverflow.com/questions/3987683/homebrew-install-specific-version-of-formula
Maybe try brew install libmagic@5.38?
Apparently you can do brew versions libmagic to see what is available.
> I think we can introduce the macos CI because #27 will then correctly xfail the test.
Very good point! haha That sounds like a perfectly acceptable solution for me.
(I wouldn't suggest to make a Windows CI check just yet because I suspect the test code will fail to even execute properly.)
> Maybe try brew install libmagic@5.38?
That's it, thanks.
I will now pin both linux and macos to libmagic version 5.38. As soon as there's a problem installing it or we get a new fixed version of libmagic, we can deal with it again.
I feel trapped every time I click on the link "View it on GitHub" because I get that page that says there is nothing to see :confounded: https://twitter.com/CecileTonglet/status/1348595584136077314
> I feel trapped every time I click on the link "View it on GitHub" because I get that page that says there is nothing to see :confounded: https://twitter.com/CecileTonglet/status/1348595584136077314
I am sorry. But who wrote about quick dirty commits that get cleaned up afterwards? :thinking:
> I am sorry. But who wrote about quick dirty commits that get cleaned up afterwards? :thinking:
:innocent:
Now my PRs look more like this: https://github.com/IMI-eRnD-Be/wasm-run/pull/28 You can read the intermediary commits easily but I squash-merged at the end to https://github.com/IMI-eRnD-Be/wasm-run/commit/fffb646d8858bb1d39445e11dac19c2d55292580
Disclaimer: I have no experience with macos, so just trying…