Closed cclauss closed 3 months ago
I've just had a look at the code and really need to spend some more time on it. These may seem obvious questions but I'm lacking caffeine and sleep (long days this week) so they are just to help me understand a little better:
424d
will work but BM
would not if not first converted to hex. I can see some improvements for your test strings for PBM, PGM and PPM so I'll add those to my PR later, but that should not affect the basic matches PureMagic already has. UPDATE These improvements are now in my PR. We can detect a SPACE, TAB or Win/*nix newline, additionally if they are immediately followed by a # that will improve confidence as well.
Hi @cclauss, you might need to remove the :
from Closes:
to make GitHub link this PR to your issue.
Watching the commits with interest 🙂
That Closes
, Fixes
, Resolves
thing only works on the default branch.
Ah! Did not notice you had the PR against the develop branch, my bad. 🙂
There are certain cases where imghdr.what(file, h)
and puremagic.what(file, h)
disagree. My hunch is that puremagic
is right and these incompatibilities are part of why the Python core team wants to remove imghdr
.
Do you have suggestions for how we can always deliver identical results as imghdr.what(file, h)
?
The current definition in this PR is:
def what(file: os.PathLike | str | None], h: str | bytes | None) -> str:
but I think we should make two changes to deliver optional bug-for-bug compatibility:
h: str | bytes | None
--> h: bytes | None
because imghdr.what()
does not accept h = "str"strict_compat: bool = True
option to deliver bug-for-bug results by default.
def what(file: os.PathLike | str | None], h: bytes | None, strict_compat: bool = True) -> str:
if strict_compat and answer := imghdr_compatibility(h):
return answer
Thoughts on naming? strict_compat
, strict_imghdr
, imghdr_compat
, bug_for_bug
, others?
The imghdr_compatibility()
(name is also up for debate) function would deal with cases like this PR's test_what_from_string_todo()
function.
if not strict_compat:
then we would return puremagic's results unmodified.
What files/cases are causing issues? Without #75 they may not work or behave as expected at present. Feel free to fling a .zip over or link to an image and I'll take a look.
What files/cases are causing issues?
I was focused on h
, not file
so the cases are in the parameterization of the test_what_from_string_todo()
function.
("jpeg", b"______JFIF"),
("jpeg", b"______Exif"),
("rgb", b"\001\332"),
("tiff", b"II"),
("tiff", b"MM"),
("xbm", b"#define "),
It would be wonderful to add more image files in test/resources/images/
as discussed on test/test_main.py
line 17.
exr, pbm, pgm, ppm, ras, rgb, xbm
Quick breakdown based on #75, I'm assuming you are converting to hex:
JFIF
/EXIF
is present. This would be among the first improvements the 2.0 goals should address. Existing JPEG matching works well enough at presentb'II*\x00'
for Intel / b'MM\x00*'
for Motorola, more detailed notes in my PR about this.The JFIF/EXIF thing is a would be nice to have rather than critical to matching, a JPG will always have the base marker of 0xffd8ff
/ b'\xff\xd8\xff'
so any real file thrown at PureMagic will match. Adding these would boost confidence if they are present in the file.
Closes #72
72
This might be more complete after #75 lands.
75
Current pytest results:
53 passed in 0.05s
test_what_from_file
is 6 teststest_what_from_file_none
is 1 testtest_what_from_string
is 38 teststest_what_from_string_py311
is 2 testsimghdr
only on Python < 3.11test_what_from_string_todo
is 6 testspuremagic
@NebularNerd Your reviews, please.