Roughsketch / imagesize

Quickly probe the size of various image formats without reading the entire file.
MIT License

Make format checking more accurate. #1

Closed Roughsketch closed 6 years ago

Roughsketch commented 7 years ago

I've yet to run into images that fail completely, but I would not be surprised if some exist. An easy solution would be to check the format more rigorously instead of matching on only the first byte. For that to matter, though, I need some example images that do not work with the current implementation.

If someone finds such an image, let me know so I can work with it.

Roughsketch commented 7 years ago

As of version 3, there are safer versions of the get_dimensions* functions.

For more information: the original get_dimensions checks only the first byte of the data. This is unsafe because some headers are plain ASCII values. For instance, the magic number for a GIF image is GIF87a or GIF89a, so get_dimensions would see only a G and assume the data is a GIF, even if it was actually a text document.
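To illustrate the failure mode, here is a minimal sketch of first-byte detection. It is not the library's code: the `Format` enum, the `guess_by_first_byte` function, and the choice of formats in the table are all hypothetical, picked only to show why a single ASCII byte is a weak discriminator.

```rust
/// Hypothetical format tag, used only for this illustration.
#[derive(Debug, PartialEq)]
enum Format {
    Gif,
    Png,
    Jpeg,
    Bmp,
}

/// Guess the format from the first byte alone.
/// (Assumed dispatch table; the real implementation may differ.)
fn guess_by_first_byte(data: &[u8]) -> Option<Format> {
    // `first()` returns None for empty input, which `?` propagates.
    match *data.first()? {
        b'G' => Some(Format::Gif),  // "GIF87a" / "GIF89a" start with 'G'
        0x89 => Some(Format::Png),  // PNG signature starts with 0x89
        0xFF => Some(Format::Jpeg), // JPEG starts with 0xFF 0xD8
        b'B' => Some(Format::Bmp),  // BMP starts with "BM"
        _ => None,
    }
}
```

With this sketch, `guess_by_first_byte(b"Good morning")` is reported as a GIF just like `guess_by_first_byte(b"GIF89a")`, because both inputs begin with the byte `G`.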

The newer safe variants (get_dimensions_safe and get_dimensions_from_blob_safe) mitigate this problem by checking more than a single byte: they check 2-8 bytes of identifying information, depending on the file type. For GIF images, this means checking for GIF8 instead of just G.
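A multi-byte check along these lines might look like the sketch below. The names `ImageKind` and `detect_by_signature` are hypothetical, and the three signatures shown are only an assumed subset chosen for illustration (a 4-byte GIF prefix, the 8-byte PNG signature, and the 2-byte JPEG start marker); this is not the library's actual code.

```rust
/// Hypothetical format tag, used only for this illustration.
#[derive(Debug, PartialEq)]
enum ImageKind {
    Gif,
    Png,
    Jpeg,
}

/// Identify the format by comparing a multi-byte prefix instead of one byte.
/// Inputs shorter than a signature simply fail to match it.
fn detect_by_signature(data: &[u8]) -> Option<ImageKind> {
    if data.starts_with(b"GIF8") {
        // Covers both "GIF87a" and "GIF89a".
        Some(ImageKind::Gif)
    } else if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        // Full 8-byte PNG signature.
        Some(ImageKind::Png)
    } else if data.starts_with(b"\xFF\xD8") {
        // JPEG start-of-image marker.
        Some(ImageKind::Jpeg)
    } else {
        None
    }
}
```

Here `detect_by_signature(b"Good morning")` returns `None`, while data beginning with `GIF89a` is still recognized. A text file that happens to start with `GIF8` would still be misidentified, which is why the longer prefix reduces the problem rather than eliminating it.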

Roughsketch commented 6 years ago

Pretty much fixed in version 0.5.0. The unchecked versions are now gone in favor of ones that check the header. The overhead is minimal: in my benchmarks, the checked versions added less than 100 ns on average.