Roughsketch / imagesize

Quickly probe the size of various image formats without reading the entire file.
MIT License

Potential new formats and problems #8

Closed Roughsketch closed 4 years ago

Roughsketch commented 5 years ago

Edit:

As of now:


This crate currently handles 5 major image file formats, but I would like it to handle anything common. However, many other formats have issues or only fit a specialized use case. There is also an argument for keeping this crate small and specific, so I'm opening this up for discussion.

I will add a list of potential additions I was thinking of and their problems in no particular order. If there is any format you'd like to add, just mention it and I'll take a look at it.

PSD, PSB, PDN

These are the native document formats of Photoshop (PSD, PSB) and Paint.NET (PDN).

Problems

TGA

This is a relatively rarely used format, in my experience. The only time I've personally used it is for Team Fortress 2 sprays.

Problems

ICO / CUR

This is used for icons, like application images and favicons for websites.

Problems

TIFF

A file format that is widely used in publishing.

Problems

Roughsketch commented 5 years ago

As of https://github.com/Roughsketch/imagesize/commit/41d2b6a99c11b376fb140245e263d5660deaaa54 PSD and PSB files were added. This was partly due to the simplicity of the format, so adding it as a possibility doesn't impact performance heavily.

jhspetersson commented 4 years ago

What about HEIF format? Would be great to support it with libheif-rs.

Roughsketch commented 4 years ago

I've never heard of that one, but I'll look into it at some point. I will say that I plan not to use any dependencies for this crate, so if it does get added I'll just do the parsing myself.

jhspetersson commented 4 years ago

@Roughsketch I hadn't heard of it either until recently. It seems this format is used on the latest iPhones and similar devices. Native parsing would be best, as I suppose libheif is Mac-centric and not easily portable for now.

jhspetersson commented 4 years ago

BTW, targeting https://github.com/jhspetersson/fselect/issues/65

Roughsketch commented 4 years ago

So I have looked at HEIC files and tried to find information on the format, but I can't find any concrete documentation for the structure. I do know of an easy, fast way to get dimensions, but it will most likely not be 100% accurate for all HEIC files and I'm not sure if I want to add that in.

The easy way is to just check the header to make sure it's a HEIC file, and then stream bytes until you find the sequence 00 00 00 14 69 73 70 65, then skip 4 bytes, read a 4 byte width, and then a 4 byte height. I would think that this would run into issues similar to JPEG where that sequence could potentially be in a place where it doesn't denote the dimensions (e.g., a metadata section). It also doesn't handle multiple images being in one file, or cropping, or anything else HEIC files can do.
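A minimal sketch of the byte-scanning approach described above, not the crate's actual code. The function names are hypothetical. Two assumptions are not stated in the comment: the header check is taken to mean that bytes 4..8 spell `ftyp`, and the width and height are read as big-endian, as is usual for ISO base media files. It inherits every limitation listed above: it returns the first match only and ignores multiple images, cropping, and rotation.

```rust
/// Byte pattern from the comment above: 00 00 00 14 followed by ASCII "ispe".
const ISPE_MARKER: [u8; 8] = [0x00, 0x00, 0x00, 0x14, 0x69, 0x73, 0x70, 0x65];

/// Hypothetical helper: read a big-endian u32 at `offset`, or None if out of range.
fn read_u32_be(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: scan `data` for the marker and read the dimensions after it.
/// Returns None if the header check fails, the marker is absent, or the data is truncated.
fn probe_heic_size(data: &[u8]) -> Option<(u32, u32)> {
    // Assumed header check: bytes 4..8 spell "ftyp".
    if data.get(4..8)? != &b"ftyp"[..] {
        return None;
    }
    // Find the first occurrence of the marker anywhere in the data.
    let pos = data
        .windows(ISPE_MARKER.len())
        .position(|w| w == &ISPE_MARKER[..])?;
    // Skip the marker itself plus 4 more bytes, then read width and height.
    let start = pos + ISPE_MARKER.len() + 4;
    let width = read_u32_be(data, start)?;
    let height = read_u32_be(data, start + 4)?;
    Some((width, height))
}
```

Calling `probe_heic_size` on the full file contents is the simplest use; a streaming version would stop reading as soon as the marker and the 12 bytes after it have been seen.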

If you can find documentation on the structure so I can parse out data instead of looking for a sequence of bytes that'd be helpful. No guarantees on when, or if, it would be added though.

jhspetersson commented 4 years ago

Well, I only found the ISO standard, which is freely available at https://standards.iso.org/ittf/PubliclyAvailableStandards/c066067_ISO_IEC_23008-12_2017.zip. The libheif project could be a source of inspiration as well, I suppose.

Anyway, thanks for the hint! I'll probably use that hack as a temporary solution.

Roughsketch commented 4 years ago

I didn't notice that when I was searching before, so that's a big help. I'll take a look at it when I have some free time.

Roughsketch commented 4 years ago

@jhspetersson I've pushed HEIF handling to master. I've tried it with various conformance files I've found online and they seem to work fine. If you have files you can test I'd appreciate it, and if it all looks fine I can publish 0.8.1.

jhspetersson commented 4 years ago

Sorry for being late with that. I tested 0.8.1 against a few samples from fellow iPad owners. All of these files are reported as 512x512. It seems the dimensions are taken from some kind of preview image in the container, while the main image is 3264x2448 or 2448x3264.

https://drive.google.com/open?id=1lLF9Tf0sHrunbJmH01VpxhOq-Qx9H_t6

Roughsketch commented 4 years ago

Okay, I'll have another pass at it. I think I know how to fix that easily.

jhspetersson commented 4 years ago

Sorry for returning to this, but is there any chance something could be done about the issue I mentioned? I don't want to sound pushy, just asking.

Roughsketch commented 4 years ago

@jhspetersson Could you pull master and see if it fixes your issue?

Roughsketch commented 4 years ago

I made a test for this and it passes, but I do notice that the dimensions that both this crate and Google Docs report are actually wrong due to orientation. For instance, in your Google Docs link, the image I added to the repo to test (IMG_0007.heic) is labeled as 3264 x 2448, but when I preview it with Google Docs it looks like it should be 2448 x 3264. Windows correctly says it is 2448 x 3264 when I mouse over it in Explorer. My guess is that there's an orientation flag that neither Google nor this crate obeys. I can try to look into this later.

Edit: I made another push to handle rotation. Handling is getting messy, but works for both test cases I have in the repo. Again, let me know if this works for you and if it does I'll release 0.8.4. I also might try fuzzing this a bit more as HEIC handling is a place where I imagine it could act poorly.
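The orientation mismatch above comes down to swapping width and height when the image is rotated by an odd number of quarter turns. A minimal sketch of that step, assuming the rotation has already been parsed out of the file as a count of 90-degree steps; the function name and that representation are hypothetical and not taken from the crate.

```rust
/// Hypothetical helper: the size an image is displayed at after being rotated
/// by `quarter_turns` x 90 degrees. Odd counts swap width and height.
fn displayed_size(width: u32, height: u32, quarter_turns: u8) -> (u32, u32) {
    if quarter_turns % 2 == 1 {
        (height, width)
    } else {
        (width, height)
    }
}
```

With the stored size 3264 x 2448 from IMG_0007.heic and one quarter turn, this gives 2448 x 3264, which matches what Windows shows.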

jhspetersson commented 4 years ago

It works for my cases perfectly! Thank you very much for the update.

Roughsketch commented 4 years ago

Great to hear. I'll push 0.8.4 now.

Roughsketch commented 4 years ago

Closing this issue since I'd rather new image format requests go into their own separate issues, and most of the formats mentioned in the OP are either implemented or cannot be implemented.