Support Sigma X3F camera RAW format

drewnoakes commented 9 years ago

Sigma's X3F files appear to contain Exif data, yet do not start with a regular TIFF preamble.

Also known as the Foveon X3 format or "FOVb".

berndmichaely commented 8 years ago

Hi, I have just uploaded the JX3fExtract library on my website, which might be of interest, see bernd-michaely.de/jx3fextract/javadoc/

drewnoakes commented 8 years ago

@berndmichaely, thanks for the reference.

rcketscientist commented 7 years ago

The issue you're having is identifying them? Basically need a new FileType, a way to recognize it, and a reader to pull the exif directories?

It sounds simple enough to pull the last four bytes for the directory location. Any lessons learned here you wanna share before I dive in?

rcketscientist commented 7 years ago

Offset 0 Length 4 "FOVb".

This was sufficient to identify my example image:

_root.addPath(FileType.X3f, "FOVb".getBytes());

I'll look at reading a little more tonight. Their spec makes it sound fairly simple.
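
For readers following along, the identification step described above can be sketched outside of metadata-extractor's own detection tables. This is only an illustration: the class name `X3fMagic` and the method `isX3f` are made up for this example and are not part of the library. It also names the charset explicitly, whereas the one-liner above relies on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of metadata-extractor.
final class X3fMagic {
    // "FOVb" at offset 0, length 4, as noted in the comment above.
    private static final byte[] MAGIC = "FOVb".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the buffer starts with the four magic bytes. */
    static boolean isX3f(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        // Compare only the first four bytes; anything after them is ignored.
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }
}
```

Only the first four bytes are inspected, so a caller needs to read no more than that before deciding whether to continue.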

berndmichaely commented 7 years ago

I have implemented the x3f-raw-format specification in my JX3fExtract library, available at bernd-michaely.de/jx3fextract/javadoc, and thought about integrating the functionality into metadata-extractor. However, there are several issues and open questions:

1. X3F files (except some early ones) contain embedded JPEG images which are "processed for preview". If they are used for preview purposes, access to those embedded files, which are relatively small compared to the full raw file, should be efficient, especially when reading from e.g. a slow memory card reader. Access to the embedded JPEG requires some jumping around in the raw file: after identifying the file format by the four "FOVb" magic bytes, one has to jump to the last four bytes of the file (!), which point to a directory structure, then jump to this directory, find the "processed for preview" image data section entry, and finally jump there. metadata-extractor works with buffered input streams, which might, at least in theory, be inefficient in this situation (e.g. it might effectively require transferring the full raw file). In JX3fExtract I have instead used a java.io.RandomAccessFile based approach (see the io subpackage). There is also a RadInputStream adapter, so you can work in two stages: first use a RandomAccessData to efficiently locate the embedded file, then use a RadInputStream as a stream adapter so that metadata-extractor can extract the Exif information.
2. X3F files contain (in addition to JPEG/Exif, at least before the current Quattro camera generation) proprietary metadata ("Properties") as a simple String-to-String mapping of metatag keys to values, which differs from the Exif style of integer keys and typed values. How would this be integrated into metadata-extractor (if at all)? (The documented list is also incomplete. On the other hand, the raw Strings, even if unknown or undocumented, are self-descriptive to some degree.)
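
The jump sequence described in the first point (last four bytes, then the directory, then its entries) can be sketched with plain `java.io.RandomAccessFile`. This is a minimal sketch and not JX3fExtract's or metadata-extractor's code. It rests on assumptions that are not stated in this thread: that integers are 32-bit little-endian, that the directory starts with a 12-byte header (`SECd` identifier, version, entry count), and that each entry is 12 bytes (offset, length, four-character type). The names `X3fDirectory` and `Entry` are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the byte layout below is an assumption, see the text above.
final class X3fDirectory {
    /** One directory entry: section offset, section length, four-character type. */
    static final class Entry {
        final long offset;
        final long length;
        final String type;

        Entry(long offset, long length, String type) {
            this.offset = offset;
            this.length = length;
            this.type = type;
        }
    }

    /** Reads n bytes at an absolute position and wraps them as little-endian. */
    private static ByteBuffer readAt(RandomAccessFile file, long position, int n) throws IOException {
        byte[] bytes = new byte[n];
        file.seek(position);
        file.readFully(bytes);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    static List<Entry> read(RandomAccessFile file) throws IOException {
        long size = file.length();
        if (size < 8) {
            throw new IOException("Too short to be an X3F file");
        }
        // Step 1: the last four bytes of the file point to the directory.
        long dirOffset = readAt(file, size - 4, 4).getInt() & 0xFFFFFFFFL;
        // A truncated file may leave a pointer that leads nowhere, so check it.
        if (dirOffset + 12 > size - 4) {
            throw new IOException("Directory pointer out of range");
        }
        // Step 2: directory header (assumed: "SECd", version, entry count).
        ByteBuffer header = readAt(file, dirOffset, 12);
        byte[] id = new byte[4];
        header.get(id);
        if (!"SECd".equals(new String(id, StandardCharsets.US_ASCII))) {
            throw new IOException("Directory identifier not found");
        }
        header.getInt(); // directory version, unused in this sketch
        long count = header.getInt() & 0xFFFFFFFFL;
        if (count > (size - 4 - dirOffset - 12) / 12) {
            throw new IOException("Directory entries out of range");
        }
        // Step 3: read all entries in one go (assumed 12 bytes each).
        ByteBuffer entries = readAt(file, dirOffset + 12, Math.toIntExact(count * 12));
        List<Entry> result = new ArrayList<>();
        byte[] type = new byte[4];
        for (long i = 0; i < count; i++) {
            long offset = entries.getInt() & 0xFFFFFFFFL;
            long length = entries.getInt() & 0xFFFFFFFFL;
            entries.get(type);
            result.add(new Entry(offset, length, new String(type, StandardCharsets.US_ASCII)));
        }
        return result;
    }
}
```

With this shape, only three small reads touch the file before the caller knows where each section lives, which is the efficiency argument made in the first point.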

rcketscientist commented 7 years ago

Thanks for the very detailed reply.

That's right in line with my research last night. I didn't quite get to the directory hunt, so I appreciate the confirmation. As for the thumb, I don't think that's a goal of this project; Drew can correct me if I'm wrong on that. There will always be better tools for that IMO (dcraw, libraw). For now I'll skip it either way. The efficiency question was the one I was working on last night. This project actually uses a RandomAccessReader for at least TIFF, so I thought I'd leverage that logic, since we know that works well. I'll take a look at that adapter.
I was looking at the properties and indeed I thought that list seemed incomplete. It's good to know it's not a matter of deciphering byte tags, though. How is the EXIF section stored?

Once again, thanks a lot for this info. It confirms my conclusions from last night, which is very welcome. No longer feel like I'm diving into this with a flashlight.

rcketscientist commented 7 years ago

Ok, I've pulled all the maker notes, I just need to create a directory and int mapping for the string tags. The exif I assume is in the ima2 section, Bernd?

rcketscientist commented 7 years ago

@drewnoakes X3F files store their properties as UTF-16... :/. I had to add a two-byte string parser, so I made some quick name changes. Let me know what your preferred naming scheme would be:

https://github.com/rcketscientist/metadata-extractor/commit/d0eace4e98a7d75993ea698b476c022581f3c2f7#diff-c5f93be4e042a2a86706739e098e766a

berndmichaely commented 7 years ago

Hi, (sorry, I had no time to answer yesterday...) as already noted, the property strings are encoded in UTF-16 character encoding, which is a 16-bit encoding, so 1 character (strictly speaking, one code unit) corresponds to 2 bytes.

The Exif data is not stored separately in a data section, but is contained in a "processed for preview" JPEG file. That is, you have to search for an IMAG or IMA2 data section which is of a "processed for preview" type. This contains a JPEG which in turn contains Exif information, which you can extract in the usual way. Note that in most files there are two JPEGs, one in thumbnail size and one in full size, the latter containing the Exif info, so you would search for the biggest image size.

And if you are already doing this, here are a few additional thoughts:

@ spec 2.1 Version Numbers: The spec uses a major.minor version numbering scheme, which is used separately for each file section; that is, different file sections may have different version numbers. E.g. the Quattro cameras have a 4.x header revision, but still 2.0 directory and processed for preview image data sections. The spec denotes header bytes 4-7 as "File format version", but it is indeed only a header format version which can (and should) be ignored for file type detection (here only the FOVb is important).

@ spec 2.5 Directory Pointer Section: Note that for corrupted files (e.g. cancelled download...) the end of the file, and with it the directory pointer, may be cut off, with the last four bytes of the file pointing to nirvana ... (this leads to a recovery option, where the data sections would be searched for by their SEC* section identifiers directly, but since those bytes may appear randomly inside the data, this method might not work even for intact files...)

@ spec 3.1 "PROP" property list: A property list subsection may appear 0..n times, where for key-value pairs with identical keys the newer (later) value replaces the earlier one (Quattro cameras e.g. have none at all).

@ JUnit testing: The X3fParserTest.java contains byte[] constants testDataX3f20Empty and testDataX3f21Empty containing full minimal valid FOVb files.

rcketscientist commented 7 years ago

Don't apologize, we're all doing this in our spare moments ;-).

are encoded in UTF-16

I'm embarrassed to admit I lost an hour using the existing string reader and staring at your code going "Why oh why is he multiplying the offset by twoooooo!!!", lol
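
The factor of two mentioned here follows from the encoding: if an offset counts 16-bit characters, the byte position is twice that. A minimal sketch, assuming little-endian byte order and null-terminated strings (neither is stated in this thread), with a hypothetical helper name `Utf16Strings.readAt`:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; byte order and null termination are assumptions.
final class Utf16Strings {
    /** Reads a null-terminated UTF-16LE string that starts charOffset characters into data. */
    static String readAt(byte[] data, int charOffset) {
        // Offsets count 16-bit characters, so the byte position is twice the offset.
        int start = charOffset * 2;
        if (charOffset < 0 || start > data.length) {
            throw new IllegalArgumentException("Offset outside buffer: " + charOffset);
        }
        int end = start;
        // Advance two bytes at a time until a 0x0000 terminator or the end of the buffer.
        while (end + 1 < data.length && (data[end] != 0 || data[end + 1] != 0)) {
            end += 2;
        }
        return new String(data, start, end - start, StandardCharsets.UTF_16LE);
    }
}
```

For a buffer holding `ISO`, a terminator, `100`, and a terminator, character offset 0 yields the key and character offset 4 yields the value.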

That is you have to search for a IMAG or IMA2 data section

I figured it was just piggybacking the thumbs. The info on exactly what (IMAG, IMA2) represent was sorely lacking, especially taking the larger, so thanks for that!

different file sections may have different version numbers

Wow, thanks. I would easily have overlooked that, basing most of this on a single test image and some Phil Harvey notes.

but it is indeed only a header format version which can (and should) be ignored

Well, 2.1 and 2.2 mean that the extended header section might exist right? Did they ever use that (my test image is 3.1)?

this leads to a recovery option...

I saw the recovery parser, I'll leave that for a day when I'm feeling particularly grumpy ;-), or never. For now I'll be satisfied to only handle intact images.

A property list subsection may appear 0..n times

I saw that in the spec. As it stands I was planning on overwriting prior values.
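
Overwriting prior values falls out naturally if each property list is applied to one map in file order. A small sketch, assuming each "PROP" section has already been decoded into a `Map<String, String>`; the class name `X3fProperties` and the method `merge` are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes the sections are passed in file order.
final class X3fProperties {
    /** Merges 0..n property lists; for identical keys the later value wins. */
    static Map<String, String> merge(List<Map<String, String>> sectionsInFileOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> section : sectionsInFileOrder) {
            // putAll replaces the value of any key that is already present.
            merged.putAll(section);
        }
        return merged;
    }
}
```

An empty input list simply produces an empty map, which covers files without any property list.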

(Quattro cameras e.g. have none at all).

No properties?

JUnit testing

Thanks, I'll probably swipe those. I have a workable test setup in my android app, but testing in Android is a nightmare.

berndmichaely commented 7 years ago

I'm embarrassed to admit I lost an hour using the existing string reader ...

Don't worry, the UTF-16 coding was also unusual to me, I was just glad it is supported as a standard charset by the JRE ...

... basing most of this on a single test image and some Phil Harvey notes

Thanks for pointing to this. I just saw there are even a few hints about "SigmaRaw Header4 Tags" that were unknown to me. I will have a look at this later :-)

Well, 2.1 and 2.2 mean that the extended header section might exist right?

Right. (It should even always exist in this case.)

Did they ever use that (my test image is 3.1)?

I don't think I have ever seen a header version older than 2.2 (Polaroid X530, Sigma SD9 and SD10 used this already), but 2.2 and 2.3 are widely used (SD14, SD15, first DP generation, and Sigma Merrill generation with earlier firmware). The major version switched to 3 with newer Merrill firmware and to 4 with the Quattro generation. (So I guess that 2.1 and older were used by Foveon only in the time before they built consumer cameras based on that sensor.)
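
To compare a file against the version history above, the header version field has to be split into major and minor parts. The sketch below assumes the four bytes are read as one little-endian 32-bit value with the major number in the upper 16 bits and the minor number in the lower 16 bits. That packing is an assumption and is not stated in this thread. The class name `X3fVersion` is hypothetical. As noted earlier in the thread, this value describes the header only and should not be used for file type detection.

```java
// Hypothetical helper; the 16/16 bit packing is an assumption, see the text above.
final class X3fVersion {
    final int major;
    final int minor;

    private X3fVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Splits an unsigned 32-bit version value into major (upper 16 bits) and minor (lower 16 bits). */
    static X3fVersion fromUInt32(long raw) {
        return new X3fVersion((int) ((raw >>> 16) & 0xFFFF), (int) (raw & 0xFFFF));
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Because every section carries its own version, the same helper would be applied per section rather than once per file.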

No properties?

No proprietary properties anymore, only JPEG+Exif. (SD9 and SD10 used Properties only, with no JPEG, using the proprietary Data Formats 3 and 11 instead. Later cameras produce both Properties and JPEG with Exif, and Quattro dropped Properties (and, I think, also the processed for preview thumbnail-size image).)

rcketscientist commented 7 years ago

So quattro only requires an IMAG parse? The image sections have never changed, right?

berndmichaely commented 7 years ago

So quattro only requires an IMAG parse? The image sections have never changed, right?

I haven't seen any (version 2.x) Properties in any Quattro files yet, but that is just a guess (it could change with new firmware at any time ...). Concerning the image data sections, one has to search for IMAG as well as IMA2 sections (of type 2 = "processed for preview" and data format 18 = JPEG). For current Quattros, they are still version 2.0.
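
Putting the selection rule from this thread together (IMAG or IMA2 sections, type 2, data format 18, and the biggest image wins), a sketch could look like the following. It assumes the image section headers have already been parsed into simple value objects. The `ImageSection` class and its `columns` and `rows` fields are hypothetical stand-ins for however the image size is actually obtained.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; ImageSection is an assumed, already-parsed representation.
final class PreviewPicker {
    static final int TYPE_PROCESSED_FOR_PREVIEW = 2; // from the thread
    static final int FORMAT_JPEG = 18;               // from the thread

    static final class ImageSection {
        final int type;
        final int format;
        final int columns;
        final int rows;

        ImageSection(int type, int format, int columns, int rows) {
            this.type = type;
            this.format = format;
            this.columns = columns;
            this.rows = rows;
        }

        long pixels() {
            return (long) columns * rows;
        }
    }

    /** Returns the largest "processed for preview" JPEG section, if there is one. */
    static Optional<ImageSection> largestJpegPreview(List<ImageSection> sections) {
        return sections.stream()
                .filter(s -> s.type == TYPE_PROCESSED_FOR_PREVIEW && s.format == FORMAT_JPEG)
                .max(Comparator.comparingLong(ImageSection::pixels));
    }
}
```

Returning an `Optional` keeps the early cameras that have no embedded JPEG from being treated as an error.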

drewnoakes commented 7 years ago

@rcketscientist did you end up getting some support for Sigma in your own code?

I saw a suggestion that you have in your recent PR here (which was the reason it failed to build).

(edit: re-reading the thread I see your comment here)

drewnoakes commented 7 years ago

See #258

rcketscientist commented 7 years ago

Actually, ignore that comment. I removed those changes from the reader and opted instead to process a byte block. It's more efficient, though we could always consider moving that code in if it's repeated elsewhere. For now there's no need to modify the reader.

drewnoakes / metadata-extractor

Support Sigma X3F camera RAW format #76