Open drewnoakes opened 9 years ago
Hi, I have just uploaded the JX3fExtract library on my website, which might be of interest, see bernd-michaely.de/jx3fextract/javadoc/
@berndmichaely, thanks for the reference.
The issue you're having is identifying them? Basically need a new FileType, a way to recognize it, and a reader to pull the exif directories?
It sounds simple enough to pull the last four bytes for the directory location. Any lessons learned here you wanna share before I dive in?
Offset 0 Length 4 "FOVb".
This was sufficient to identify my example image:
_root.addPath(FileType.X3f, "FOVb".getBytes());
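For readers who want to try the same check outside the library's file-type tree, here is a minimal standalone sketch. The class and method names (`X3fSignature`, `isX3f`) are made up for illustration and are not part of metadata-extractor; the only fact taken from the thread is that the four bytes "FOVb" sit at offset 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of metadata-extractor. */
final class X3fSignature {
    // The four ASCII bytes "FOVb" expected at offset 0 of an X3F file.
    private static final byte[] MAGIC = "FOVb".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the buffer starts with "FOVb". Later header bytes are not inspected. */
    static boolean isX3f(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        // Compare only the first four bytes against the signature.
        return Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }
}
```

Passing an explicit charset to `getBytes` avoids depending on the platform default, which is the one thing the one-liner above leaves open.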
I'll look at reading a little more tonight. Their spec makes it sound fairly simple.
I have implemented the x3f-raw-format specification in my JX3fExtract library available at bernd-michaely.de/jx3fextract/javadoc, and thought about integrating the functionality in metadata-extractor, however there are several issues and open questions:
Thanks for the very detailed reply.
RandomAccessReader for at least TIFF, so I thought I'd leverage that logic, we know that works well. I'll take a look at that adapter.
Once again, thanks a lot for this info. It confirms my conclusions from last night, which is very welcome. No longer feel like I'm diving into this with a flashlight.
Ok, I've pulled all the maker notes, I just need to create a directory and int mapping for the string tags. The exif I assume is in the ima2 section, Bernd?
@drewnoakes X3F files store their properties as UTF-16... :/ I had to add a two-byte string parser, so I made some quick name changes. Let me know what your preferred naming scheme would be:
Hi,
(Sorry, I had no time to answer yesterday...)
As already noted, the property strings are encoded in UTF-16 character encoding, which is a 16-bit encoding, so 1 character corresponds to 2 bytes.
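A rough sketch of what that means for a string reader, under a few assumptions that are not stated in this thread: the code units are little-endian, each string ends with a 0x0000 terminator, and offsets are counted in 16-bit characters rather than bytes. `X3fStrings` and `readUtf16` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only. */
final class X3fStrings {
    /**
     * Reads a null-terminated UTF-16 string starting at a character offset.
     * Assumptions: little-endian code units, offset counted in 16-bit characters.
     */
    static String readUtf16(byte[] data, int charOffset) {
        int start = charOffset * 2; // 1 character corresponds to 2 bytes
        if (start < 0 || start > data.length) {
            throw new IllegalArgumentException("offset outside buffer: " + charOffset);
        }
        int end = start;
        // Step two bytes at a time until a 0x0000 terminator or the end of the buffer.
        while (end + 1 < data.length && (data[end] != 0 || data[end + 1] != 0)) {
            end += 2;
        }
        return new String(data, start, end - start, StandardCharsets.UTF_16LE);
    }
}
```

The multiplication by two is the whole trick: a reader that treats the offset as a byte offset lands in the middle of the wrong string.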
The Exif data is not stored separately in a data section, but is contained in a "processed for preview" JPEG file. That is you have to search for a IMAG or IMA2 data section which is of the "processed for preview" type; this contains a JPEG, which in turn contains the Exif information that you can extract in the usual way.
Note that in most files there are two JPEGs, one in thumbnail size and one in full size, the latter containing the Exif info, so you would search for the biggest image size.
And if you are already doing this, here are a few additional thoughts:
@ spec 2.1 Version Numbers:
The spec uses a major.minor version numbering scheme, which is used separately for each file section, that is, different file sections may have different version numbers. E.g. the Quattro cameras have a 4.x header revision, but still 2.0 directory and "processed for preview" image data sections.
The spec denotes header bytes 4-7 as "File format version", but it is indeed only a header format version which can (and should) be ignored for file type detection (here only the FOVb identifier is important).
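To make the major.minor scheme concrete, here is a small sketch of decoding one version field. It assumes the field is a 32-bit little-endian value with the major number in the upper 16 bits and the minor number in the lower 16 bits; that layout is my reading of the public X3F spec and is not confirmed in this thread. `X3fVersion` is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical value type for a per-section version number. */
final class X3fVersion {
    final int major;
    final int minor;

    private X3fVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /**
     * Decodes the 4-byte version field at the given offset.
     * Assumption: little-endian, major in the upper 16 bits, minor in the lower 16 bits.
     */
    static X3fVersion read(byte[] data, int offset) {
        int raw = ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return new X3fVersion(raw >>> 16, raw & 0xFFFF);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Because every section carries its own version, a reader would call this once per section (header, directory, image data) instead of caching a single file-wide value.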
@ spec 2.5 Directory Pointer Section
Note that for corrupted files (e.g. cancelled download...) the end of the file, and with it the directory pointer may be cut off, the last four bytes of the file pointing to nirvana ...
(this leads to a recovery option, where the data sections would be searched for by their SEC* section identifiers directly, but since those bytes may appear randomly inside the data, this method might not work even for intact files...)
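A defensive read of the directory pointer could look roughly like the sketch below. Two assumptions go beyond what is said here: the pointer is an unsigned 32-bit little-endian integer, and a directory section begins with the ASCII identifier "SECd". `X3fDirectoryPointer` and `locateDirectory` are hypothetical names, and the whole file is held in memory only to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only. */
final class X3fDirectoryPointer {
    /**
     * Returns the directory offset stored in the last four bytes of the file,
     * or -1 if it does not point at a plausible directory section.
     */
    static long locateDirectory(byte[] file) {
        if (file.length < 8) {
            return -1;
        }
        // Assumption: the pointer is an unsigned 32-bit little-endian integer.
        long offset = ByteBuffer.wrap(file, file.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt() & 0xFFFFFFFFL;
        // In a truncated file the last four bytes are arbitrary data,
        // so the decoded offset usually falls outside the file.
        if (offset + 4 > file.length - 4) {
            return -1;
        }
        // Assumption: a directory section starts with the ASCII identifier "SECd".
        int o = (int) offset;
        boolean tagged = file[o] == 'S' && file[o + 1] == 'E'
                && file[o + 2] == 'C' && file[o + 3] == 'd';
        return tagged ? offset : -1;
    }
}
```

Returning -1 instead of throwing leaves it to the caller to decide whether to give up or to attempt the identifier scan, with the caveat about false matches noted above.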
@ spec 3.1 "PROP" property list
A property list subsection may appear 0..n times; for key-value pairs with identical keys, the newer (later) value replaces the earlier one (Quattro cameras e.g. have none at all).
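The replacement rule is easy to express once each property list has been parsed into a map. The sketch below assumes that parsing has already happened; `X3fProperties` and `merge` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only. */
final class X3fProperties {
    /**
     * Merges property lists in file order. For identical keys the value
     * from the later list replaces the value from the earlier one.
     */
    static Map<String, String> merge(List<Map<String, String>> listsInFileOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> list : listsInFileOrder) {
            merged.putAll(list); // putAll overwrites existing keys
        }
        return merged; // stays empty when the file has no property list at all
    }
}
```

A `LinkedHashMap` keeps keys in first-seen order, which makes the output stable for tests; a plain `HashMap` would work just as well for lookups.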
@ JUnit testing
The X3fParserTest.java contains byte[] constants testDataX3f20Empty and testDataX3f21Empty containing full minimal valid FOVb files.
Don't apologize, we're all doing this in our spare moments ;-).
are encoded in UTF-16
I'm embarrassed to admit I lost an hour using the existing string reader and staring at your code going "Why oh why is he multiplying the offset by twoooooo!!!", lol
That is you have to search for a IMAG or IMA2 data section
I figured it was just piggybacking the thumbs. The info on exactly what (IMAG, IMA2) represent was sorely lacking, especially taking the larger, so thanks for that!
different file sections may have different version numbers
Wow, thanks. I would easily have overlooked that, basing most of this on a single test image and some Phil Harvey notes.
but it is indeed only a header format version which can (and should) be ignored
Well, 2.1 and 2.2 mean that the extended header section might exist right? Did they ever use that (my test image is 3.1)?
this leads to a recovery option...
I saw the recovery parser, I'll leave that for a day when I'm feeling particularly grumpy ;-), or never. For now I'll be satisfied to only handle intact images.
A property list subsection may appear 0..n times
I saw that in the spec. As it stands I was planning on overwriting prior values.
(Quattro cameras e.g. have none at all).
No properties?
JUnit testing
Thanks, I'll probably swipe those. I have a workable test setup in my android app, but testing in Android is a nightmare.
I'm embarrassed to admit I lost an hour using the existing string reader ...
Don't worry, the UTF-16 coding was also unusual to me, I was just glad it is supported as a standard charset by the JRE ...
... basing most of this on a single test image and some Phil Harvey notes
Thanks for pointing to this. I just saw there are even a few hints to "SigmaRaw Header4 Tags" unknown to me, I will have a look at this later :-)
Well, 2.1 and 2.2 mean that the extended header section might exist right?
Right. (It should even always exist in this case.)
Did they ever use that (my test image is 3.1)?
I think I have never seen an older header version than 2.2 (Polaroid X530, Sigma SD9, SD10 used these already), but 2.2 and 2.3 are widely used (SD14, SD15, first DP generation and Sigma Merrill generation with earlier firmware). The major version switched to 3 with newer Merrill firmware and to 4 with the Quattro generation. (So I guess that 2.1 and older were used by Foveon only in times before they built consumer cameras based on that sensor.)
No properties?
No proprietary properties anymore, only JPEG+Exif. (SD9, SD10 used Properties only, no JPEG, instead the proprietary Data Formats 3 and 11; later cameras produce both Properties and JPEG with Exif, and Quattro dropped Properties (and I think also the "processed for preview" thumbnail-size image).)
So quattro only requires an IMAG parse? The image sections have never changed, right?
I haven't seen any (version 2.x) Properties in any Quattro files yet, but that is just a guess (could change with new firmware at any time ...)
Concerning the image data sections, one has to search for IMAG as well as IMA2 sections (of type 2 = "processed for preview" and data format 18 = JPEG); for current Quattros, they are still version 2.0.
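Put together with the earlier advice to take the biggest image, the selection step might be sketched as follows. `ImageSection` is a hypothetical summary of an already parsed section header, and its field names are illustrative; only the identifiers IMAG/IMA2 and the values type 2 and data format 18 come from this thread.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration only. */
final class X3fPreviewSelector {
    /** Hypothetical summary of one image data section header. */
    static final class ImageSection {
        final String id;   // "IMAG" or "IMA2"
        final int type;    // 2 = processed for preview
        final int format;  // 18 = JPEG
        final int columns;
        final int rows;

        ImageSection(String id, int type, int format, int columns, int rows) {
            this.id = id;
            this.type = type;
            this.format = format;
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Picks the largest JPEG preview, which is the one expected to carry the Exif data. */
    static Optional<ImageSection> largestJpegPreview(List<ImageSection> sections) {
        return sections.stream()
                .filter(s -> s.id.equals("IMAG") || s.id.equals("IMA2"))
                .filter(s -> s.type == 2 && s.format == 18)
                .max(Comparator.comparingLong((ImageSection s) -> (long) s.columns * s.rows));
    }
}
```

An empty `Optional` covers older cameras that wrote no JPEG at all, so the caller can fall back to the property list in that case.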
See #258
Actually ignore that comment. I removed those changes from the reader and opted instead to process a byte block. It's more efficient, though we could always consider moving that code in if it's repeated elsewhere. For now there's no need to modify the reader.
Sigma's X3F files appear to contain Exif data, yet do not start with a regular TIFF preamble.
Also known as the Foveon X3 format or "FOVb".
References: