Exiv2 / exiv2

Image metadata library and tools
http://www.exiv2.org/
Other
936 stars 282 forks source link

Support for files from OM System cameras #2126

Closed sarunasb closed 2 years ago

sarunasb commented 2 years ago
Is your feature request related to a problem?

Currently exiv2 reports errors Error: XMP Toolkit error 201: Error in XMLValidator Warning: Failed to decode XMP metadata. for files from the newly release OM System OM-1 camera.

Describe the solution you would like

Would like exiv2 to fully decode metadata in files from OM System cameras.

Please find a sample JPG file from OM-1 camera attached.

OM100001

kmilos commented 2 years ago

There are potentially two separate issues here: the XMP payload strangeness, and then the MakerNotes (OLYMPUS -> "OM SYSTEM"). Might take a look when I get a chance...

clanmills commented 2 years ago

There's something wrong with the XMP metadata in that file. The data is:

547 rmills@rmillsm1:/etc/apache2 $ exiv2 -pX ~/Desktop/OM.JPG  | xmllint --pretty 1 -
<?xml version="1.0"?>
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:Rating="0"/>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
548 rmills@rmillsm1:/etc/apache2 $ 

It looks very innocent, however the XMPsdk built into Exiv2 dislikes if for reasons I don't know. I don't think there's anything interesting in that XMP metadata. You could easily write a script to remove the XMP metadata and your file will be good to go. If you subsequently write XMP metadata into the file, Exiv2 will use its built-in XMPsdk to write good metadata.

Here's the XMP in one of my files (which I edited with Picasa). Looks almost as boring. It does have the additional magic x:xmptk="XMP Core 5.1.2". Maybe that's significant. I'll need to step your file in the debugger in XMPsdk to discover why he dislikes the code in OM.JPG.

593 rmills@rmillsm1:~/gnu/github/exiv2/main/build $ exiv2 -pX ~/Stonehenge.jpg | xmllint --pretty 1 -
<?xml version="1.0"?>
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" xmp:Rating="0" xmp:ModifyDate="2015-07-16T20:25:28+01:00">
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Classic View</rdf:li>
        </rdf:Alt>
      </dc:description>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
594 rmills@rmillsm1:~/gnu/github/exiv2/main/build $ 
clanmills commented 2 years ago

@kmilos is 100% correct in his diagnosis. There are two issues with your file:

  1. XMP metadata strangeness
  2. MakerNote support for the OM.

The good news is that I know what's wrong with the XMP metadata in your file. I also have positive news about the Olympus MakerNote decoder.

Good News and your XMP metadata

I don't understand the XMPsdk code. It's abstract/template magic which I'm not clever enough to understand. However, my guess about x:xmptk="XMP Core 5.1.2" is correct. Here's what I've done.

  1. The JPEG is a collection of "segments" and you can see the structure with the command:

    $ exiv2 -pS OM.JPEG
  2. Chopped your file into two parts start and end using utility dd

  3. Extracted the XMP into file xmp and edited it to add x:xmptk="XMP Core 5.1.2"

  4. Spliced start xmp end > OM2.jpg

    541 rmills@rmillsmm-local:~/temp $ exiv2 -pS OM.JPEG 
    STRUCTURE OF JPEG FILE: OM.JPEG
    address | marker       |  length | data
       0 | 0xffd8 SOI  
       2 | 0xffe1 APP1  |   39648 | Exif..II*........... ..........
    39652 | 0xffe1 APP1  |    2332 | http://ns.adobe.com/xap/1.0/.<?x
    41986 | 0xffdb DQT   |     132 
    42120 | 0xffc0 SOF0  |      17 
    42139 | 0xffc4 DHT   |     418 
    42559 | 0xffdd DRI   |       4 
    42565 | 0xffda SOS  
    542 rmills@rmillsmm-local:~/temp $ dd bs=1 count=39652 if=OM.JPEG of=start
    39652+0 records in
    39652+0 records out
    39652 bytes transferred in 0.213272 secs (185922 bytes/sec)
    543 rmills@rmillsmm-local:~/temp $ dd bs=1 skip=$((41986-1)) count=11111111111111 if=OM.JPEG of=end
    642636+0 records in
    642636+0 records out
    642636 bytes transferred in 3.736256 secs (172000 bytes/sec)
    544 rmills@rmillsmm-local:~/temp $ exiv2 -pX OM.JPEG | xmllint --pretty 1 - > xmp
    545 rmills@rmillsmm-local:~/temp $ bbedit xmp
    546 rmills@rmillsmm-local:~/temp $ cat xmp
    <?xml version="1.0"?>
    <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:Rating="0"/>
    </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>
    547 rmills@rmillsmm-local:~/temp $ cat start xmp end > OM2.JPEG

Olympus MakerNote Decoder

The makernote decoder isn't working on your file. Exiv2 supports Olympus cameras, and the MakerNote is an embedded Tiff in your file. It's likely that the OM is using a similar yet different header from other Olympus cameras. It needs to be investigated. It might be an easy change to support this.

548 rmills@rmillsmm-local:~/temp $ exiv2 -pa OM2.JPEG 
Exif.Image.ImageDescription                  Ascii      32  
Exif.Image.Make                              Ascii      24  OM Digital Solutions   
Exif.Image.Model                             Ascii      17  OM-1            
...
Exif.Photo.MakerNote                         Undefined 20156  79 77 32 83 89 83 84 69 ...  0 0 0 0
Exif.Photo.UserComment                       Undefined 125                                                                                                                       
...
549 rmills@rmillsmm-local:~/temp $ 

What to do about this?

it comes down to the availability of an engineer to work on this.

We should read the XMP specification. If a valid version of x:xmptk="XMP Core 5.1.2" is required by the Specification, then this is a bug in the OM firmware.

It may be possible to "tweak" the Adobe XMPsdk to be less nervous about the XMP strangeness.

It's easy to write a script to fix the XMP strangeness. My feeling is that the simplest way to do that is to totally remove the XMP metadata from your files. A script using exiv2 -pS, dd and cp is only a few lines of bash (or python or almost anything).

We'll need a volunteer to investigate the Olympus MakerNote. I'm not volunteering as I have retired. I've looked at this to "scope" the issue, however I no longer modify Exiv2 code.

postscript-dev commented 2 years ago

@sarunasb : Before uploading your test image, did you copy it directly from your camera or process it with software first? If software edited the metadata, the XMP problem may be caused by that.

As I already have other projects to work on, I can't commit to fixing this.

kmilos commented 2 years ago

@clanmills Thanks for the initial snoop and pointers!

The XMP strangeness is present also w/ ORF raw files (presumably directly from camera, samples should be appearing on raw.pixls.us sometime soon I hope), here's a similar dump missing the said field:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmp:Rating="0"/>
 </rdf:RDF>
</x:xmpmeta>

                                                                                        <?xpacket end="w"?>

I might play w/ subclassing the MakerNotes at some point, but XMP parsing is beyond me for now...

kmilos commented 2 years ago

From a quick look at https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf (2012 version though, there is a 2019 revision behind the ISO paywall), there is no mention of mandatory x:xmptk attribute:

image

clanmills commented 2 years ago

Always good to hear from you, @kmilos. I can decode MakerNotes in tvisitor. Straight forward. Later today, I'll update issue report concerning the MakerNote.

There is an exiv2 command-line option -dx to say "remove the XMP". It's working, although it's pumping out the SDK warning.

691 rmills@rmillsmm-local:~/gnu/github/exiv2/team.exiv2.org/book/build $ cp ~/temp/OM.JPEG .
692 rmills@rmillsmm-local:~/gnu/github/exiv2/team.exiv2.org/book/build $ exiv2 -pS OM.JPEG 
STRUCTURE OF JPEG FILE: OM.JPEG
 address | marker       |  length | data
       0 | 0xffd8 SOI  
       2 | 0xffe1 APP1  |   39648 | Exif..II*........... ..........
   39652 | 0xffe1 APP1  |    2332 | http://ns.adobe.com/xap/1.0/.<?x
   41986 | 0xffdb DQT   |     132 
   42120 | 0xffc0 SOF0  |      17 
   42139 | 0xffc4 DHT   |     418 
   42559 | 0xffdd DRI   |       4 
   42565 | 0xffda SOS  
693 rmills@rmillsmm-local:~/gnu/github/exiv2/team.exiv2.org/book/build $ exiv2 -dx OM.JPEG 
Error: XMP Toolkit error 201: Error in XMLValidator
Warning: Failed to decode XMP metadata.
694 rmills@rmillsmm-local:~/gnu/github/exiv2/team.exiv2.org/book/build $ exiv2 -pS OM.JPEG 
STRUCTURE OF JPEG FILE: OM.JPEG
 address | marker       |  length | data
       0 | 0xffd8 SOI  
       2 | 0xffe1 APP1  |   39648 | Exif..II*........... ..........
   39652 | 0xffdb DQT   |     132 
   39786 | 0xffc0 SOF0  |      17 
   39805 | 0xffc4 DHT   |     418 
   40225 | 0xffdd DRI   |       4 
   40231 | 0xffda SOS  
695 rmills@rmillsmm-local:~/gnu/github/exiv2/team.exiv2.org/book/build $ 

Thanks for reading the XMP spec. It's written in the terse language of a spec lawyer! It doesn't dictate anything about the attribute x:xmptk (="XMP Core 5.1.2").

I looked at the XMPsdk code earlier, it's another tough-to-understand son-of-a-bitch. We might be able to change the options to ParseFromBuffer() to ignore xmptk and save the day.

More later.

kmilos commented 2 years ago

The failure seems to happen very early, raised by libexpat parsing, way before we look into the XMP itself...

clanmills commented 2 years ago

Really? It's valid XML.

Thanks for the pointer to check_internal(). I'll sniff around in there.

sarunasb commented 2 years ago

@sarunasb : Before uploading your test image, did you copy it directly from your camera or process it with software first? If software edited the metadata, the XMP problem may be caused by that.

@postscript-dev The uploaded JPEG file was copied directly from SD card. No processing outside of camera firmware. The "picture" was taken with the lens cap on.

sarunasb commented 2 years ago

The XMP strangeness is present also w/ ORF raw files (presumably directly from camera, samples should be appearing on raw.pixls.us sometime soon I hope)

@clanmills Thank you for analysing the file! @kmilos I have already uploaded RAW ORF samples to raw.pixls.us several days ago.

kmilos commented 2 years ago

I have already uploaded RAW ORF samples to raw.pixls.us several days ago.

I know, that's why I said it should appear there soon (fingers crossed; it undergoes a manual review by the RPU maintainer...)

clanmills commented 2 years ago

There may be something wrong with the XML packet. buflen == 2301 and that seems correct to me. The XML finishes with:

</x:xmpmeta>. == 
44 46 3e 0a 3c 2f 78 3a 78 6d 70 6d 65 74 61 3e 0a 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
 D  F  > nl  <  /  x  :  x  m  p  m  e  t  a  > nl sp sp ......> to the endi                

I think the nl after the closing > may be upsetting XML_Parse(). Hard to believe. For sure something is causing this.

hassec commented 2 years ago

I'm guessing/hoping that the firmware will pretty much be that of an Olympus camera.

In that case we might be able to just "massage" the makernote creation to also create Olympus makernotes for OM cameras.
I've not yet had the time to try but figured I'd dump what I know in case someone wants to give it a shot.

The creation of a makernote goes through TiffMnCreator::create()

which uses the registry_ that doesn't know about cameras with the make OM Digital Solutions: https://github.com/Exiv2/exiv2/blob/02b0541bbf65568cdca6c1d4faedf68b15a3ea15/src/makernote_int.cpp#L106-L115

so that registry_ as well as the family of newOlympusMn() functions would need to be adapted to understand the new OM Digital Solutions string inside the makernote.

kmilos commented 2 years ago

@hassec Exactly, it should hopefully just be a matter of supporting the new "OM SYSTEM" MakerNote signature/header through new subclasses...

kmilos commented 2 years ago

I think the nl after the closing > may be upsetting XML_Parse(). Hard to believe.

It is hard to believe. I tried replacing with \x00 in a hex editor, and the XML validation error persists

clanmills commented 2 years ago

@hassec @kmilos The changes to the Olympus MakerNote are easy. I have them working in tvisitor.cpp. I'm focused on understanding what's upsetting XMP_Parse().

I think my earlier fix involving x:xmptk="XMP Core 5.1.2" is totally wrong. That worked for two reasons:

  1. I wrongly removed the http://ns.adobe.com/xap/1.0/. signature in OM2.JPEG, so the XML wasn't read at all.
  2. I sanitized the XML in the editor.

Relax Everybody: This will be simple when we find it which we will.

clanmills commented 2 years ago

There's something wrong with the XMP buffer. I'm cross-eyed with this. It's got to be simple - a trailing null byte (or something like that). I'm 100% certain that it is not x:xmptk="XMP Core 5.1.2". I can prove that, if you ask.

The changes to tvisitor to support this are two fold:

  1. The maker note handler must deal with the new MakerNote header OM SYSTEM\0II. It's almost exactly the same as the existing OLYMPUS\0II header.
case kOlym:
    if ( buf.begins("OLYMPUS\0II")        // "OLYMPUS\0II\0x3\0x0"short      0x3\0x0   version?  short = E# (dictionary count)
    ||   buf.begins("OM SYSTEM\0\0II")    // "OM SYSTEM\0\0II\0x4\0x7"short  0x04 0x07 version?  short = E#
    ) { 
        Io     io(io_,offset,count);
        TiffImage makerNote(io,image_.maker_);
        makerNote.start_ =  buf.begins("OLYMPUS\0II") ? 12 : 16 ;
        makerNote.valid_ = true; // Valid without magic=42
        makerNote.accept(visitor,makerDict());
    } else {                              // "OLYMP\0bytebyteshort"          bytebyte  version?  short = E#
        size_t punt = 8 ;                 // "OLYMPUS\0short"                                    short = E#
        IFD makerNote(image_,offset+punt,false);
        makerNote.accept(visitor,makerDict());
    } break;
  1. The makers dictionary has to know that Exif.Image.Make = "OM Digital Solutions" is an olymDict.
makers["Canon"      ]  = kCanon  ; makerDicts[kCanon ] = &canonDict;
...
makers["OLYMPUS"    ]  = kOlym   ; makerDicts[kOlym  ] = &olymDict;
makers["OM Digital Solutions"]
                       = kOlym   ; makerDicts[kOlym  ] = &olymDict;
makers["FUJIFILM"   ]  = kFuji   ; makerDicts[kFuji  ] = &fujiDict;
makers["PENTAX"     ]  = kPentax ; makerDicts[kPentax] = &pentaxDict;

The tvisitor makernote code cannot be cut'n'pasted into Exiv2, so somebody will have to figure what I've said and how to represent it in the exiv2 code base.

I don't want to get further sucked into this. I'll be curious to learn what I cannot spot about the XMP. It will be obvious.

kmilos commented 2 years ago

@clanmills Ah, this could be https://github.com/libexpat/libexpat/issues/572 that was fixed recently. 2.4.7 should land in MSYS2 in the next couple of days so I'll test it then.

piponazo commented 2 years ago

@clanmills Ah, this could be libexpat/libexpat#572 that was fixed recently. 2.4.7 should land in MSYS2 in the next couple of days so I'll test it then.

Ha, interesting, we should also update the conanfile so that we bring the newest version of expat (2.4.7) when it is available. Right now, in the conan center repository the latest available version is 2.4.6:

https://conan.io/center/expat

And currently we are requiring the usage of 2.4.1: https://github.com/Exiv2/exiv2/blob/02b0541bbf65568cdca6c1d4faedf68b15a3ea15/conanfile.py#L40

clanmills commented 2 years ago

Could be. However this happening on my old MacMini. CMake says its using Expat 2.2.8 from the MacOSX10.15.sdk. The command $ exiv2 -vVg library | grep expat is reporting that the run-time library is /usr/lib/libexpat.1.dylib which is installed (and protected) by Apple.

I'll probably look at this again tomorrow if nobody solves it.

kmilos commented 2 years ago

Actually I already have 2.4.7 on MSYS2, so not it 😖

clanmills commented 2 years ago

Solved. It's the trailing \0 and \n bytes which is what I thought this morning.

1012 rmills@rmillsmm-local:~/gnu/github/exiv2/main/build $ env DYLD_LIBRARY_PATH=$PWD/lib bin/exiv2 -vVg exiv2
exiv2 1.0.0.9
exiv2=1.0.0
processpath=/Users/rmills/gnu/github/exiv2/main/build/bin
package_name=exiv2
executable=/Users/rmills/gnu/github/exiv2/main/build/bin/exiv2
library=/Users/rmills/gnu/github/exiv2/main/build/lib/libexiv2.1.0.0.9.dylib
config_path=/Users/rmills/.exiv2
1013 rmills@rmillsmm-local:~/gnu/github/exiv2/main/build $ env DYLD_LIBRARY_PATH=$PWD/lib bin/exiv2 ~/temp/OM.JPEG 
File name       : /Users/rmills/temp/OM.JPEG
File size       : 684621 Bytes
MIME type       : image/jpeg
Image size      : 5184 x 3888
Thumbnail       : image/jpeg, 968 Bytes
Camera make     : OM Digital Solutions   
Camera model    : OM-1            
Image timestamp : 2022:03:05 14:05:59
File number     : 
Exposure time   : 1/250 s
Aperture        : F5.6
Exposure bias   : 0 EV
Flash           : Yes, did not fire
Flash bias      : 
Focal length    : 12.0 mm
Subject distance: 
ISO speed       : 200
Exposure mode   : Manual
Metering mode   : Multi-segment
Macro mode      : 
Image quality   : 
White balance   : Manual
Copyright       : Sarunas Burdulis                                               
Exif comment    :       
1014 rmills@rmillsmm-local:~/gnu/github/exiv2/main/build $ env DYLD_LIBRARY_PATH=$PWD/lib bin/exiv2 -px ~/temp/OM.JPEG 
Xmp.xmp.Rating                               XmpText     1  0
1015 rmills@rmillsmm-local:~/gnu/github/exiv2/main/build $ env DYLD_LIBRARY_PATH=$PWD/lib bin/exiv2 -K Exif.Image.Make  ~/temp/OM.JPEG 
Exif.Image.Make                              Ascii      24  OM Digital Solutions   
1016 rmills@rmillsmm-local:~/gnu/github/exiv2/main/build $                                                                                                                

Here's the one-line fix.

             while ( 0 < (int) buflen && (buf[buflen-1] == 0 || buf[buflen-1] == 10) ) buflen--;

The patch. (Please don't submit the three commented off lines you'll find them useful to debug/verify the fix.).

diff --git a/src/xmp.cpp b/src/xmp.cpp
index 133d38d6..1c78eb5c 100644
--- a/src/xmp.cpp
+++ b/src/xmp.cpp
@@ -101,6 +101,10 @@ namespace {
             if (buflen > static_cast<size_t>(std::numeric_limits<int>::max())) {
                 throw Error(kerXMPToolkitError, "Buffer length is greater than INT_MAX");
             }
+            //for ( int i = (int) buflen - 5 ; i < (int)buflen ; i++ ) {
+            //    std::cout <<  i << ':' <<  (int) buf[i] << std::endl;
+            //}
+            while ( 0 < (int) buflen && (buf[buflen-1] == 0 || buf[buflen-1] == 10) ) buflen--;

             XML_SetUserData(parser_, this);
             XML_SetElementHandler(parser_, startElement_cb, endElement_cb);

I don't know if I can hear you groaning or giving me a cheer.

kmilos commented 2 years ago

Both?

But it's incredulous libexpat doesn't tolerate LF and NUL surplus at the end and doesn't do this chomping internally... I feel like we should report it, unless I'm missing something about XML rules?

Edit: Changing the count to 2299 (from 2301) for the tag 0x2BC of the ORF file w/ the hex editor also works. An extra LF and count 2300 curiously also works, but an extra NUL and count 2300 doesn't. Go figure.

clanmills commented 2 years ago

Yup. Both. Shit, isn't it? Reporting a bug to expat? I don't think I'm brave enough.

I reviewed and approved Kev's check_internal() function and asked him to add it to 0.27-maintenance. I wonder if XMPsdk deals with this quietly. I'll run an old release (such as 0.27) on that file.

I'm running a test on all my 80,000 photos to see how many XML Validator errors have been fixed. These are quite common when files have been edited (usually by non-Adobe products). I'll run the same test with 0.27. This could be a wonderful and surprising find.

kmilos commented 2 years ago

FYI, my trials were all w/ 0.27.5

clanmills commented 2 years ago

I think check_internal() was added to 0.27.4 or 0.27.5 (Summer 2021, I think). That's why I'm thinking about 0.27 for an old test. check_internal() was added after 0.27 (December 2018).

clanmills commented 2 years ago

This issue is present on 0.27-maintenance and NOT on 0.27.0. So check_internal() has brought this to light.

I'll run the test on my 80,000 images. That takes about 30 minutes for each version of exiv2. For sure the fix will not increase the number of "Validator" warnings. I only have a few hundred 'Validator Warning' files from photos that were given to me by family and friends to display on my web-site.

clanmills commented 2 years ago
Exiv2 Build Validator Errors
(from 80,000 images)
'main' + fix 151
'0.27-maintenance' 300
'0.27.0' 152

When this is fixed, we should change the error being thrown by check_internal() which has hi-jacked an exception from the XMPsdk. The exception is coming from the Exiv2 code and should say "XMP metadata is not valid XML".

I purchased ACDSee Photo Studio for Mac to replace Picasa to manage my photos. I reported a bug to them because I believed that their metadata editor was causing XMPsdk errors. I offered to work with their engineers to identify and resolve the issue and they replied saying "We have notified our QE Group of your offer" and there the matter died. I now know that the ACDSee XMP metadata is OK and it is check_internal() which has been throwing an exception and blaming the XMPsdk.

kmilos commented 2 years ago

@hassec As for the MakerNote, I tried following the Sigma/Foveon example where we have two signatures for the same structure. Alas, it doesn't work out automagically here as the signatures are not of the same length so it is not that straightforward. I stopped going down the rabbit hole for the moment, so any ideas on how to handle this best are welcome.

clanmills commented 2 years ago

@kmilos and @hassec I'm happy to look at this, if you wish. Is the "Sigma/Foveon" example, the image attached above:

$ ls -l ~/Downloads/OM.JPEG 
-rw-r--r--@ 1 rmills  staff  684621 16 Mar 14:15 /Users/rmills/Downloads/OM.JPEG
$ 

I haven't understood your comment: we have two signatures for the same structure. The changes in tvisitor were straightforward https://github.com/Exiv2/exiv2/issues/2126#issuecomment-1060792127

Both tvisitor and exiv2 makernote decoder know the offset in the MakerNote tag to the IFD. tvisitor is simpler because it detects Exif.Image.Make while parsing IFD0. Exiv2 uses class TiffFinder to search from the MakerNote code.

The exiv2 code has a state machine which isn't needed by tvisitor. The state machine is used to determine which makernote decoder to execute (or something like that). I'll have to refresh my memory about this, however I'm optimistic that the change will be small (probably one line of code).

Would you like me to look at this?

kmilos commented 2 years ago

I'm optimistic that the change will be small (probably one line of code)

Unfortunately not the case w/ the current architecture I'm afraid.

clanmills commented 2 years ago

Would you like me to investigate?

kmilos commented 2 years ago

Feel free, by all means.

I didn't find a better way than to duplicate and create the entire new Olympus3MnHeader class and all the plumbing around it to connect it to the existing Olympus MakerNote structure.

clanmills commented 2 years ago

Oh, man, that doesn't sound good. The test file is the one attached?

$ ls -l ~/Downloads/OM.JPEG 
-rw-r--r--@ 1 rmills  staff  684621 16 Mar 14:15 /Users/rmills/Downloads/OM.JPEG

It's a really wet day in England. I'll look at this.

kmilos commented 2 years ago

Yep, the OP attached JPEG has the new "OM SYSTEM" MakerNote.

clanmills commented 2 years ago

Right. I don't think we need to duplicate the code. The code needs to be simplified.

It's similar to the changes in tvisitor.cpp. Two things need to be done.

  1. We have to recognise the Make
diff --git a/src/makernote_int.cpp b/src/makernote_int.cpp
index f97a576c..bb988005 100644
--- a/src/makernote_int.cpp
+++ b/src/makernote_int.cpp
@@ -100,6 +100,7 @@ namespace Exiv2::Internal {
         { "Minolta",        minoltaId,   newIfdMn,       newIfdMn2       },
         { "NIKON",          ifdIdNotSet, newNikonMn,     nullptr               }, // mnGroup_ is not used
         { "OLYMPUS",        ifdIdNotSet, newOlympusMn,   nullptr               }, // mnGroup_ is not used
+        { "OM Digital Solutions",  ifdIdNotSet, newOlympusMn,  nullptr   },
         { "Panasonic",      panasonicId, newPanasonicMn, newPanasonicMn2 },
         { "PENTAX",         ifdIdNotSet, newPentaxMn,    nullptr               }, // mnGroup_ is not used
         { "RICOH",          ifdIdNotSet, newPentaxMn,    nullptr               }, // mnGroup_ is not used
  1. We have to "sniff" the header and set the offset to the IFD. The tvisitor code is:
    if ( buf.begins("OLYMPUS\0II")        // "OLYMPUS\0II\0x3\0x0"short      0x3\0x0   version?  short = E# (dictionary count)
    ||   buf.begins("OM SYSTEM\0\0II")    // "OM SYSTEM\0\0II\0x4\0x7"short  0x04 0x07 version?  short = E#
    ) { 
        Io     io(io_,offset,count);
        TiffImage makerNote(io,image_.maker_);
        makerNote.start_ =  buf.begins("OLYMPUS\0II") ? 12 : 16 ;
....

The Exiv2 equivalent is newOlympusMn() which I believe we can simplify and rewrite. Here's my reasoning:

    TiffComponent* newOlympusMn(uint16_t    tag,
                                IfdId       group,
                                IfdId       /*mnGroup*/,
                                const byte* pData,
                                size_t size,
                                ByteOrder   /*byteOrder*/)
    {
        if (size < 10 ||   std::string(reinterpret_cast<const char*>(pData), 10)
                        != std::string("OLYMPUS\0II", 10)) {
            // Require at least the header and an IFD with 1 entry
            if (size < OlympusMnHeader::sizeOfSignature() + 18) return nullptr;
            return newOlympusMn2(tag, group, olympusId);
        }
        // Require at least the header and an IFD with 1 entry
        if (size < Olympus2MnHeader::sizeOfSignature() + 18) return nullptr;
        return newOlympus2Mn2(tag, group, olympus2Id);
    }

This code is very inflexible and believes the only valid signature is:

    const byte Olympus2MnHeader::signature_[] = {
        'O', 'L', 'Y', 'M', 'P', 'U', 'S', 0x00, 'I', 'I', 0x03, 0x00
    };

    size_t Olympus2MnHeader::sizeOfSignature()
    {
        return sizeof(signature_);
    }

    Olympus2MnHeader::Olympus2MnHeader()
    {
        read(signature_, sizeOfSignature(), invalidByteOrder);
    }

    size_t Olympus2MnHeader::size() const
    {
        return header_.size();
    }

    size_t Olympus2MnHeader::ifdOffset() const
    {
        return sizeOfSignature();
    }

    uint32_t Olympus2MnHeader::baseOffset(uint32_t mnOffset) const
    {
        return mnOffset;
    }

Conclusion: We don't need to duplicate code. We need to throw away some code and newOlympusMn2() will parse the makernote!

kmilos commented 2 years ago

Yes, there is more than one way to tackle this, but none so trivial/pretty...

clanmills commented 2 years ago

The pattern is:

header: [alpha]+\0IIversion__IFD__ title: [alpha]+
version: short eg 0x0300 or 0x0407 __IFD__ is E# ENTRY ENTRY ENTRY ... E# is a short (number of entries) ENTRY is 12 bytes (tag, type, count, offset)

The "title" can be OLYMPUS or "OM SYSTEM". We don't really care, it could be "DONALD DUCK" provided it is terminated by '\0' 'I' 'I'.

So the correct code will be something like this.

    TiffComponent* newOlympusMn(uint16_t    tag,
                                IfdId       group,
                                IfdId       mnGroup,
                                const byte* pData,
                                size_t size,
                                ByteOrder   /*byteOrder*/)
    {
        TiffComponent* result = nullptr;
        if ( enough(pData,size) ) {
            result = new TiffIfdMakernote(tag, group, mnGroup, new OlympusMn2Header);
            size_t offset = getOffset(pData) ; // usually 16 or 12;
            result.setStart(pData+offset);
        }
        return result;
    }

Classes such as OlympusMn2Header are very stiff and inflexible. The code in tvisitor is much simpler and parses every makernote I have found!

kmilos commented 2 years ago

Agreed that it's the header class creating the unnecessary friction here, but a) I don't particularly feel like taking on refactoring this now, and b) think we need a solution for possible 0.27.6 to enable various apps in the nearer term.

kmilos commented 2 years ago

And while that is indeed the prevalent pattern for MakerNote signatures/headers, it is by no means a prescribed standard we can count on vendors sticking to, so anything really goes:

https://exiftool.org/makernote_types.html

clanmills commented 2 years ago

I believe I can can get this to work without "refactoring". Let's see what I can figure out tomorrow or Friday.

I have confused you when I said The code in tvisitor is much simpler and parses every makernote I have found!. Only the OLYMPUS makernote uses the pattern I described.

tvisitor.cpp handles makernotes with ease. It locates the embedded Tiff or IFD, creates the appropriate class on the stack and visits. There is no state machine, no recursive TiffFind() and no supporting makernote classes. It's totally re-entrant without a limit of 10 "sub-images".

I have never encountered a MakerNote that was not an IFD or Tiff. I believe Peter encountered a Kodak file in which the MakerNote was not Tiff/IFD. To deal with that, I would create a Kodak class with an accept(visitor) method.

void IFD::visitMakerNote(Visitor& visitor,DataBuf& buf,uint64_t count,uint64_t offset)
{
    switch ( image_.maker_ ) {
    ...
        case kOlym:
            if ( buf.begins("OLYMPUS\0II")        // "OLYMPUS\0II\0x3\0x0"short      0x3\0x0   version?  short = E# (dictionary count)
            ||   buf.begins("OM SYSTEM\0\0II")    // "OM SYSTEM\0\0II\0x4\0x7"short  0x04 0x07 version?  short = E#
            ) { 
                Io     io(io_,offset,count);
                TiffImage makerNote(io,image_.maker_);
                makerNote.start_ =  buf.begins("OLYMPUS\0II") ? 12 : 16 ;
                makerNote.valid_ = true; // Valid without magic=42
                makerNote.accept(visitor,makerDict());
            } else {                              // "OLYMP\0bytebyteshort"          bytebyte  version?  short = E#
                size_t punt = 8 ;                 // "OLYMPUS\0short"                                    short = E#
                IFD makerNote(image_,offset+punt,false);
                makerNote.accept(visitor,makerDict());
            } break;
    ...
    }
} // visitMakerNote
clanmills commented 2 years ago

As the sun is shining today, I'm going to work in the garden. I don't intend to do any more work on this issue unless asked to do so.

sarunasb commented 2 years ago

@clanmills So this is asking you to help with Olympus/OM System MakerNote, whenever you feel like, on a rainy day... :) Thank you!

clanmills commented 2 years ago

275347210_1333647243800216_8489668274434268825_n (1)

kmilos commented 2 years ago

While Robin is trimming his perfect garden on a perfect day, feel free to test the "dumb" patch (0.27 branch) by duplicating all the necessary classes and functions.

sarunasb commented 2 years ago

Compiled 027_omsystem_mn branch. exiv2 -pv outputs roughly the same amount of bytes from ORF samples from Olympus Pen-F and OM System OM-1.

Compiled darktable:master with rawspeed from kmilos:om-1, linked with Exiv2:027_omsystem_mn. darktable reports the same set of metadata for samples files from Pen-F and OM-1.

I'm glad to do any more sensible tests, if there is some established routine.

kmilos commented 2 years ago

Somebody should let OMDS know they're not following spec - the same nul terminated XML problem is present in the new OM-5...