Exiv2 / exiv2

Image metadata library and tools
http://www.exiv2.org/

Canon CR3 support. #236

Closed LebedevRI closed 3 years ago

LebedevRI commented 6 years ago

Just thought i'd file the meta-bug.

I'm investigating that new raw format from the point-of-view of the raw image decoding in RawSpeed library in https://github.com/darktable-org/rawspeed/issues/121. Right now exiv2 (0.25 001900) knows nothing about it. It should eventually have the same support as for the usual CR2:

clanmills commented 6 years ago

Oh! Something new to be supported. As a retired Adobe Engineer I think "Why don't they all use DNG?" One standard's enough! With my Exiv2/opensource hat, "Why doesn't Canon contribute to Exiv2 to support their new standard?".

If you're willing to get involved, Team Exiv2 will support and mentor. Our existing commitments make CR3 very unlikely to get attention in 2018.

LebedevRI commented 6 years ago

Why don't they all use DNG?

Tell me about it :)

"Why doesn't Canon contribute to Exiv2 to support their new standard?".

Should they do that, make sure they get me/rawspeed a spec of the actual image compression algo they used :)

If you're willing to get involved

We'll see. Certainly not before the rawspeed part is done and working. (Also, things like #214 in 2018 aren't too encouraging...)

clanmills commented 6 years ago

Why are you worried about #214? I want to continue to support C++98 and add C++11/14 on top. The team want to dump C++98 support. What is your opinion?

LebedevRI commented 6 years ago

I'm not worried, i'm not working on that code :) There are two different things here - usage of C++11/C++14 in the public API, and in the internals. You might be able to keep the API usable with older standards (though i can not comment whether anyone will ever need that, and it may negatively affect the users that are using newer standards). But sticking with old standards for the internal code is a whole other matter...

clanmills commented 6 years ago

So we can use any technology in the library and offer an API that can be used by C++98 (or C++11/14/17) application code.

One of the goals in v0.27 is to try to establish a "v1.0 API". Something like: "We hope the API for v0.27 will be the API for v1.0 and we will try to avoid changes to the API for v0.28". If the API for v0.28 is identical to v0.27, we will call it v1.0.

This is a very useful conversation. We're having an Exiv2 Team Meeting at my home in England on the weekend of Saturday May 5 and we will discuss this topic (and many others). You (and any of your open-source friends) are welcome. #225. It won't be all work; it's a social/team-building weekend and partners will also attend.

clanmills commented 6 years ago

There are sample images available here: http://www.photographyblog.com/reviews/canon_eos_m50_review/preview_images

763 rmills@rmillsmbp:~/clanmills $ dmpf ~/Downloads/canon_eos_m50_01.cr3 | head -10
       0        0: ....ftypcrx ....  ->  00 00 00 18  f  t  y  p  c  r  x    00 00 00 01
    0x10       16: crx isom..o.moov  ->   c  r  x     i  s  o  m 00 00  o 18  m  o  o  v
    0x20       32: ..fhuuid........  ->  00 00  f  h  u  u  i  d 85 c0 b6 87 82 0f 11 e0
    0x30       48: ....F+jH...&CNCV  ->  81 11 f4 ce  F  +  j  H 00 00 00  &  C  N  C  V
    0x40       64: CanonCR3_001/00.  ->   C  a  n  o  n  C  R  3  _  0  0  1  /  0  0 2e
    0x50       80: 09.00/00.00.00..  ->   0  9 2e  0  0  /  0  0 2e  0  0 2e  0  0 00 00
    0x60       96: .\CCTP..........  ->  00  \  C  C  T  P 00 00 00 00 00 00 00 01 00 00
    0x70      112: ......CCDT......  ->  00 03 00 00 00 18  C  C  D  T 00 00 00 00 00 00
    0x80      128: ..............CC  ->  00 10 00 00 00 00 00 00 00 01 00 00 00 18  C  C
    0x90      144: DT..............  ->   D  T 00 00 00 00 00 00 00 01 00 00 00 00 00 00

I can see embedded tiffs (Exif metadata) at 296 and 688:

764 rmills@rmillsmbp:~/clanmills $ dmpf ~/Downloads/canon_eos_m50_01.cr3 | grep 'I  I' | head -3
   0x120      288: ....CMT1II*.....  ->  00 00 01 88  C  M  T  1  I  I  * 00 08 00 00 00
   0x2b0      688: II*.....'.......  ->   I  I  * 00 08 00 00 00  ' 00 9a 82 05 00 01 00
   0x6d0     1744: ...8CMT3II*.....  ->  00 00 14  8  C  M  T  3  I  I  * 00 08 00 00 00
765 rmills@rmillsmbp:~/clanmills $ 

And can extract them:

786 rmills@rmillsmbp:~/clanmills $ dd bs=1 skip=296 if=~/Downloads/canon_eos_m50_01.cr3 count=200000 | exiv2 -pa -
200000+0 records in
200000+0 records out
200000 bytes (200 kB) copied, 0.795933 s, 251 kB/s
Exif.Image.ImageWidth                        Short       1  6000
Exif.Image.ImageLength                       Short       1  4000
Exif.Image.BitsPerSample                     Short       3  8 8 8
Exif.Image.Compression                       Short       1  JPEG (old-style)
Exif.Image.Make                              Ascii       6  Canon
Exif.Image.Model                             Ascii      14  Canon EOS M50
Exif.Image.Orientation                       Short       1  left, bottom
Exif.Image.XResolution                       Rational    1  72
Exif.Image.YResolution                       Rational    1  72
Exif.Image.ResolutionUnit                    Short       1  inch
Exif.Image.DateTime                          Ascii      20  2018:02:21 12:00:56
Exif.Image.Artist                            Ascii       1  
Exif.Image.Copyright                         Ascii       1  
787 rmills@rmillsmbp:~/clanmills $ 
768 rmills@rmillsmbp:~/clanmills $ dd bs=1 skip=688 if=~/Downloads/canon_eos_m50_01.cr3 count=200000 | exiv2 -pa -
200000+0 records in
200000+0 records out
200000 bytes (200 kB) copied, 0.792246 s, 252 kB/s
Exif.Image.ExposureTime                      Rational    1  1/80 s
Exif.Image.FNumber                           Rational    1  F6.3
Exif.Image.ExposureProgram                   Short       1  Aperture priority
Exif.Image.ISOSpeedRatings                   Short       1  10000
Exif.Image.DateTimeOriginal                  Ascii      20  2018:02:21 12:00:56
Exif.Image.ShutterSpeedValue                 SRational   1  1/83 s
Exif.Image.ApertureValue                     Rational    1  F6.4
Exif.Image.ExposureBiasValue                 SRational   1  0 EV
Exif.Image.MeteringMode                      Short       1  Multi-segment
Exif.Image.Flash                             Short       1  No flash
Exif.Image.FocalLength                       Rational    1  45.0 mm
769 rmills@rmillsmbp:~/clanmills $ 

Here's the XMP:

556 rmills@rmillsmbp:/Applications $ dd bs=1 count=300 skip=$((28480+8)) if=~/Downloads/canon_eos_m50_01.cr3 2>/dev/null | xmllint --format -
<?xml version="1.0"?>
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
      <xmp:Rating>0</xmp:Rating>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
557 rmills@rmillsmbp:/Applications $ 

This project doesn't look painful. We'll have to write an image handler, cr3image.cpp, which involves:
1) Detecting the image type
2) Implementing cr3image->readMetadata()
3) Implementing cr3image->writeMetadata()
4) Adding CR3 files to the test suite

It's probably similar to src/cr2image.cpp which is 231 lines of code.
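
As a rough sketch of step 1, the hex dump above suggests that a CR3 file can be recognised from its first 12 bytes: an ISO BMFF `ftyp` box whose major brand is `crx `. The helper name `looksLikeCr3` is hypothetical and is not part of Exiv2; it only illustrates the check under that assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an Exiv2 API): returns true when the buffer starts
// with an ISO BMFF 'ftyp' box whose major brand is "crx ", as in the dump above.
bool looksLikeCr3(const std::uint8_t* data, std::size_t size)
{
    // Box header: 4-byte big-endian size, 4-byte type, then the major brand.
    if (data == nullptr || size < 12) return false;
    return std::memcmp(data + 4, "ftyp", 4) == 0
        && std::memcmp(data + 8, "crx ", 4) == 0;
}
```

A real handler would probably also want to look at the Canon uuid box before committing to the type; this check only covers the signature visible in the sample file.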

clanmills commented 6 years ago

I've made some progress with this. I've looked at a couple of libraries for ISO BMFF support. I've found the (Mac only) ISOBMFF Explorer useful for understanding those files: https://imazing.com/isobmff/download

I've discovered a one-file project which dumps ISO BMFF files. This looks like a great starting point and I've invited the author to join team Exiv2: https://github.com/pyke369/isobmffdump/issues/1

Here's output from the canon CR3 file:

749 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ isobmffdump  ~/Downloads/canon_eos_m50_01.cr3 
@0         | ftyp [24]
@24        | moov [28440]
@32        |   uuid [26216]
@26248     |   mvhd [108]
@26356     |   trak [484]
...
@28464     | uuid [65560]
@94024     | uuid [416007]
@510031    | mdat [38025680]
@38535711  | end
750 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 
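
The listing above can be reproduced with very little code, because every ISO BMFF box starts with a 4-byte big-endian size followed by a 4-byte type. The following is a minimal sketch, not the isobmffdump implementation; `Box`, `readBE` and `listBoxes` are names invented for illustration, and the whole file is assumed to be in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Box {
    std::uint64_t offset;  // position of the box header in the buffer
    std::uint64_t size;    // total box size, header included
    std::string   type;    // four-character code, e.g. "moov"
};

// Read an n-byte big-endian unsigned integer starting at pos.
static std::uint64_t readBE(const std::vector<std::uint8_t>& b, std::size_t pos, int n)
{
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | b[pos + i];
    return v;
}

// List the boxes found in [begin, end) without descending into children.
std::vector<Box> listBoxes(const std::vector<std::uint8_t>& b, std::size_t begin, std::size_t end)
{
    std::vector<Box> boxes;
    if (end > b.size()) end = b.size();
    std::size_t pos = begin;
    while (pos < end && end - pos >= 8) {
        std::uint64_t size = readBE(b, pos, 4);
        const std::string type(reinterpret_cast<const char*>(&b[pos + 4]), 4);
        std::size_t header = 8;
        if (size == 1) {                 // a 64-bit "largesize" follows the type
            if (end - pos < 16) break;
            size = readBE(b, pos + 8, 8);
            header = 16;
        } else if (size == 0) {          // box extends to the end of the data
            size = end - pos;
        }
        if (size < header || size > end - pos) break;  // malformed or truncated
        boxes.push_back(Box{static_cast<std::uint64_t>(pos), size, type});
        pos += static_cast<std::size_t>(size);
    }
    return boxes;
}
```

Children of a container box are listed by calling `listBoxes` again on its payload. For the sample file, `moov` is at 24 with size 28440, so its children lie in the range 32 to 28464, which matches the indented entries in the listing.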

The uuid box at 32 is the Canon magic. I don't know how to decode it at the moment.

The uuid box at 28464 is the XMP metadata and uses the same UUID as jp2image.cpp

754 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ grep -i kJp2UuidXmp ~/gnu/github/exiv2/exiv2/src/jp2image.cpp  | head -1
const unsigned char kJp2UuidXmp[]  = "\xbe\x7a\xcf\xcb\x97\xa9\x42\xe8\x9c\x71\x99\x94\x91\xe3\xaf\xac";
755 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 
@0         | ftyp [24]
@24        | moov [28440]
@32        |   uuid [26216]
               00000000  85 c0 b6 87 82 0f 11 e0 81 11 f4 ce 46 2b 6a 48  00 00 00 26 43 4e 43 56 43 61 6e 6f 6e 43 52 33  ............F+jH...&CNCVCanonCR3
               00000020  5f 30 30 31 2f 30 30 2e 30 39 2e 30 30 2f 30 30  2e 30 30 2e 30 30 00 00 00 5c 43 43 54 50 00 00  _001/00.09.00/00.00.00...\CCTP..
               00000040  00 00 00 00 00 01 00 00 00 03 00 00 00 18 43 43  44 54 00 00 00 00 00 00 00 10 00 00 00 00 00 00  ..............CCDT..............
               00000060  00 01 00 00 00 18 43 43 44 54 00 00 00 00 00 00  00 01 00 00 00 00 00 00 00 02 00 00 00 18 43 43  ......CCDT....................CC
               00000080  44 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00  00 03 00 00 00 5c 43 54 42 4f 00 00 00 04 00 00  DT...................\CTBO......
               000000a0  00 01 00 00 00 00 00 00 6f 30 00 00 00 00 00 01  00 18 00 00 00 02 00 00 00 00 00 01 6f 48 00 00  ........o0..................oH..
...
               00006640  0e 09 e4 73 cd 5d 3c a7 2d a5 2e 7a 74 52 66 53  cd b3 1a 8a d5 2a b7 f7 1f ff d9 00 00 a1 e0 10  ...s.]<.-..ztRfS.....*..........
@26248     |   mvhd [108]
@26356     |   trak [484]
...
@26840     |   trak [584]
...
@28464     | uuid [65560]
             00000000  be 7a cf cb 97 a9 42 e8 9c 71 99 94 91 e3 af ac  3c 3f 78 70 61 63 6b 65 74 20 62 65 67 69 6e 3d  .z....B..q......<?xpacket begin=
             00000020  27 ef bb bf 27 20 69 64 3d 27 57 35 4d 30 4d 70  43 65 68 69 48 7a 72 65 53 7a 4e 54 63 7a 6b 63  '...' id='W5M0MpCehiHzreSzNTczkc
             00000040  39 64 27 3f 3e 3c 78 3a 78 6d 70 6d 65 74 61 20  78 6d 6c 6e 73 3a 78 3d 22 61 64 6f 62 65 3a 6e  9d'?><x:xmpmeta xmlns:x="adobe:n
             00000060  73 3a 6d 65 74 61 2f 22 3e 3c 72 64 66 3a 52 44  46 20 78 6d 6c 6e 73 3a 72 64 66 3d 22 68 74 74  s:meta/"><rdf:RDF xmlns:rdf="htt
             00000080  70 3a 2f 2f 77 77 77 2e 77 33 2e 6f 72 67 2f 31  39 39 39 2f 30 32 2f 32 32 2d 72 64 66 2d 73 79  p://www.w3.org/1999/02/22-rdf-sy
             000000a0  6e 74 61 78 2d 6e 73 23 22 3e 3c 72 64 66 3a 44  65 73 63 72 69 70 74 69 6f 6e 20 72 64 66 3a 61  ntax-ns#"><rdf:Description rdf:a
             000000c0  62 6f 75 74 3d 22 22 20 78 6d 6c 6e 73 3a 78 6d  70 3d 22 68 74 74 70 3a 2f 2f 6e 73 2e 61 64 6f  bout="" xmlns:xmp="http://ns.ado
             000000e0  62 65 2e 63 6f 6d 2f 78 61 70 2f 31 2e 30 2f 22  3e 3c 78 6d 70 3a 52 61 74 69 6e 67 3e 30 3c 2f  be.com/xap/1.0/"><xmp:Rating>0</
             00000100  78 6d 70 3a 52 61 74 69 6e 67 3e 3c 2f 72 64 66  3a 44 65 73 63 72 69 70 74 69 6f 6e 3e 3c 2f 72  xmp:Rating></rdf:Description></r
             00000120  64 66 3a 52 44 46 3e 3c 2f 78 3a 78 6d 70 6d 65  74 61 3e 20 20 20 20 20 20 20 20 20 20 20 20 20  df:RDF></x:xmpmeta>             
             00000140  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20                                  
             00000160  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20                                  
...
@510031    | mdat [38025680]
@38535711  | end
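
A minimal sketch of how the XMP box could be recognised and its packet extracted, assuming one complete uuid box has already been read into memory. The 16 UUID bytes are the ones quoted from jp2image.cpp above; `kXmpUuid` and `xmpFromUuidBox` are hypothetical names, not Exiv2 code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// The 16 bytes of kJp2UuidXmp, as quoted above from src/jp2image.cpp.
static const unsigned char kXmpUuid[16] = {
    0xbe, 0x7a, 0xcf, 0xcb, 0x97, 0xa9, 0x42, 0xe8,
    0x9c, 0x71, 0x99, 0x94, 0x91, 0xe3, 0xaf, 0xac};

// Hypothetical helper: 'box' holds one complete uuid box, 8-byte header included.
// Returns the XMP packet, or an empty string when this is not the XMP uuid box.
std::string xmpFromUuidBox(const std::vector<std::uint8_t>& box)
{
    const std::size_t header = 8 + 16;  // size + 'uuid' type + 16-byte UUID
    if (box.size() < header) return std::string();
    if (std::memcmp(&box[4], "uuid", 4) != 0) return std::string();
    if (std::memcmp(&box[8], kXmpUuid, 16) != 0) return std::string();
    return std::string(reinterpret_cast<const char*>(box.data()) + header,
                       box.size() - header);
}
```

In the sample file the packet is followed by a long run of spaces, so a caller would likely trim trailing whitespace before handing the string to an XMP parser.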

There's another uuid box at 94024. I don't know what that is yet. It's not Iptc:

760 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ isobmffdump --dump uuid ~/Downloads/canon_eos_m50_01.cr3 | grep uuid
@32        |   uuid [26216]
@28464     | uuid [65560]
@94024     | uuid [416007]
761 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 
clanmills commented 6 years ago

I've raised a discussion thread on Pixls.us to see if somebody knows the specification for those unidentified uuid box structures. https://discuss.pixls.us/t/new-canon-cr3-file-specification/6881

As I've mentioned on Pixls.us, I have a (retired) Canon friend who may be able to help. Canon is a huge company and it's likely that my friend will have no contacts with digital camera software engineering.

LebedevRI commented 6 years ago

I too was able to make progress. It's strange, those raws are demosaiced already :/ https://github.com/darktable-org/rawspeed/issues/121

clanmills commented 6 years ago

Good News. I don't think this is going to be very difficult. The file is ISO BMFF format. Almost all the data in the file is in the MDAT (which I thought was intended for audio/music). I thought ISO BMFF was intended for video. Confused? (I'm easily confused).

I confirm that the MDAT "appears" to be a plain 38mb JPEG 4000x6000 pixels.

513 rmills@rmillsmbp:~/gnu/github $ dd bs=1 skip=$((510031+16)) count=$((38025680-16)) if=~/Downloads/canon_eos_m50_01.cr3 > foo.jpg
38025664+0 records in
38025664+0 records out
38025664 bytes (38 MB) copied, 418.962 s, 90.8 kB/s
514 rmills@rmillsmbp:~/gnu/github $ exiv2 foo.jpg 
File name       : foo.jpg
File size       : 38025664 Bytes
MIME type       : image/jpeg
Image size      : 6000 x 4000
foo.jpg: No Exif data found in the file
515 rmills@rmillsmbp:~/gnu/github $ 

This comment is relevant: https://github.com/darktable-org/rawspeed/issues/121#issuecomment-370252299 From Exiv2's point of view, the MDAT previews are interesting.

The Exif (and IPTC) metadata is almost certainly in the Canon uuid box near the top of the file. We have already identified the uuid box with the XMP metadata.

So, we're making good progress.

LebedevRI commented 6 years ago

I confirm that the MDAT "appears" to be a plain 38mb JPEG 4000x6000 pixels.

As per https://github.com/darktable-org/rawspeed/issues/121#issuecomment-370251636, what you are looking at, is the largest embedded thumbnail, which exiv2 should ideally be able to provide via the usual means.

$ ffprobe canon_eos_m50_01.cr3 
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'canon_eos_m50_01.cr3':
  Metadata:
    major_brand     : crx 
    minor_version   : 1
    compatible_brands: crx isom
    creation_time   : 2018-02-21T12:00:56.000000Z
  Duration: 00:00:01.00, start: 0.000000, bitrate: 308285 kb/s
    Stream #0:0(eng): Video: none (CRAW / 0x57415243), none, 6000x4000, 26169 kb/s, 1 fps, 1 tbr, 1 tbn, 1 tbc (default)      <- THIS ONE
    Metadata:
      creation_time   : 2018-02-21T12:00:56.000000Z
    Stream #0:1(eng): Video: none (CRAW / 0x57415243), none, 1624x1080, 14829 kb/s, 1 fps, 1 tbr, 1 tbn, 1 tbc (default)      <- LIKELY TOO
    Metadata:
      creation_time   : 2018-02-21T12:00:56.000000Z
    Stream #0:2(eng): Video: none (CRAW / 0x57415243), none, 6288x4056, 262877 kb/s, 1 fps, 1 tbr, 1 tbn, 1 tbc (default)   <- has to be raw, dimensions/bitrate look about right...
    Metadata:
      creation_time   : 2018-02-21T12:00:56.000000Z
    Stream #0:3(eng): Data: none (CTMD / 0x444D5443), 328 kb/s (default)
    Metadata:
      creation_time   : 2018-02-21T12:00:56.000000Z

The second one is also a thumbnail, likely. And only the third one is the raw data chunk

clanmills commented 6 years ago

Progress with decoding the Canon uuid box. It's a linked list of records:

893 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ ./isobmffdump ~/Downloads/canon_eos_m50_01.cr3 | head -5
@0         | ftyp [24]
@24        | moov [28440]
@32        |   uuid [26216]
@26248     |   mvhd [108]
@26356     |   trak [484]

I can extract the payload of the uuid box with this: (24 = 16 bytes for the UUID + 8 bytes for the box header, i.e. the 4-byte size and the 4-byte 'uuid' type)

894 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ dd skip=$((32+24)) count=$((26216-24)) bs=1 if=~/Downloads/canon_eos_m50_01.cr3 > canon.uuid
26192+0 records in
26192+0 records out
26192 bytes (26 kB) copied, 0.261663 s, 100 kB/s

I've written a utility dumper.cpp (code below):

970 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ ./dumper canon.uuid 
dumping canon.uuid
 offset| length| data
      4|     34| CNCVCanonCR3_001/00.09.00/00.00.00
     42|     88| CCTP................CCDT....................CCDT................ _._
    134|     88| CTBO..............o0..................oH......Y............O.... _._
    226|      6| free..
    236|    388| CMT1II*...............p......................................... _._
    628|   1060| CMT2II*.....'........................."...........'........'..0. _._
   1692|   5172| CMT3II*...../.....1...B..............................."......... _._
   6868|   1812| CMT4II*......................................................... _._
   8684|  17508| THMB.......x..DK................................................ _._
971 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 

CMT1,2,3,4 look like tiffs. Exif metadata probably. Why 4? Don't know yet. THMB is probably a thumbnail (JPEG perhaps)?

// dumper.cpp - list the length-prefixed records in the Canon uuid payload
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// Decode a 32-bit big-endian value from 4 raw bytes (works on any host byte order).
static uint32_t bigEndianToInt(const unsigned char* b)
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] <<  8)
         |  (uint32_t)b[3]
         ;
}

// Print at most 64 bytes of a record, replacing non-printable bytes with '.'.
static void dump(const unsigned char* buffer,uint32_t length)
{
    bool bEarly = false;
    if ( length > 64 ) {
        bEarly = true;
        length = 64 ;
    }
    for ( uint32_t i = 0 ; i < length ; i++ ) {
        int c = buffer[i];
        printf("%c", ((c<32) || (c>=128)) ? '.' : c );
    }
    if ( bEarly ) printf(" _._");
    printf("\n");
}

int main(int argc, const char* argv[])
{
    int         result   = 0;
    FILE*       f        = NULL;
    const char* program  = argv[0];
    const char* filename = argv[1];

    if ( argc == 2 ) {
        f = fopen(filename,"rb");
        if ( !f ) {
            fprintf(stderr,"unable to open %s\n",filename);
            result = 1;
        }
    } else {
        fprintf(stderr,"syntax: %s file\n",program);
        result = 1;
    }

    if ( f ) {
        printf("dumping %s\n",filename);
        printf(" offset| length| data\n");
        unsigned char header[4];
        // Each record starts with a 4-byte big-endian length which includes the length field itself.
        while ( fread(header,1,4,f) == 4 ) {
            uint32_t length = bigEndianToInt(header);
            if ( length == 0 || length > 0x7fffffff ) break; // terminator or implausible length
            if ( length > 4 ) {
                length -= 4;
                printf("%7ld|%7u| ",ftell(f),(unsigned) length);
                unsigned char* buffer = (unsigned char*) ::malloc(length);
                if ( !buffer || fread(buffer,1,length,f) != length ) {
                    printf("(truncated)\n");
                    ::free(buffer);
                    result = 1;
                    break;
                }
                dump(buffer,length);
                ::free(buffer);
            }
        }
        fclose(f);
        f = NULL;
    }

    return result;
}
LebedevRI commented 6 years ago

CMT1,2,3,4 look like tiffs. Exif metadata probably. Why 4? Don't know yet.

Well, we do have 3 actual Streams in the mdat, so one per each of those + one general IFD?

clanmills commented 6 years ago

Good Morning, Roman. Hope the weather's warmer for you this week.

I'm going to leave you to investigate the MDAT because I believe it's a JPEG with no metadata. I don't see an IFD in the MDAT. If you know how to locate that, I'll investigate that after I understand the moov/uuid. And we've already identified the XMP.

I'm focused on the Canon uuid box in the moov. I've run out of time for today. I hope tomorrow morning to know enough to read the metadata.

LebedevRI commented 6 years ago

(I was talking about the fact that the CR3 ISO BMFF has exactly 4 trak boxes, so my guess would be that it's for each of those)

clanmills commented 6 years ago

Thanks, Roman. We'll figure this puzzle out. 4 trak boxes. Yes - that's right.

Last discovery for Monday morning before I get on with what I should be doing! The THMB record contains a JPEG 120x160 thumbnail.

It's 20 bytes into the record. Why 20? 4 for THMB followed by 16 bytes. For extracting thumbnails, we never need to know. Exiv2 has never been able to edit thumbnails. Exiv2 can relocate THMB in the file as a binary blob.

919 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ dd bs=1 skip=$((8684+20)) count=$((17508-20)) if=canon.uuid > canon.jpg
17488+0 records in
17488+0 records out
17488 bytes (17 kB) copied, 0.177006 s, 98.8 kB/s
920 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiv2 canon.jpg 
File name       : canon.jpg
File size       : 17488 Bytes
MIME type       : image/jpeg
Image size      : 160 x 120
canon.jpg: No Exif data found in the file
921 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 
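
The same extraction can be sketched in code. This assumes the record layout observed above (the "THMB" tag, 16 bytes that are not yet understood, then the JPEG stream) and adds a check for the JPEG start-of-image marker FF D8. `jpegFromThmb` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: 'record' is a THMB record as printed by dumper.cpp,
// i.e. it starts at the "THMB" tag, after the 4-byte length field.
// Returns the embedded JPEG, or an empty vector if the layout does not match.
std::vector<std::uint8_t> jpegFromThmb(const std::vector<std::uint8_t>& record)
{
    const std::size_t skip = 20;  // 4 bytes "THMB" + 16 bytes not yet decoded
    if (record.size() < skip + 2) return {};
    if (std::memcmp(record.data(), "THMB", 4) != 0) return {};
    // A JPEG stream starts with the SOI marker FF D8.
    if (record[skip] != 0xff || record[skip + 1] != 0xd8) return {};
    return std::vector<std::uint8_t>(record.begin() + static_cast<std::ptrdiff_t>(skip),
                                     record.end());
}
```

The fixed offset of 20 is only what this one sample shows; other cameras or firmware versions may differ, which is why the sketch refuses to return data that does not start with FF D8.
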
clanmills commented 6 years ago

Phil Harvey (ExifTool) knows about .CR3 files:

995 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiftool ~/Downloads/canon_eos_m50_01.cr3 
ExifTool Version Number         : 10.82
...
Compatible Brands               : crx , isom
Compressor Version              : CanonCR3_001/00.09.00/00.00.00
Image Width                     : 6000
Image Height                    : 4000
Bits Per Sample                 : 8 8 8
Compression                     : JPEG (old-style)
...

There is also a helpful document. See "Canon uuid Tags" on this page: https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html

Tag ID | Values / Notes     | Comment
CCTP   | CanonCCTP          |
CMT1   | EXIF Tags          |
CMT2   | EXIF Tags          |
CMT3   | Canon Tags         | https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html
CMT4   | Canon Unknown IFD  |
CNCV   | Compressor Version |
CNTH   | Canon CNTH Tags    |
THMB   | Thumbnail Image    |
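
For a parser, the table above could be carried as a small lookup keyed by the four-character record ID. This is only an illustrative sketch: the entries are copied from the ExifTool table, while `CanonUuidTag`, `kCanonUuidTags` and `describeCanonUuidTag` are made-up names.

```cpp
#include <cstddef>
#include <cstring>

struct CanonUuidTag {
    const char* id;           // four-character record ID
    const char* description;  // meaning according to the ExifTool tag table
};

// Entries copied from the "Canon uuid Tags" table quoted above.
static const CanonUuidTag kCanonUuidTags[] = {
    {"CCTP", "CanonCCTP"},
    {"CMT1", "EXIF Tags"},
    {"CMT2", "EXIF Tags"},
    {"CMT3", "Canon Tags"},
    {"CMT4", "Canon Unknown IFD"},
    {"CNCV", "Compressor Version"},
    {"CNTH", "Canon CNTH Tags"},
    {"THMB", "Thumbnail Image"},
};

// Returns the description for a record whose first four bytes are 'id',
// or nullptr when the ID is not in the table.
const char* describeCanonUuidTag(const char* id)
{
    const std::size_t count = sizeof kCanonUuidTags / sizeof kCanonUuidTags[0];
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strncmp(id, kCanonUuidTags[i].id, 4) == 0) {
            return kCanonUuidTags[i].description;
        }
    }
    return nullptr;
}
```

Because only the first four characters are compared, the record data printed by dumper.cpp (for example "CMT1II*...") can be passed in directly.
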
clanmills commented 6 years ago

I think I've got a good understanding about the scope of this project. We're having a team meeting in May #225 when we'll discuss the specification and schedule for v0.27.

clanmills commented 6 years ago

I've written code to search for tiff structures in a binary file. It's "simple minded" and only searches for the pattern II*\0 or MM\0* and reports the location. Discovered tiffs can be extracted with the utility dd. Although there is binary data with the II*\0 finger-print in MDAT, none of these structures is a valid tiff. It's just a coincidence.

615 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ tiffs ~/Downloads/canon_eos_m50_01.cr3 
searching /Users/rmills/Downloads/canon_eos_m50_01.cr3
    offset | type
       296 | II
       688 | II
      1752 | II
      6906 | II
      6928 | II
  38494795 | II
  38494825 | II
  38496517 | II
  38534235 | II
616 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $

Here's the code:

// tiffs.cpp - search for tiff structures in a binary file
#include <stdio.h>
#include <string.h>

int main(int argc, const char* argv[])
{
    int         result   = 0;
    FILE*       f        = NULL;
    const char* program  = argv[0];
    const char* filename = argv[1];

    if ( argc == 2 ) {
        f = fopen(filename,"rb");
        if ( !f ) {
            fprintf(stderr,"unable to open %s\n",filename);
            result = 1;
        }
    } else {
        fprintf(stderr,"syntax: %s file\n",program);
        result = 1;
    }

    if ( f ) {
        printf("searching %s\n",filename);
        printf("    offset | type\n");
        // Slide a 4-byte window over the file one byte at a time, so that
        // overlapping candidates (for example "III*\0") are not skipped.
        unsigned char window[4] = { 0, 0, 0, 0 };
        long          pos       = 0; // number of bytes read so far
        int           c;
        while ( (c = fgetc(f)) != EOF ) {
            memmove(window,window+1,3);
            window[3] = (unsigned char) c;
            pos++;
            if ( pos >= 4
            && ( (memcmp(window,"II*\0",4) == 0) || (memcmp(window,"MM\0*",4) == 0) )
               ) {
                printf("%10ld | %c%c\n",pos-4,window[0],window[1]);
            }
        }
        fclose(f);
        f=NULL;
    }

    return result;
}
lclevy commented 6 years ago

In mdat, to be more accurate, there are IFDs, that are parsed by ExifTool 10.82

clanmills commented 6 years ago

Here's a snippet from the -verbose output from ExifTool.

621 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiftool -verbose ~/Downloads/canon_eos_m50_01.cr3

CTMD (SubDirectory) -->

629 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ tiffs ~/Downloads/canon_eos_m50_01.cr3 
searching /Users/rmills/Downloads/canon_eos_m50_01.cr3
    offset | type
       296 | II
       688 | II
      1752 | II
      6906 | II
      6928 | II
  38494795 | II  <--- this is a valid tiff
  38494825 | II
  38496517 | II
  38534235 | II
630 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ dd bs=1 skip=38494795 count=10000 if=~/Downloads/canon_eos_m50_01.cr3 > foo1.tif
10000+0 records in
10000+0 records out
10000 bytes (10 kB) copied, 0.11086 s, 90.2 kB/s
631 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ exiv2 -pS foo1.tif 
STRUCTURE OF TIFF FILE (II): foo1.tif
 address |    tag                           |      type |    count |    offset | value
      10 | 0xa406 SceneCaptureType          |     SHORT |        1 |         0 | 0
END foo1.tif
632 rmills@rmillsmbp:~/gnu/github/isobmff/pyke369/isobmffdump $ 
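
Since most II*\0 hits in the mdat are coincidences, a cheap plausibility test can weed them out before anything is extracted with dd. The sketch below is a heuristic, not Exiv2's TIFF parser: it relies only on the standard TIFF layout (2-byte byte order, 2-byte magic 42, 4-byte offset to the first IFD, then a 2-byte entry count followed by 12-byte entries). `looksLikeTiff` and `getUInt` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read an n-byte unsigned integer (n <= 4) in the given byte order.
static std::uint32_t getUInt(const std::uint8_t* p, int n, bool little)
{
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) {
        const std::uint8_t byte = little ? p[n - 1 - i] : p[i];  // most significant first
        v = (v << 8) | byte;
    }
    return v;
}

// Heuristic: does the TIFF signature at 'pos' lead to a plausible first IFD?
bool looksLikeTiff(const std::vector<std::uint8_t>& b, std::size_t pos)
{
    if (pos > b.size() || b.size() - pos < 8) return false;
    const std::uint8_t* p     = &b[pos];
    const std::size_t   avail = b.size() - pos;

    bool little;
    if      (std::memcmp(p, "II*\0", 4) == 0) little = true;
    else if (std::memcmp(p, "MM\0*", 4) == 0) little = false;
    else return false;

    // Bytes 4..7: offset of the first IFD, relative to the TIFF header.
    const std::uint32_t ifd = getUInt(p + 4, 4, little);
    if (ifd < 8 || ifd > avail || avail - ifd < 2) return false;

    // The IFD starts with a 2-byte entry count; each entry occupies 12 bytes.
    const std::uint32_t entries = getUInt(p + ifd, 2, little);
    return entries > 0 && entries <= (avail - ifd - 2) / 12;
}
```

A random match can still pass this test by chance, so a real implementation would go on to validate the tag IDs and types of the entries, much as `exiv2 -pS` does above.
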
clanmills commented 6 years ago

@lclevy @LebedevRI Thanks Guys. I'm mistaken. There is a valid tiff in the mdat. I still don't have a complete understanding of how all the data is organised in CR3, however we're making good progress. And I'm enjoying working with you both. Thanks.

boardhead commented 6 years ago

The CTMD type 1, 4 and 5 directories contain timestamp, focal-length and exposure information respectively. ExifTool 10.87 will decode these too. I still don't know about type 3 though.

lclevy commented 6 years ago

Hi Phil, Thanks for the info. I have made progress on the craw proprietary format, and found cr3 support in dpp 4.8.20. Kind regards Laurent

clanmills commented 6 years ago

Could you share your code please, Laurent? I was working on my decoder (tiffs.cpp) on Friday and will publish a new version here later this week. https://github.com/Exiv2/exiv2/issues/236

LebedevRI commented 6 years ago

I have made progress on the craw proprietary format, and found canon code which handle it, in dpp 4.8.20. Do you know someone else working on it ? So that we can work together? I would to create an open source decoder.

Is there a specification? As you are well aware, i'm kinda working on it in https://github.com/darktable-org/rawspeed/issues/121

clanmills commented 6 years ago

dpp 4.8.20 from Canon Asia = http://support-th.canon-asia.com/contents/TH/EN/0200544802.html

I haven't downloaded this. Looks like an executable binary.

lclevy commented 6 years ago

It allows conversion from cr3 files to tiff and JPEG, as well as displaying cr3 metadata

clanmills commented 6 years ago

Gentlemen: DPP4 is an impressive application. It does have a metadata dialog box (View/Info) with tabs. Curiously, under the tab "IPTC/XMP" it shows categories with no meaningful data. Maybe the camera firmware engineers didn't tell the DPP Developers about the uuid for XMP! In the test file, the XMP only has "Rating 0". Anyway good progress. I'll spend more time looking at this later in the week.

clanmills commented 6 years ago

There is a Canon Digital Camera SDK available in SE Asia. I'd have to sign an NDA to obtain that, so as Exiv2 is licensed under GPLv2, I suspect the NDA would cause me trouble. I feel we're close to unravelling the .CR3 format. I'll spend time on this later in the week and give everybody an update on my progress.

lclevy commented 6 years ago

updated: https://github.com/lclevy/canon_cr3 (only by examining samples)

lclevy commented 6 years ago

the hardest part is understanding the compression algorithm. not sure it is possible

clanmills commented 6 years ago

@lclevy Nice job of updating your "spec" for CR3.

VioletGiraffe commented 5 years ago

Any news / progress on CR3 support as far as reading metadata?

clanmills commented 5 years ago

I don't think Exiv2 will be able to work on this during the next 12 months. I'm delighted to say that we now have 6 regular contributors to Exiv2 and that's much better than the years I worked alone. However 4 of the guys are involved in "modernisation" of the code to C++11 for v0.28. Nehal's dealing with our servers and I've decided to stop writing code (I'm 68) and deal with releases and users. We will have regular (quarterly) dot releases of v0.27 during 2019 and 2020. So, until v0.28 is finished (sometime in 2020), it's unlikely that we can work on this.

VioletGiraffe commented 5 years ago

@clanmills, thank you for the quick reply! I'm very sad to hear this as it means a major setback for our project, but I really appreciate all your hard work.

lclevy commented 5 years ago

You can use Exiftool or parse_cr3.py https://github.com/lclevy/canon_cr3/blob/master/parse_cr3.py

clanmills commented 5 years ago

@VioletGiraffe I'm happy to mentor someone, if you know a volunteer. I haven't studied the python code written by @lclevy however we can use that both as a code model and for testing our parser.

boardhead commented 5 years ago

This would be a perfect project for a Google Summer of Code person.

clanmills commented 5 years ago

Exiv2 had GSoC students in 2012 (Abhinav) and 2013 (Tuan and Mahesh). In 2014, I retired and moved house back from Silicon Valley to England. Most of the next 3 years were spent remodelling our house and working on Exiv2.

We applied this year (2019) to GSoC because the house project is complete, and I'm delighted to have recruited 5 contributors to Exiv2. Sadly, Google did not allocate a student: https://github.com/Exiv2/exiv2/issues/659

My wife was delighted because she thinks I spend way too much time on Exiv2!

I have downloaded Laurent's beautiful python CR3 parser:

578 rmills@rmillsmbp:~/gnu/github/canon_cr3-master $ python3 parse_cr3.py samples/IMG_0482_raw.CR3 
filesize 0x192f495
base=0
00000:ftyp: major_brand=b'crx ', minor_version=1, [b'crx ', b'isom'] (0x18)
00018:moov: (0x5980)
base=20
00020:  uuid: b'85c0b687820f11e08111f4ce462b6a48' (0x50d0)
base=38
00038:    CNCV: b'CanonCR3_001/00.09.00/00.00.00' (0x26)
0005e:    b'CCTP' b'000000000000000100000003000000184343445400000000' (0x5c)
...

My priority at the moment is to release Exiv2 v0.27.1 (hopefully today). I suspect that, thanks to the code by @lclevy, porting this to C++ in Exiv2 is not an overwhelming task. However, without an engineer to take ownership of this, I can't see this happening in the next 12 months.

CarVac commented 5 years ago

LibRaw released a snapshot yesterday with CR3 support, and now this is the blocker for Filmulator supporting .CR3 cameras.

Is there any way I can help out?

mrkkrp commented 5 years ago

@clanmills Can you provide some guidance for those who'd like to try to implement this? A plan/outline? Note that @CarVac asks if they can help.

clanmills commented 5 years ago

Mark. I have retired from Exiv2. However, I admire your enthusiasm for this task and I had hoped that I would get to this before I retired.

So I'm happy to mentor you and @CarVac. Can we discuss this by email? robin@clanmills.com. When you're ready to submit a PR, it would help to summarise the implementation, for several reasons: 1) To help Team Exiv2 with the review of your changes. 2) To act as a 'road map' for other file types being added to Exiv2. 3) It'll help you be quite certain and clear in your mind about what you did (and why).

When would you like to start?

mrkkrp commented 5 years ago

@CarVac would you like to give it a go or should I try to dig into it? If C++ is fresh in your head and most importantly if you have the time you may be a better fit for this.

CarVac commented 5 years ago

I'm fresh on C++ but I'm in the process of moving so I can't do it right now. But I really would like to follow along in the process (in whatever email conversations you have).

clanmills commented 5 years ago

I know the Exiv2/C++ code very well. I downloaded the LibRaw code a couple of months ago at the suggestion of @lclevy. However I was pushing hard to release Exiv2 v0.27.3 at that time and didn't look at it. I'll look in the next few days. I have a gardening project going on this week and, as I'm paying people to work with me, I need to deal with that first. I should have time to work on this next week.

It would be helpful if @CarVac could let us know about his skills, knowledge of C++, LibRaw, Exiv2, .CR3 - or anything that he thinks will be useful.

clanmills commented 5 years ago

@CarVac I don't think this is a task for somebody "fresh on C++", however you could help with testing. Do you know Python3?

CarVac commented 5 years ago

I meant "fresh" as in it's fresh in my mind. I write Filmulator (https://github.com/CarVac/filmulator-gui) in C++, and it's basically the way I learned the language. By contrast, I don't really know Python at all anymore; I learned back on 2.7 and haven't used it since.

My main skill in programming is working memory: given familiarity, I can grasp a very large amount of a codebase.

I know basically nothing about the inner workings of Exiv2 or CR3, and only a slight amount about the structure of LibRaw. But I definitely want to learn; part of this is so that I can better understand how to approach handling EXIF in Filmulator, which uses the crudest possible method of "copy all the exif from the raw to the output file".

clanmills commented 5 years ago

@CarVac @mrkkrp I've changed my mind. I have retired and don't want to get involved. I'm sorry I offered to help. I'm very happy to have retired from Exiv2 after 12 years and 10,000+ hours of effort. I wish you every success if you take on this challenge.

boardhead commented 5 years ago

@clanmills A well-deserved break. The effort that you put into this project is impressive.