Exiv2 / exiv2

Image metadata library and tools
http://www.exiv2.org/
Other
893 stars 279 forks source link

Exiv2 is slow on video files #1724

Closed maxpozdeev closed 2 years ago

maxpozdeev commented 3 years ago

Sometimes I use exiv2 to read metadata from large .MOV video files (many gigabytes and/or networked volumes) (just to check this is not an image). This can take a long time due to the whole file is read. Maybe exiv2 should ignore BMFF files with video brands?

Exiv2 0.27.4 from repo with EXIV2_ENABLE_BMFF=On (on macOS)

hassec commented 3 years ago

Hi @maxpozdeev thanks for the report! :+1:

I think that's a bug... :see_no_evil:

On a random *.MOV file I get the following

File size       : 51844028 Bytes
MIME type       : image/generic
Image size      : 0 x 0
IMG_0667.MOV: No Exif data found in the file

But I'd say returning an image/generic MIME type is not great...

I agree, that this should abort earlier instead of trying to parse the entire file as a BMFF image.

Until this is fixed, maybe just do your check by for example running the file command to make sure it's not a MOV file?

maxpozdeev commented 3 years ago

I used a quick fix, to check if the media type is a supported image. I think there will be more and more media formats (video and images) which use BMFF as their container. So maybe Exiv2 should strictly check for supported image formats.

diff --git a/include/exiv2/bmffimage.hpp b/include/exiv2/bmffimage.hpp
index 8bc160fe..d819acbc 100644
--- a/include/exiv2/bmffimage.hpp
+++ b/include/exiv2/bmffimage.hpp
@@ -32,6 +32,7 @@
 namespace Exiv2
 {
     EXIV2API bool enableBMFF(bool enable = true);
+    EXIV2API bool enableOnlySupportedBMFF(bool enable = true);
 }

 #ifdef EXV_ENABLE_BMFF
diff --git a/src/bmffimage.cpp b/src/bmffimage.cpp
index e22979d0..2b144ce8 100644
--- a/src/bmffimage.cpp
+++ b/src/bmffimage.cpp
@@ -89,9 +89,17 @@ struct BmffBoxHeader
 namespace Exiv2
 {
     static bool   enabled = false;
+    static bool   onlySupportedFiles = false;
     EXIV2API bool enableBMFF(bool enable)
     {
         enabled = enable ;
+        onlySupportedFiles = false ;
+        return true ;
+    }
+    EXIV2API bool enableOnlySupportedBMFF(bool enable)
+    {
+        enabled = enable ;
+        onlySupportedFiles = enable ;
         return true ;
     }

@@ -639,6 +647,18 @@ namespace Exiv2

         bool matched = (buf[4] == 'f' && buf[5] == 't' && buf[6] == 'y' && buf[7] == 'p')
                      ||(buf[4] == 'J' && buf[5] == 'X' && buf[6] == 'L' && buf[7] == ' ');
+
+        //TODO: Good idea to test compatile brands as well
+        matched = onlySupportedFiles && matched && ( 
+                               (buf[8] == 'a' && buf[9] == 'v' && buf[10] == 'i')                       //avi[f] - AVIF types
+                             ||(buf[8] == 'm' && buf[9] == 'i' && buf[10] == 'f' && buf[11] == '1')     //mif1 - image    - any coding
+                             ||(buf[8] == 'm' && buf[9] == 's' && buf[10] == 'f' && buf[11] == '1')     //msf1 - sequence - any coding
+                             ||(buf[8] == 'm' && buf[9] == 'i' && buf[10] == 'a' && buf[11] == 'f')     //miaf - just for compatibility
+                             ||(buf[8] == 'h' && buf[9] == 'e' && buf[10] == 'i')                       //hei[c] - image    - HEVC
+                             ||(buf[8] == 'h' && buf[9] == 'e' && buf[10] == 'v')                       //hev[c] - sequence - HEVC
+                             ||(buf[8] == 'c' && buf[9] == 'r' && buf[10] == 'x' && buf[11] == ' ')     //Canon CR3
+                             ||(buf[8] == 'J' && buf[9] == 'X' && buf[10] == 'L' && buf[11] == ' ') );  //JPEG XL
+
         if (!advance || !matched) {
             iIo.seek(static_cast<long>(0), BasicIo::beg);
         }
hassec commented 3 years ago

Hi @maxpozdeev I've created #1745 to fix this problem, but will wait for some more feedback from other maintainers, as I'm not sure what the best way to handle this will be.

I'm not a huge fan of asserting that exvi2 knows the brand, as that isn't a requirement for it to parse a BMFF file. It will just label it as image/generic mime type but still parse the file and report metadata if found.

1div0 commented 3 years ago

MPEG-4 is just big mess. We could add some additional parsing in order to avoid such corner case.

hassec commented 2 years ago

fixed in #1745