GreycLab / CImg

The CImg Library is a small and open-source C++ toolkit for image processing
http://cimg.eu
Other
1.46k stars 278 forks source link

Undefined behaviour and correctness discrepancy while parsing a BMP file #394

Open m-carrasco opened 11 months ago

m-carrasco commented 11 months ago

Hi CImg :wave:

We found an undefined behaviour bug in CImg triggered when parsing a BMP file. The BMP is correct according to CImg; no exceptions or crashes when parsed without sanitisers. However, ImageMagick's identify flags the BMP as incorrect.

After debugging, we understand this correctness discrepancy is triggering the undefined behaviour.

Bug

CImg.h:54462:39: runtime error: signed integer overflow: 1048616 - -2147483608 cannot be represented in type 'int'

The UB happens in function CImg<T>& _load_bmp(std::FILE *const file, const char *const filename) and the line is:

const int xoffset = offset - 14 - header_size - 4*nb_colors;

offset has a value of 1048630 and header_size of -2147483608 while file size is 238.

Possible fix

offset and header_size could be compared to file_size to check that they are within range.

If this sounds reasonable, I can provide a PR for it.

How to reproduce:

#include "CImg.h"

using namespace cimg_library;

int main(int argc,char **argv) {
  const char *const file_i = cimg_option("-i",(char*)0,"Input image");  
  CImg<unsigned char> image(file_i);
  return 0;
}

testcase.zip (Github does not allow uploading BMP files)

This has been tested with clang++-14.

unzip testcase.zip testcase.bmp
clang++ -g -O0 -fsanitize=undefined,address -fno-sanitize-recover=all -Dcimg_display=0 main.cpp
./a.out -i testcase.bmp

Discrepancy with identify

identify -verbose testcase.bmp reports

identify-im6.q16: length and filesize do not match testcase.bmp @ error/bmp.c/ReadBMPImage/854. 
identify-im6.q16: insufficient image data in file testcase.bmp @ error/bmp.c/ReadBMPImage/1001.

We have discovered additional BMP files triggering discrepancies with identify, like above, meaning that CImg may wrongly consider them correct. Would you find it useful if we report them?

Thanks for sharing this project!

Best, Manuel.

dtschump commented 11 months ago

Thanks for reporting. I've made a fix attempt : https://github.com/GreycLab/CImg/commit/55da1f4dc8256aa8846036a45d5dbb2bab1466c2 Could you tell me if that looks good?

m-carrasco commented 11 months ago

Thanks for the quick response!

I think a fix should take this into account:

  1. A malformed BMP can provide at least inconsistent values for the offset, header_size and file_size variables. Those variables can come from the input file's content.
  2. A wrong combination of offset and header_size can lead to the reported UB. To fix this, we could check if they are in range with the file's size.

I would propose a fix which performs these checks in this particular order:

  1. Sanitize file_size so it matches the actual file size.
  2. Check that offset and header_size are consistent with the now sanitized file_size.

From what I grasp, the fix is not sanitizing file_size. Also, it attempts to sanitize header_size using an unsanitized file_size. Last, it is not checking offset.

Here is my fix (on top of yours):

diff --git a/CImg.h b/CImg.h
index b5a934d..21db640 100644
--- a/CImg.h
+++ b/CImg.h
@@ -54419,6 +54419,11 @@ namespace cimg_library {

       const ulongT fsiz = file?(ulongT)cimg_max_buf_size:(ulongT)cimg::fsize(filename);
       std::FILE *const nfile = file?file:cimg::fopen(filename,"rb");
+      
+      cimg::fseek(nfile,0,SEEK_END);
+      cimg_long filesystem_size=cimg::ftell(nfile);
+      std::rewind(nfile);
+
       CImg<ucharT> header(54);
       cimg::fread(header._data,54,nfile);
       if (*header!='B' || header[1]!='M') {
@@ -54440,17 +54445,25 @@ namespace cimg_library {
         nb_colors = header[0x2E] + (header[0x2F]<<8) + (header[0x30]<<16) + (header[0x31]<<24),
         bpp = header[0x1C] + (header[0x1D]<<8);

+      if (file_size != filesystem_size){
+        throw CImgIOException(_cimg_instance
+                              "load_bmp(): Invalid file_size %i specified in filename '%s', expected %i.",
+                              cimg_instance,
+                              file_size,filename?filename:"(FILE*)", filesystem_size);
+      }
+
       if (header_size<0 || header_size>file_size)
         throw CImgIOException(_cimg_instance
                               "load_bmp(): Invalid header size %d specified in filename '%s'.",
                               cimg_instance,
                               header_size,filename?filename:"(FILE*)");

-      if (!file_size || file_size==offset) {
-        cimg::fseek(nfile,0,SEEK_END);
-        file_size = (int)cimg::ftell(nfile);
-        cimg::fseek(nfile,54,SEEK_SET);
-      }
+      if (offset<0 || offset>file_size)
+        throw CImgIOException(_cimg_instance
+                              "load_bmp(): Invalid offset %i specified in filename '%s'.",
+                              cimg_instance,
+                              offset,filename?filename:"(FILE*)");
+
       if (header_size>40) cimg::fseek(nfile,header_size - 40,SEEK_CUR);

       const int

From what I tested, this fixes the UB and should also fix the first reported discrepancy with identify.


The second discrepancy with ImageMagick seems to be about the image pixels' data. ImageMagick ensures that the total number of pixels read is consistent with the width and height.

I could also try to provide a patch if you like.


Best, Manuel.

dtschump commented 11 months ago

Thanks. Here is my proposal then :

https://github.com/GreycLab/CImg/commit/009f0432fd32655f9399f2ec2fd5380c44e49a8e

Variable fsize actually already estimates the file size, with the std::tell() procedure.

m-carrasco commented 11 months ago

Thanks!

It looks good to me for the case in which a file path is provided instead of a FILE*.

The issue is with fsiz is that it is computed as const ulongT fsiz = file?(ulongT)cimg_max_buf_size:(ulongT)cimg::fsize(filename);.

If a file pointer is provided (file is not NULL), you could still have issues similar to those reported here. fsiz is set to cimg_max_buf_size, which could be higher than the actual size, thus allowing incorrect values for offset and header_size again. cimg_max_buf_size seems high enough to allow the UB to happen again.

Best, Manuel.

dtschump commented 11 months ago

Indeed, so https://github.com/GreycLab/CImg/commit/703ffcaf470d426e333f8349760957119590d38a should help.

m-carrasco commented 11 months ago

Yes, it looks good to me in regard to the UB :+1:

A minor comment would be that the new fsize is not restoring the original file position. In our case, this is not an issue but it may be if used elsewhere.

I would save the position at the beginning long originalPos = std::ftell(file); and restore before returning std::fseek(file, originalPos, SEEK_SET);

Best, Manuel.

dtschump commented 11 months ago

Indeed, thanks ! I've made the changes.

Regards,

David.