Exiv2 / exiv2

Image metadata library and tools
http://www.exiv2.org/
Other
924 stars 281 forks source link

out-of-bounds array access in RemoteIo::getb #910

Closed branewave closed 5 years ago

branewave commented 5 years ago

When RemoteIo::getb is called for a byte at the very end of a block, it will populate the wrong block, and then read a byte from before the start of the "data" array. The fix is to remove the "+ 1" from the "expectedBlock" calculation.

size_t expectedBlock = (p_->idx_ + 1)/p_->blockSize_;

clanmills commented 5 years ago

Thanks for reporting this. And thanks for proposing a fix. I worked with Tuan (an outstanding GSoC student at NUS in Singapore) on RemoteIo in 2013. I seem to recall discussing this line of code with him!

I will investigate this for 0.27.3 which is scheduled for release on 2019-09-30.

clanmills commented 5 years ago

Typically blockSize == 1024 and idx_ is the index of the stream. The result is

return data[p_->idx_++ - expectedBlock*p_->blockSize_];

When the index is 1023, we need block [0].

data[1023];

When the index is 1024, we need block [1].

data[1024 - 1*1024]; == data[0]

I modified blockSize_ to be 1 and tried simple tests such as:

$ bin/exiv2 -pa http://clanmills.com/Stonehenge.jpg

Oddly, everything seems to be working with/without your fix! (with http/https/curl/http.cpp). I think the reason is because RemoteIo::getb() isn't used to parse images. I believe the code in image->readMetadata() uses RemoteIo::read(). I am currently writing a book Image Metadata and Exiv2 Architecture. I'll visit this again when I write the chapter: 7 I/O in Exiv2 which will discuss BasicIo in detail.

The suggested fix seems sound to me and I have submitted PR #1007 for team review and approval.

clanmills commented 5 years ago

Very pleased to confirm that getb() is correct with this change. I've written a new IO test program which will be published in the book. The essence of this is a copy function which reads from http to file as follows:

static int cp (int argc,const char** argv,const Options& options)
{
    // printf("cp argc = %d ",argc); for ( int i = 0 ; i < argc ; i++ ) printf("%s ", argv[i]); printf("\n");
    int rc = 0 ;
    Exiv2::BasicIo::AutoPtr pIn = Exiv2::ImageFactory::createIo(argv[0],true);

    if ( rc == 0 && pIn->open() != 0 ) {
        fprintf(stderr,"unable to open input file %s\n",argv[0]);
        rc = 1 ;
    }

    Exiv2::FileIo   out(argv[1]);
    if ( rc == 0 && out.open("w+b") != 0 ) {
        fprintf(stderr,"unable to open output file %s\n",argv[1]);
        rc = 1;
    }

    if ( rc == 0 ) {
        int  count = 0;
        char method[100];
        strcpy(method,"unknown");
        bool bEnd = pIn->eof() ;

        if ( options.getb ) {
            strcpy(method,"getb");
            while ( !bEnd ) {
                Exiv2::byte b = (Exiv2::byte) pIn->getb();
                bEnd = pIn->eof();
                if ( !bEnd ) {
                    out.putb(b);
                    count++;
                }
            }
        } else {
            strcpy(method,"read");
            Exiv2::byte buffer[128 * 1024];
            while (!bEnd) {
                int n = pIn->read(buffer, sizeof(buffer));
                out.write(buffer, n);
                count += n;
                bEnd = pIn->eof();
            }
        }
        printf("copied %d bytes by %s method\n",count,method);
    }
    pIn->close();

    return rc;
    UNUSED(options);
}

The input options.getb dictates whether the copy is performed byte-by-byte by getb(), or in blocks using read(). Both read the same number of bytes. With the change in PR #1007 the output and input files are now indentical.