PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

Feature request: Support EMF image #1480

Open bakkan opened 5 years ago

bakkan commented 5 years ago

This is:

Expected Behavior

Support EMF image.

Failure Information

Throws PhpOffice\PhpWord\Exception\InvalidImageException exception. Exception message : Invalid image: zip:///Users/xxx/Downloads/xxxx.docx#word/media/image.emf

#0 /works/shared/laravel/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()

#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///Users/hu...', NULL, false, 'Picture 18')

How to Reproduce

The document file contains EMF-format images. Searching for "emf" led me to this page: https://fileinfo.com/extension/emf

<?php
use PhpOffice\PhpWord\IOFactory;
$file = '/path/to/file.docx';
$phpWord = IOFactory::load($file);
$sections = $phpWord->getSections();
foreach ($sections as $section) {
      $elements = $section->getElements();
      foreach ($elements as $element) {
            // do something else...
      }
}

Context

bakkan commented 5 years ago

PHPWord uses the getimagesize() function to get image info, and getimagesize() doesn't support the EMF format. 😂😂
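To illustrate the point, here is a minimal sketch that feeds a hand-built EMF header and a tiny GIF to getimagesizefromstring(). The header bytes are an assumption made up for the demonstration (Type = 1, Size = 88, zeroed rectangles, then the " EMF" signature), not a renderable image.

```php
<?php
// Hand-built 88-byte EMF header, for illustration only: Type = 1, Size = 88,
// 32 zero bytes for Bounds and Frame, then the " EMF" signature at offset 40.
$emfHeader = str_pad(pack('V2', 1, 88) . str_repeat("\0", 32) . ' EMF', 88, "\0");

// A 1x1 GIF, a format getimagesize() does understand, for comparison.
$gif = base64_decode('R0lGODlhAQABAIAAAAAAAP///ywAAAAAAQABAAACAUwAOw==');

// The @ only silences any notice PHP might raise for unrecognised data.
$emfInfo = @getimagesizefromstring($emfHeader);
$gifInfo = getimagesizefromstring($gif);

var_dump($emfInfo);                    // unrecognised format
var_dump($gifInfo[0], $gifInfo[1]);    // width and height of the GIF
```

Because the EMF bytes match none of the signatures PHP knows, the call reports failure, which is what PHPWord's checkImage() turns into InvalidImageException.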

mynukeviet commented 5 years ago

I am using phpword dev-master and see this error:

[Mon, 08 Apr 2019 09:26:50 +0700] [127.0.0.1] [Error(1): Uncaught exception 'PhpOffice\PhpWord\Exception\InvalidImageException' with message 'Invalid image: zip:///opt/lampp/temp/php4WlqvI#word/media/image1.emf' in /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php:418
Stack trace:
#0 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()
#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///opt/lamp...')
#2 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(145): ReflectionClass->newInstanceArgs(Array)
#3 [internal function]: PhpOffice\PhpWord\Element\AbstractContainer->addElement('Image', 'zip:///opt/lamp...')
#4 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(112): call_user_func_array(Array] [FILE: /vendor/phpoffice/phpword/src/PhpWord/Element/Image.php] [LINE: 418]

derKroisi commented 2 years ago

Any news on this issue? Will this be addressed sooner or later?

ThomazPom commented 2 years ago

I encountered this error just now. I guess EMF format is becoming more commonly used in modern docx files

RomMad commented 1 year ago

I ran into the same problem today. Any news about this issue?

gurpreetbhatoa commented 1 year ago

There isn't any support for .emf files, but there is a workaround.

ThomazPom commented 1 year ago

Workaround in code: PHPWord includes template processing for this.

include 'vendor/autoload.php';
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('test2.docx');
$templateProcessor->setValue('name', 'myvar');
$templateProcessor->saveAs('./xx.docx');

https://phpword.readthedocs.io/en/latest/templates-processing.html https://stackoverflow.com/a/53039632/4693790

You can avoid using TemplateProcessor, since all you need is to replace the .emf references.

You can write a prepareDocxReplaceEMF($docxPath) function that performs all of the following steps on a docx file before you work with PHPWord. Renaming the docx to zip is not needed.

Use PHP ZipArchive to extract "word/_rels/document.xml.rels" from YOURDOC.docx https://www.php.net/manual/en/ziparchive.extractto.php

Replace EMF references in file https://stackoverflow.com/a/69155428/4693790

Use PHP ZipArchive to zip document.xml.rels back https://www.php.net/manual/en/ziparchive.addfile.php

Use PHP ZipArchive to extract emf file https://www.php.net/manual/en/ziparchive.extractto.php

Use ImageMagick to convert the EMF FILE https://imagemagick.org/script/formats.php https://www.php.net/manual/fr/book.imagick.php

Use PHP ZipArchive to zip jpeg file back https://www.php.net/manual/en/ziparchive.addfile.php
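The steps above can be sketched roughly as follows. This is not code from the thread: the function body, the `$convert` callable and the `$imagickConverter` closure are illustrative assumptions. It assumes relationship targets are written relative to word/ (for example "media/image1.emf"), and that the local ImageMagick build can actually decode EMF, which is often not the case outside Windows.

```php
<?php
/**
 * Sketch: replace every word/media/*.emf entry in a .docx with a JPEG.
 * $convert is a hypothetical callable(string $emfBytes): string that returns
 * JPEG bytes. Returns the number of images that were replaced.
 */
function prepareDocxReplaceEmf(string $docxPath, callable $convert): int
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }

    $relsPath = 'word/_rels/document.xml.rels';
    $rels = $zip->getFromName($relsPath);
    if ($rels === false) {
        $zip->close();
        return 0;
    }

    // Collect the EMF entries first so the archive is not changed mid-scan.
    $emfEntries = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (preg_match('#^word/media/[^/]+\.emf$#i', $name)) {
            $emfEntries[] = $name;
        }
    }

    foreach ($emfEntries as $name) {
        $newName = substr($name, 0, -4) . '.jpg';
        $zip->addFromString($newName, $convert($zip->getFromName($name)));
        $zip->deleteName($name);

        // Assumption: targets are relative to word/, e.g. "media/image1.emf".
        $rels = str_replace(
            'Target="' . substr($name, 5) . '"',
            'Target="' . substr($newName, 5) . '"',
            $rels
        );
    }

    if ($emfEntries !== []) {
        // addFromString() overwrites the existing relationships entry.
        $zip->addFromString($relsPath, $rels);
    }
    $zip->close();

    return count($emfEntries);
}

// One possible converter, assuming the Imagick extension is installed and
// its ImageMagick build has a working EMF decoder.
$imagickConverter = static function (string $emfBytes): string {
    $image = new Imagick();
    $image->readImageBlob($emfBytes);
    $image->setImageFormat('jpeg');
    return $image->getImageBlob();
};
```

Call `prepareDocxReplaceEmf($file, $imagickConverter)` before `IOFactory::load($file)`. If the result must also open cleanly in Word, [Content_Types].xml may need a default entry for the jpg extension; this sketch does not add one.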

user3470 commented 1 year ago

Workaround that worked for me

    private function removeImageReferences($zip, $placeholderImagePath)
    {
        $relsPath = 'word/_rels/document.xml.rels';
        // Defined up front so it is also set when the document has no images
        $placeholderImageTarget = 'media/placeholder.png';

        $relsContent = $zip->getFromName($relsPath);
        if ($relsContent === false) {
            return; // No relationships part, nothing to rewrite
        }

        $relsXml = new SimpleXMLElement($relsContent);
        $imagePaths = [];

        // Note: this replaces every image relationship, not only EMF ones
        foreach ($relsXml->Relationship as $relationship) {
            if (strpos((string) $relationship['Type'], 'image') !== false) {
                // Store the original image path
                $imagePaths[] = 'word/' . $relationship['Target'];

                // Replace the image target with a placeholder image reference
                $relationship['Target'] = $placeholderImageTarget;
            }
        }

        // Update the relationships file
        $zip->deleteName($relsPath);
        $zip->addFromString($relsPath, $relsXml->asXML());

        // Delete the original image files (each path only once)
        foreach (array_unique($imagePaths) as $imagePath) {
            $zip->deleteName($imagePath);
        }

        // Add the placeholder image to the zip archive
        $zip->addFile($placeholderImagePath, 'word/' . $placeholderImageTarget);
    }

    private function getPlaceholderImage()
    {
        $placeholderImagePath = 'placeholder.png';

        if (!Storage::disk('local')->exists($placeholderImagePath)) {
            $width = 1;
            $height = 1;
            $color = [255, 255, 255]; // RGB value for white color
            $image = imagecreatetruecolor($width, $height);
            $color = imagecolorallocate($image, $color[0], $color[1], $color[2]);
            imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $color);
            ob_start();
            imagepng($image);
            $imageData = ob_get_contents();
            ob_end_clean();
            Storage::disk('local')->put($placeholderImagePath, $imageData);
        }

        return storage_path('app/' . $placeholderImagePath);
    }

Then

            $tempFilePath = tempnam(sys_get_temp_dir(), 'doc');
            file_put_contents($tempFilePath, $response->getBody()->getContents());

            $zip = new ZipArchive();
            $placeholderImagePath = $this->getPlaceholderImage();

            // Stop early if the downloaded file is not a readable zip archive
            if ($zip->open($tempFilePath) !== true) {
                throw new \RuntimeException('Unable to open the downloaded .docx as a zip archive');
            }
            $this->removeImageReferences($zip, $placeholderImagePath);
            $zip->close();

            $phpWord = IOFactory::load($tempFilePath);

websuasive commented 1 year ago

Since this seems unlikely to be fixed any time soon, given what appears to be poor support for EMF images in PHP, would it be worth catching this error and replacing the image with a placeholder image or message saying the image can't be displayed?

Then the library could at least be used for documents that contain an EMF image.

thomasb88 commented 1 year ago

EMF specification:

https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-emf/91c257d7-c39d-4a36-9b1f-63e3f73d30ca?redirectedfrom=MSDN

thomasb88 commented 11 months ago

PHP's getimagesize() and getimagesizefromstring() accept the formats listed here: https://www.php.net/manual/fr/image.constants.php

EMF is not among them (nor is SVG).

So this could be a PHP feature request, but in the meantime we could try to implement it in PHPWord, the same way PHP does for other formats.

In the PHP source code:

    PHP_FUNCTION(getimagesize)
    {
        php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_PATH);
    }
    /* }}} */

    /* {{{ Get the size of an image as 4-element array */
    PHP_FUNCTION(getimagesizefromstring)
    {
        php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_DATA);
    }

It then opens the stream and calls php_getimagesize_from_stream.

To find out which kind of file it is, that function then calls php_getimagetype.

For each supported type, it reads a specific number of bytes and compares them with that type's signature.

For example, for JPEG, the first 3 bytes must be PHPAPI const char php_sig_jpg[3] = {(char) 0xff, (char) 0xd8, (char) 0xff};
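The same signature check can be sketched in userland PHP. The function name is hypothetical. The JPEG and PNG signatures are the well-known ones; the EMF test combines the Type value 0x00000001 with the " EMF" record signature that MS-EMF section 2.2.9 places at byte offset 40, a detail taken from the specification rather than from this thread.

```php
<?php
/**
 * Sketch: guess the image kind from the leading bytes of a file.
 * Returns 'jpeg', 'png', 'emf', or null when nothing matches.
 */
function sniffImageKind(string $head): ?string
{
    // Same three bytes as php_sig_jpg in the PHP source.
    if (strncmp($head, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }

    // The 8-byte PNG signature.
    if (strncmp($head, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'png';
    }

    // EMF: record Type 1 (little-endian) at offset 0, " EMF" at offset 40.
    if (strlen($head) >= 44
        && substr($head, 0, 4) === "\x01\x00\x00\x00"
        && substr($head, 40, 4) === ' EMF'
    ) {
        return 'emf';
    }

    return null;
}
```

Reading the first 44 bytes of the file is enough to feed this function, so it can run before any size parsing is attempted.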

Then it applies an image-type-specific function to get the image size. For example, for the PSD image type:

    static struct gfxinfo *php_handle_psd (php_stream * stream)
    {
        struct gfxinfo *result = NULL;
        unsigned char dim[8];

        if (php_stream_seek(stream, 11, SEEK_CUR))
            return NULL;

        if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
            return NULL;

        result = (struct gfxinfo *) ecalloc(1, sizeof(struct gfxinfo));
        result->height   =  (((unsigned int)dim[0]) << 24) + (((unsigned int)dim[1]) << 16) + (((unsigned int)dim[2]) << 8) + ((unsigned int)dim[3]);
        result->width    =  (((unsigned int)dim[4]) << 24) + (((unsigned int)dim[5]) << 16) + (((unsigned int)dim[6]) << 8) + ((unsigned int)dim[7]);

        return result;
    }

Or for BMP files:

    static struct gfxinfo *php_handle_bmp (php_stream * stream)
    {
        struct gfxinfo *result = NULL;
        unsigned char dim[16];
        int size;

        if (php_stream_seek(stream, 11, SEEK_CUR))
            return NULL;

        if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
            return NULL;

        size   = (((unsigned int)dim[ 3]) << 24) + (((unsigned int)dim[ 2]) << 16) + (((unsigned int)dim[ 1]) << 8) + ((unsigned int) dim[ 0]);
        if (size == 12) {
            result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
            result->width    =  (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
            result->height   =  (((unsigned int)dim[ 7]) << 8) + ((unsigned int) dim[ 6]);
            result->bits     =  ((unsigned int)dim[11]);
        } else if (size > 12 && (size <= 64 || size == 108 || size == 124)) {
            result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
            result->width    =  (((unsigned int)dim[ 7]) << 24) + (((unsigned int)dim[ 6]) << 16) + (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
            result->height   =  (((unsigned int)dim[11]) << 24) + (((unsigned int)dim[10]) << 16) + (((unsigned int)dim[ 9]) << 8) + ((unsigned int) dim[ 8]);
            result->height   =  abs((int32_t)result->height);
            result->bits     =  (((unsigned int)dim[15]) <<  8) +  ((unsigned int)dim[14]);
        } else {
            return NULL;
        }

        return result;
    }

So we could implement some glue code that identifies an EMF file either by its file extension (.emf) or by its first bytes, and then reads the size fields described in the specification.

More precisely "1.3.1 Metafile Structure An EMF metafile begins with a EMR_HEADER record (section 2.3.4.2), which includes the metafile version, its size, the resolution of the device on which the picture was created, and it ends with an EMR_EOF record (section 2.3.4.1). Between them are records that specify the rendering of the image."

And then "2.3.4.2 EMR_HEADER Record Types

The EMR_HEADER record is the starting point of an EMF metafile. It specifies properties of the device on which the image in the metafile was recorded; this information in the header record makes it possible for EMF metafiles to be independent of any specific output device. The following are the EMR_HEADER record types.

Name / Section / Description

EmfMetafileHeader / 2.3.4.2.1 / The original EMF header record.

EmfMetafileHeaderExtension1 / 2.3.4.2.2 / The header record defined in the first extension to EMF, which added support for OpenGL records and an optional internal pixel format descriptor.<62>

EmfMetafileHeaderExtension2 / 2.3.4.2.3 / The header record defined in the second extension to EMF, which added the capability of measuring display dimensions in micrometers.<63>

EMF metafiles SHOULD be created with an EmfMetafileHeaderExtension2 header record. The generic structure of EMR_HEADER records is specified as follows. ...

Type (4 bytes): An unsigned integer that identifies this record type as EMR_HEADER. This value is 0x00000001 ...

The value of the Size field can be used to distinguish between the different EMR_HEADER record types listed earlier in this section. There are three possible headers:

The EmfMetafileHeader record. The fixed-size part of this header is 88 bytes, and it contains a Header object (section 2.2.9).

The EmfMetafileHeaderExtension1 record. The fixed-size part of this header is 100 bytes, and it contains a Header object and a HeaderExtension1 object (section 2.2.10).

The EmfMetafileHeaderExtension2 record. The fixed-size part of this header is 108 bytes, and it contains a Header object, a HeaderExtension1 object, and a HeaderExtension2 object (section 2.2.11)."

Then in 2.2.9 "Bounds (16 bytes): A RectL object ([MS-WMF] section 2.2.2.19) that specifies the rectangular inclusive-inclusive bounds in logical units of the smallest rectangle that can be drawn around the image stored in the metafile."

Which leads to https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-wmf/4813e7fd-52d0-4f42-965f-228c8b7488d2 section 2.2.2.19:

"2.2.2.19 RectL Object

The RectL Object defines a rectangle. ...

Left (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the upper-left corner of the rectangle.

Top (4 bytes): A 32-bit signed integer that defines the y coordinate, in logical coordinates, of the upper-left corner of the rectangle.

Right (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the lower-right corner of the rectangle.

Bottom (4 bytes): A 32-bit signed integer that defines y coordinate, in logical coordinates, of the lower-right corner of the rectangle.

A rectangle defined with a RectL Object is filled up to— but not including—the right column and bottom row of pixels"
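Putting the quoted sections together, a getimagesize()-like helper for EMF could look roughly like the sketch below. It is not the function posted later in this thread; the name `emfSizeFromString` and the return shape are assumptions. It reads Type, Size and the Bounds RectL from the start of the data, checks the " EMF" record signature at offset 40 (from MS-EMF section 2.2.9), and treats the bounds as inclusive-inclusive as that section states, hence the + 1.

```php
<?php
/**
 * Sketch: read [width, height] in logical units from an EMF header.
 * Returns null when the data does not look like an EMF metafile.
 */
function emfSizeFromString(string $data): ?array
{
    // The smallest EMR_HEADER variant has an 88-byte fixed part.
    if (strlen($data) < 88) {
        return null;
    }

    // All fields are little-endian: Type, Size, Bounds (RectL),
    // Frame (RectL), then the 4-byte record signature.
    $h = unpack('VType/VSize/V4Bounds/V4Frame/a4Signature', $data);
    if ($h['Type'] !== 1 || $h['Size'] < 88 || $h['Signature'] !== ' EMF') {
        return null;
    }

    // unpack() has no little-endian *signed* 32-bit code, so convert the
    // unsigned values by hand (assumes 64-bit PHP integers).
    $signed = static function (int $v): int {
        return $v >= 0x80000000 ? $v - 0x100000000 : $v;
    };
    [$left, $top, $right, $bottom] = array_map($signed, [
        $h['Bounds1'], $h['Bounds2'], $h['Bounds3'], $h['Bounds4'],
    ]);

    // Inclusive-inclusive bounds: both edge columns and rows count.
    $width = $right - $left + 1;
    $height = $bottom - $top + 1;
    if ($width <= 0 || $height <= 0) {
        return null;
    }

    return [$width, $height];
}

// Usage with a hand-built header: bounds (-10, -5) to (89, 44).
$header = pack('V2', 1, 88) . pack('V4', -10, -5, 89, 44)
    . pack('V4', 0, 0, 2645, 1322) . ' EMF';
var_dump(emfSizeFromString(str_pad($header, 88, "\0")));
```

The Bounds values are in logical (device) units. Whether PHPWord should use those or derive a size from the Frame rectangle, which the specification gives in hundredths of a millimetre, is a design choice this sketch leaves open.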

thomasb88 commented 11 months ago

Hi Progi1984,

I haven't had time to install the whole environment and test against the project's standards, but I wrote a glue function for getimagesize that works in my environment.

As the specification is a bit painful to read, I'm copying the function below in the hope that it helps you with this ticket.

"/**

thomasb88 commented 11 months ago

But this only solves the checkImage problem.

There is another problem in parseImage, in PhpWord/Shared/Html.php on line 960.

thomasb88 commented 11 months ago

My bad, the image type should also be modified.

ThomazPom commented 11 months ago

I got around this a year ago, and it has never bothered me again. I prepare every docx with the second method I listed here: https://github.com/PHPOffice/PHPWord/issues/1480#issuecomment-1278708204

thomasb88 commented 11 months ago

Well, EMF to JPEG is not a lossless conversion.

That's why I updated PHPWord to handle EMF images. But you're right: if image quality doesn't matter to you, your solution is a good workaround.

Progi1984 commented 11 months ago

Does anyone have a document containing an EMF/WMF image, please?

thomasb88 commented 11 months ago

I have one, but it belongs to my customer, so I can't share it as is.

So I used the trial version of the Metafile Companion software to produce a random image, which I inserted into a random docx file. Docx with Emf Image for Test.docx

thomasb88 commented 11 months ago

Hope it helps