dolanmiu / docx

Easily generate and modify .docx files with JS/TS with a nice declarative API. Works for Node and on the Browser.
https://docx.js.org/
MIT License
4.44k stars 486 forks

Repeated images are written several times #2680

Closed mustache1up closed 1 month ago

mustache1up commented 6 months ago

When the same image is used more than once in a document, each use is written as a separate media file inside the generated .docx file.

As a result, documents that reuse an image grow linearly with the number of times the image appears.

Minimum example:

import * as fs from "fs"; // needed for fs.writeFileSync below
import { Document, ImageRun, Packer, Paragraph } from "docx";

const imageBase64Data = "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8z8BQz0AEYBxVSF+FABJADveWkH6oAAAAAElFTkSuQmCC";

const doc = new Document({
    sections: [
        {
            children: [
                new Paragraph({
                    children: [
                        new ImageRun({
                            data: imageBase64Data, // image
                            transformation: {
                                width: 100,
                                height: 100,
                            },
                        }),
                    ],
                }),
                new Paragraph({
                    children: [
                        new ImageRun({
                            data: imageBase64Data, // same image
                            transformation: {
                                width: 200,
                                height: 200,
                            },
                        }),
                    ],
                }),
            ],
        },
    ],
});

Packer.toBuffer(doc).then((buffer) => {
    fs.writeFileSync("same_image_twice.docx", buffer); // two identical media files inside the docx
});

mustache1up commented 6 months ago

We could take the approach Git has used for a long time: use a digest of the file contents as the unique ID of the image.

Git uses SHA-1. I'll test whether it works well in the ImageRun class.
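
To illustrate the idea, here is a minimal sketch of content-addressed image storage. It assumes Node's built-in crypto module. The names imageKey and MediaStore are hypothetical, used only for illustration, and are not part of the docx API or of the actual change to ImageRun.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable file name from the image bytes.
// Identical bytes always produce the same SHA-1 digest, hence the same key.
function imageKey(data: Uint8Array, extension: string): string {
    const digest = createHash("sha1").update(data).digest("hex");
    return `${digest}.${extension}`;
}

// Hypothetical media store keyed by digest, so identical images are kept once.
class MediaStore {
    private readonly files = new Map<string, Uint8Array>();

    public add(data: Uint8Array, extension: string): string {
        const key = imageKey(data, extension);
        if (!this.files.has(key)) {
            this.files.set(key, data);
        }
        return key;
    }

    public get size(): number {
        return this.files.size;
    }
}

// Same base64 payload as in the minimum example above.
const imageBase64Data = "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8z8BQz0AEYBxVSF+FABJADveWkH6oAAAAAElFTkSuQmCC";

const store = new MediaStore();
const first = store.add(Buffer.from(imageBase64Data, "base64"), "png");
const second = store.add(Buffer.from(imageBase64Data, "base64"), "png"); // same bytes, separate buffer

console.log(first === second, store.size);
```

Both calls return the same key and the store holds a single entry, so a packer built this way would write the image once no matter how many runs reference it. The width and height in transformation belong to each run rather than to the file, so they would not need to be part of the key.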

mustache1up commented 6 months ago

Seems to work great. Adding tests to spec files in order to open a PR.