cherfia / chromiumly

A lightweight TypeScript library that interacts with Gotenberg's different modules to convert a variety of document formats to PDF files.
MIT License

Passing buffer data not supported for some formats #366

Closed: satramcs closed this issue 3 months ago

satramcs commented 3 months ago

I get a "cfb is not supported" error when trying to convert .doc data passed as a buffer to PDF.

The following formats also fail when passed as buffers: doc, pot, pps, ppt, vsd, vsdx, xls.

Note: doc and all the other formats work fine when I provide a direct file path. The issue only occurs with buffer data.

Sample code

const path = require("path");
const fs = require("fs");

const bufferData = fs.readFileSync(path.join(__dirname, "/resources/1.doc"));
async function main() {
    const { PDFEngine } = await import("chromiumly");
    const buffer = await PDFEngine.convert({
        files: [bufferData],
    });
    fs.writeFileSync(path.join(__dirname, "/resources/1.pdf"), buffer);
}

main().catch(function (e) {
    console.log("Error converting file: " + e);
});

// output: Error converting file: Error: cfb is not supported

When I hardcode doc and the other formats directly in the relevant line of src/libre-office/utils/libre-office.utils.ts, conversion works fine. [image]

Kindly fix the buffer data issue. Suggestion: maybe you could let the user supply the extension when the input is buffer data, and read that extension in libre-office.utils.ts:

const bufferData = fs.readFileSync(path.join(__dirname, "/resources/1.doc"));
async function main() {
    const { PDFEngine } = await import("chromiumly");
    const buffer = await PDFEngine.convert({
        files: [{data: bufferData, ext: 'doc'}],
    });
    fs.writeFileSync(path.join(__dirname, "/resources/1.pdf"), buffer);
}

cherfia commented 3 months ago

@satramcs, I appreciate your report once more 👏. I'll investigate further: I rely on a third-party package, file-type, to detect binary-based file formats by examining the buffer's magic number, and apparently the package does not support text-based formats. I'll update you once the issue has been resolved.
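
To illustrate why a magic number alone may not be enough here, the sketch below checks a buffer for the 8-byte Compound File Binary (CFB) signature. Legacy Office formats such as .doc, .xls and .ppt share this container header, so a signature check can report "cfb" without revealing which extension to use. The helper name looksLikeCfb and the sample buffers are hypothetical and are not part of chromiumly or file-type.

```javascript
// 8-byte header shared by Compound File Binary (CFB) containers.
const CFB_SIGNATURE = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

// Hypothetical helper: true when the buffer starts with the CFB signature.
function looksLikeCfb(buffer) {
    return (
        buffer.length >= CFB_SIGNATURE.length &&
        buffer.subarray(0, CFB_SIGNATURE.length).equals(CFB_SIGNATURE)
    );
}

// Stand-ins for the leading bytes of real files (assumed data, for illustration only).
const fakeDoc = Buffer.concat([CFB_SIGNATURE, Buffer.from("word content")]);
const fakeXls = Buffer.concat([CFB_SIGNATURE, Buffer.from("excel content")]);
const fakePdf = Buffer.from("%PDF-1.7");

// Both legacy Office stand-ins match the same signature, so the check
// cannot tell a .doc from an .xls; the PDF stand-in does not match.
const results = [fakeDoc, fakeXls, fakePdf].map(looksLikeCfb);
console.log(results);
```

Because the signature identifies only the container, the caller is the one who reliably knows the intended extension, which is what the suggestion in this issue relies on.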

cherfia commented 3 months ago

@satramcs, by the way, version 3.0.0 includes some breaking changes. Specifically, I've separated LibreOffice from PDFEngines. From now on, instead of using PDFEngine.convert(), you should use LibreOffice.convert(). This update was essential to align with Gotenberg's routes, formerly known as modules.

cherfia commented 3 months ago

@satramcs I just released a new version with a fix for the issue you mentioned. I also went ahead and dropped the file-type dependency and used your suggestion. You can see the code changes here, and as I mentioned previously, you would need to use LibreOffice instead of PDFEngine.

For reference, here's what the code with your suggestion looked like:

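const path = require("path");
const fs = require("fs");
const bufferData = fs.readFileSync(path.join(__dirname, "/resources/1.doc"));
async function main() {
    const { LibreOffice } = await import("chromiumly");
    // Pass the buffer together with its extension so the format is known
    const buffer = await LibreOffice.convert({
        files: [{data: bufferData, ext: 'doc'}],
    });
    fs.writeFileSync(path.join(__dirname, "/resources/1.pdf"), buffer);
}

main().catch(function (e) {
    console.log("Error converting file: " + e);
});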
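
When the buffer comes from somewhere that also provides the original file name (an upload, for example), the extension can be derived instead of hardcoded. This is a minimal sketch: withExt is a hypothetical helper, not part of chromiumly, and it only builds the { data, ext } shape shown in this thread.

```javascript
const path = require("path");

// Hypothetical helper: pairs a buffer with the lowercase extension
// taken from its original file name, e.g. "Report.DOC" -> "doc".
function withExt(data, fileName) {
    const ext = path.extname(fileName).slice(1).toLowerCase();
    if (!ext) {
        throw new Error("Cannot determine extension for " + fileName);
    }
    return { data, ext };
}

// Usage, assuming the LibreOffice.convert() call shown above:
//   const buffer = await LibreOffice.convert({
//       files: [withExt(bufferData, "1.doc")],
//   });
```

Throwing on a missing extension keeps the failure close to the caller rather than surfacing later as a conversion error.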