devanshsingh7727 / pdf-image-extractor


Need help with the extraction of .png images #1

Open LesaintLineon opened 6 months ago

LesaintLineon commented 6 months ago

Hello, I am currently working on a script that downloads images from a PDF, and when I use your library, the script doesn't detect .png images. While trying to fix it, I found that the issue lies in the first-bytes check: the first byte of these images is 0x78 (hex), so they are detected as octet-stream.
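A note on that `78` byte: 0x78 is the usual first byte of a zlib stream, and PDFs typically store non-JPEG images (including ones that were PNG files before being embedded) as zlib-compressed `FlateDecode` streams, while JPEGs are usually kept as-is (`DCTDecode`) and still start with `FF D8 FF`. So these streams are not PNG files at all. The sketch below is a hypothetical first-bytes check (`sniffImageType` is an illustrative name, not pdf-image-extractor's code) that recognizes a zlib header instead of falling back to octet-stream.

```javascript
// Hypothetical helper, not part of pdf-image-extractor:
// guess what an extracted image stream contains from its first bytes.
function sniffImageType(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);

  // PNG file signature: 89 50 4E 47 0D 0A 1A 0A
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // JPEG (DCTDecode data is a complete JPEG file): FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  // zlib header (RFC 1950): low nibble of byte 0 is 8 (deflate) and
  // byte0 * 256 + byte1 is a multiple of 31. 0x78 is the common first byte.
  if (
    bytes.length >= 2 &&
    (bytes[0] & 0x0f) === 8 &&
    ((bytes[0] << 8) | bytes[1]) % 31 === 0
  ) {
    return "application/zlib"; // compressed pixel data, not a viewable file yet
  }
  return "application/octet-stream";
}
```

A stream flagged as `application/zlib` can be decompressed with Node's built-in `zlib.inflateSync(bytes)`, but the result is raw pixel samples; turning it into a viewable file still needs the image's width, height and color space from the PDF.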

Here is my code; it is very similar to your example:

```js
const { ExtractImages } = require("pdf-image-extractor");
const fs = require("fs");
const { Blob } = require("buffer"); // Blob is also a global in Node 18+

const [, , originalPdfPath] = process.argv;

const pdfSource = new Blob([fs.readFileSync(originalPdfPath)]);
const fileType = "blob"; // the PDF is passed in as a Blob

ExtractImages({ pdf: pdfSource, fileType })
  .then(async (images) => {
    fs.mkdirSync("rendu", { recursive: true }); // make sure the output folder exists
    // Loop sequentially so each image keeps its own index and file name
    // (an async forEach callback shared one counter and one filename variable)
    for (const [i, image] of images.entries()) {
      console.log(image.imageType); // MIME type detected by the library
      const imgType = image.imageType.split("/")[1];
      const filename = `rendu/downloaded_image${i}.${imgType}`;
      fs.writeFileSync(filename, Buffer.from(await image.blob.arrayBuffer()));
      console.log(`Image saved to ${filename}.`);
    }
  })
  .catch((err) => {
    console.error(err);
  });
```

I've also attached my test file: testPngJpeg.pdf
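Given the 0x78 first byte, the octet-stream blobs likely still hold Flate-compressed pixel data (an assumption worth checking against the library). Inflating them yields raw samples, not a PNG file, so they would also need a PNG container. Below is a minimal sketch of that wrapping step. It is not part of pdf-image-extractor: `rawToPng`, `chunk` and `crc32` are hypothetical helpers, and the sketch assumes you already know `/Width`, `/Height` and the channel count (1 for DeviceGray, 3 for DeviceRGB) from the image's PDF dictionary, 8 bits per component, and no `/DecodeParms` predictor.

```javascript
const zlib = require("zlib");

// CRC-32 lookup table (reflected polynomial 0xEDB88320), used for PNG chunk checksums.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let c = 0xffffffff;
  for (const b of buf) c = CRC_TABLE[(c ^ b) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0;
}

// A PNG chunk: 4-byte length, 4-byte type, data, CRC over type + data.
function chunk(type, data) {
  const len = Buffer.alloc(4);
  len.writeUInt32BE(data.length);
  const typeAndData = Buffer.concat([Buffer.from(type, "ascii"), data]);
  const crc = Buffer.alloc(4);
  crc.writeUInt32BE(crc32(typeAndData));
  return Buffer.concat([len, typeAndData, crc]);
}

// Hypothetical helper: wrap inflated 8-bit samples (a Buffer) in a PNG file.
function rawToPng(pixels, width, height, channels) {
  const colorType = { 1: 0, 3: 2 }[channels]; // PNG color type: 0 = gray, 2 = RGB
  if (colorType === undefined) throw new Error("only 1 or 3 channels supported");
  const rowLen = width * channels;
  if (pixels.length !== rowLen * height) throw new Error("pixel buffer size mismatch");

  // Every PNG scanline starts with a filter-type byte; 0 means "None".
  const filtered = Buffer.alloc((rowLen + 1) * height);
  for (let y = 0; y < height; y++) {
    pixels.copy(filtered, y * (rowLen + 1) + 1, y * rowLen, (y + 1) * rowLen);
  }

  const ihdr = Buffer.alloc(13); // compression, filter, interlace bytes stay 0
  ihdr.writeUInt32BE(width, 0);
  ihdr.writeUInt32BE(height, 4);
  ihdr[8] = 8; // bit depth
  ihdr[9] = colorType;

  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return Buffer.concat([
    signature,
    chunk("IHDR", ihdr),
    chunk("IDAT", zlib.deflateSync(filtered)),
    chunk("IEND", Buffer.alloc(0)),
  ]);
}
```

Under the same assumptions, usage would look like `fs.writeFileSync("rendu/image.png", rawToPng(zlib.inflateSync(bytes), width, height, 3))`. If the stream uses a PNG predictor (`/Predictor` 10 or higher), each inflated row already begins with a filter byte, so the size check throws instead of writing a corrupt file; transparency stored in a separate `/SMask` image is also ignored here.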

devanshsinghvaluecoders commented 6 months ago

Hi, here is a simple implementation:

CodeSandbox link


```jsx
import React, { useState } from "react";
import { ExtractImages } from "pdf-image-extractor";

function App() {
  const [file, setFile] = useState(null);
  const [imgs, setImgs] = useState([]);

  const handleSubmit = async (e) => {
    e.preventDefault();
    if (!file) return;
    // Wrap the selected file in a Blob, keeping its MIME type
    const fileBlob = new Blob([file], { type: file.type });
    try {
      const images = await ExtractImages({ pdf: fileBlob, fileType: "blob" });
      // Each extracted image exposes a Blob URL (image.url) usable as an <img> src
      setImgs(images);
    } catch (err) {
      console.error(err);
    }
  };

  const handleOnChange = (e) => {
    setFile(e.target.files[0]);
  };

  return (
    <form onSubmit={handleSubmit}>
      <h1>React File Upload</h1>
      <input type="file" onChange={handleOnChange} />
      <button type="submit">Extract Images</button>
      {imgs?.map((rep, i) => (
        <img src={rep.url} key={i} alt={`Extracted image ${i + 1}`} />
      ))}
    </form>
  );
}

export default App;
```