Gottox / node-pdfutils

tool for analyzing and converting PDF

Resolution when splitting pages #16

Open wmbutler opened 10 years ago

wmbutler commented 10 years ago

I'm using pdfutils (and thus poppler) to split very large documents into individual pages, and I store each page separately in the database. My end users noticed that the resolution of the individual pages is noticeably lower than that of the original document. Does poppler or pdfutils apply a default image resolution during the split process? Is it possible to change this value? Any help would be appreciated.

Gottox commented 10 years ago

Can you provide example code which triggers this bug?

wmbutler commented 10 years ago

I upload a multipage PDF from Angular to Node. Node splits it (the last function below), then I process each stream and push each page to a database. We've compared the source document to the single pages, and the images have lost resolution in the process. The code is pretty generic, so I suspect this has something to do with poppler defaults. Thanks for looking at it.

  // pdfSplit
  app.post('/api/pdfSplit', function(req, res) {
    var chunks = [];
    var data = JSON.parse(req.body.foo);
    var pdfutils = require('pdfutils').pdfutils;

    // This collects each incoming chunk of data into the shared array
    var assemble = function(chunk, meta) {
      chunks.push(chunk);
      //console.log('chunk:', chunk.length);
    };

    // This writes the pages to the database
    var write = function(meta) {
      var result = Buffer.concat(chunks);
      chunks = [];
      //console.log('final result:', result.length);
      data.document.page = parseInt(meta.label, 10);

      // Define _attachments attribute
      data._attachments = {};
      data._attachments[data.filename] = { 'content_type' : 'application/pdf' , 'data' : result.toString('base64') };
      //console.log(data);

      q.post({
        data : data,
        callback: function (resp, success) {
          if (success) {
            res.send(resp);
          }
          else {
            res.send(resp._HTTP.status);
          }
        }
      });
    };

    // This splits the file up into separate pages
    pdfutils(req.files.file.path, function(err, doc) {
      for ( var i=0 ; i<doc.length; i++) {
        var readable = doc[i].asPDF();
        readable.on('data', assemble);
        readable.on('end', write);
      }
    });

  });
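
One thing that may be worth ruling out, separately from the resolution question: the handler above shares a single `chunks` array across every page stream started in the loop. If those streams emit data concurrently, chunks from different pages could end up interleaved in one buffer. This is not confirmed as the cause of the resolution loss. The following is only a minimal sketch of collecting each stream into its own buffer. It uses Node's built-in `stream.Readable` as a hypothetical stand-in for `doc[i].asPDF()`, and the names `collectStream`, `fakePageStream` and `collectPages` are illustrative, not part of pdfutils.

```javascript
const { Readable } = require('stream');

// Collect every chunk of ONE readable stream into a single Buffer.
// Each call owns its own `chunks` array, so concurrent streams cannot mix.
function collectStream(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(chunk));
    readable.on('end', () => resolve(Buffer.concat(chunks)));
    readable.on('error', reject);
  });
}

// Hypothetical stand-in for doc[i].asPDF(): emits a page's bytes in two pieces.
function fakePageStream(label) {
  return Readable.from([
    Buffer.from('page-' + label + '-part1;'),
    Buffer.from('page-' + label + '-part2')
  ]);
}

// Collect several streams at once; the result keeps the input order.
function collectPages(streams) {
  return Promise.all(streams.map(collectStream));
}
```

With something like this, the loop would call `collectStream(doc[i].asPDF())` for each page and write each resulting buffer to the database on its own, assuming `asPDF()` returns a standard readable stream as the original code implies.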

wmbutler commented 10 years ago

I have before and after PDF files. Unfortunately, GitHub will not allow me to attach PDFs to this comment. If you like, I can send them to you by email. Let me know.