html2pdf failed with too many pages

CSENoni commented 7 years ago

Hi @eKoopmans,

Firstly, I want to thank you for writing this great library. It saves me a lot of time dealing with content quality and layout. Recently, I was playing around with it and Vue.js and found an issue:

https://jsfiddle.net/rover1234/woa5ucyx/

The link contains two tests. In the first, a simple for loop renders 5 blocks of text on the screen, and exporting them produces a 5-page PDF that shows all of them, since I apply your page break. In the second test, with more than 20 blocks of text, every page of the PDF comes out blank, even though the page breaks still work.

I am not sure what is going on here. Is it because the image generated from the canvas is too large? And how could I get around it (for example, by creating multiple elements, generating separate PDF content for each, and combining everything into one file)?

Thanks

CSENoni commented 7 years ago

It seems that every time I increase the margin height, I get more PDF pages with text. When I decrease it, I get blank pages instead, as many as the blocks of content I generated. Increasing the margin is not a solution for me, because with more pages it will still fail at some point.

https://jsfiddle.net/rover1234/woa5ucyx/

eKoopmans commented 7 years ago

Hi @CSENoni, sorry I didn't respond sooner. Yes, I see the problem. I've reproduced it here using only html2canvas, which html2pdf relies on.

The problem is in fact deeper than that - it's a problem directly in how canvases work. Canvases have a maximum height/width, and once you exceed that the canvas becomes unusable! That's what's happening here.

I wasn't aware of that problem, so thank you for bringing it up. It's a serious limitation and I don't have any immediate solution. As you mentioned, changing the margins and page size will change how many pages you can get (also changing the resolution/scale), but that's just because it changes the size of the underlying canvas.
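To make that relationship concrete, here is a rough back-of-the-envelope sketch of how the canvas grows with page count, margin and resolution. It assumes the whole document is drawn onto one canvas that is (page width minus both margins) × dpi pixels wide and (page height minus both margins) × dpi pixels tall per page, i.e. content rendered at dpi pixels per inch (96 × scale for html2canvas's scale option). The limits of 32,767 px per side and 268,435,456 px of total area are placeholders; real limits vary by browser and device, so treat the constants and the helper names as assumptions.

```javascript
// Placeholder limits (assumption): real maximums differ between browsers and devices.
const MAX_CANVAS_SIDE = 32767;
const MAX_CANVAS_AREA = 268435456;

// Approximate size, in pixels, of a single canvas holding `pages` pages of content.
// Assumes the content area is the page minus the margin on every side,
// rendered at `dpi` pixels per inch.
function canvasSizeForPages({ pageWidthIn, pageHeightIn, marginIn, dpi, pages }) {
  const width = Math.round((pageWidthIn - 2 * marginIn) * dpi);
  const heightPerPage = Math.round((pageHeightIn - 2 * marginIn) * dpi);
  return { width, heightPerPage, height: heightPerPage * pages };
}

// Largest page count whose single canvas stays within both placeholder limits.
function maxPagesBeforeLimit(opts) {
  const { width, heightPerPage } = canvasSizeForPages({ ...opts, pages: 1 });
  const bySide = Math.floor(MAX_CANVAS_SIDE / heightPerPage);
  const byArea = Math.floor(MAX_CANVAS_AREA / (width * heightPerPage));
  return Math.min(bySide, byArea);
}

const letter = { pageWidthIn: 8.5, pageHeightIn: 11, marginIn: 0.25 };
console.log(maxPagesBeforeLimit({ ...letter, dpi: 192 })); // 16
console.log(maxPagesBeforeLimit({ ...letter, dpi: 96 }));  // 32
```

Under these assumptions, halving the dpi or enlarging the margins shrinks the pixels needed per page, so more pages fit before the limit is hit. That only moves the limit; it does not remove it.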

MaNuGitH commented 6 years ago

Thanks for your answer. My workaround: I changed the html2canvas dpi option from 192 to 96.

rb0urd0n commented 6 years ago

Hi @eKoopmans

I don't know html2pdf's underlying technologies well, but would it be possible to use a second canvas once the limit is reached on the first one, and then merge the canvases or the PDFs?

@MaNuGitH's workaround is cool and works fine, but it only postpones the problem, especially if the content of your PDF is dynamic.

Hope it can help :)

BR

GGross5213 commented 6 years ago

I ran into this problem as well, and @MaNuGitH's workaround didn't work for me, so I wrote a function that might be helpful for someone. Luckily for me, the new-api branch exposes more functions and breaks the process of turning HTML into a PDF into several steps. I noticed that I can output my HTML as an image drawn onto a canvas, which is pretty much the last step before turning it into a PDF. So I decided to create my own instance of jsPDF and pass only one page at a time to html2pdf. Then I add the image created by html2pdf to my instance of jsPDF. I hope this is helpful.

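import html2pdf from 'html2pdf.js';
import jsPDF from 'jspdf';

// pages: an array of HTML strings (or elements), one per PDF page.
const exportHTMLToPDF = async (pages, outputType = 'blob') => {
  const opt = {
    margin:       [0, 0],  // [vertical, horizontal], in opt.jsPDF.unit
    filename:     'myfile.pdf',
    image:        { type: 'jpeg', quality: 0.98 },
    html2canvas:  { dpi: 192, letterRendering: true },
    jsPDF:        { unit: 'in', format: 'letter', orientation: 'portrait' }
  };
  const doc = new jsPDF(opt.jsPDF);
  // getPageSize is a helper that html2pdf.js attaches to jsPDF; stock jsPDF may not
  // have it, so this must be the same jsPDF that html2pdf.js patches.
  const pageSize = jsPDF.getPageSize(opt.jsPDF);
  for (let i = 0; i < pages.length; i++) {
    const page = pages[i];
    // Render only this page to an image, so each canvas stays small.
    const pageImage = await html2pdf().from(page).set(opt).outputImg();
    if (i !== 0) {
      doc.addPage();
    }
    // addImage takes x (left) before y (top), so the horizontal margin comes first.
    doc.addImage(pageImage.src, 'jpeg', opt.margin[1], opt.margin[0], pageSize.width, pageSize.height);
  }
  // This can be whatever output you want. I prefer blob.
  return doc.output(outputType);
};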

You can also write it using recursion if you can't use async-await.
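For anyone who can't use async/await, here is a minimal sketch of the recursive version. renderPage is a hypothetical callback that returns a Promise for one page (in the browser it might wrap html2pdf().from(page).set(opt).outputImg()). Each page starts only after the previous one has finished, and the results come back in page order.

```javascript
// Renders pages strictly one after another, using recursion instead of async/await.
// renderPage(page, index) is a hypothetical callback that must return a Promise.
function renderSequentially(pages, renderPage) {
  const results = [];
  function step(i) {
    if (i >= pages.length) {
      return Promise.resolve(results); // every page is done
    }
    return renderPage(pages[i], i).then((result) => {
      results.push(result);
      return step(i + 1); // start the next page only after this one resolves
    });
  }
  return step(0);
}
```

The resolved array can then be fed to doc.addPage() and doc.addImage() one image at a time, as in the loop above.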

nevergiveup777 commented 6 years ago

Hi @GGross5213 ,

I am still learning JS, I am running into the same problem, and I was wondering how I can implement your workaround in my case.

Currently, I have dynamic content and I am using @eKoopmans's plugin to generate a PDF report for it.

So this is my current function:

$("#download").click(function () {
  var filename = this.value + '.pdf';
  var element = document.getElementById('report');

  html2pdf(element, {
    margin:       0.25,
    filename:     filename,
    image:        { type: 'jpeg', quality: 0.98 },
    html2canvas:  { dpi: 96, letterRendering: true },
    jsPDF:        { unit: 'in', format: 'letter', orientation: 'portrait' }
  });
});

and I would like to know where I should put my "element" variable in your function to get it working. I would really appreciate it if you could give me a hand with it.

GGross5213 commented 6 years ago

@nevergiveup777 Does your element have the html2pdf page-break classes in it?

The way I did it was by breaking my element up into the pages that I wanted, and then passing only one page at a time into html2pdf. Then I convert that page into an image using html2pdf and add it to an instance of jsPDF that I control (which is what html2pdf does under the hood). I would also recommend using the new-api branch of html2pdf, as it exposes an outputImg function. I guess you could implement it using the callback branch, but it will be much easier with the new-api branch. I hope this helps.

waqar-imtiaz commented 6 years ago

@GGross5213 With your example I got the error "jsPDF.getPageSize is not a function". Also, the pages arg is the HTML element that we are going to export, right? I have very large tables and images that I need to convert into a PDF, but as you know, it goes blank.

GGross5213 commented 6 years ago

@waqar-imtiaz The pages arg is a list of HTML strings. In my case, each element in the list is a single page that ends with </div><div class="html2pdf__page-break"></div>. Can you post the code where you are calling jsPDF.getPageSize()? Also, what format are your images in? I had problems getting images to show up in the PDF: if they came from an external URL such as AWS S3, they wouldn't render. I think it has something to do with CORS. I couldn't figure out how to allow CORS or set up a proxy, so I just converted them into base64 data URIs and included those.

waqar-imtiaz commented 6 years ago

@GGross5213 My HTML is as follow:

<!-- pdf-export div starts from here -->
<div id="pdf-export" >
            <div class="cover-page">
              <h1>True Monitor Report</h1>
              <h2>{{monitorName}}</h2>
              <ul class="monitor-details">
                <li>PDF Created Date:<span class="pull-right"> {{currentDate}}</span></li>
                <li>Last Run:<span class="pull-right">{{getDateTime(monitorDetails.lastRun)}}</span></li>
                <li>Search Sources:
                  <span class="pull-right">
                    <i *ngIf="monitorDetails.facebookSource">Facebook,</i>
                    <i *ngIf="monitorDetails.twitterSource"> Twitter,</i>
                    <i *ngIf="monitorDetails.googleSource"> Google,</i>
                    <i *ngIf="monitorDetails.youtubeSource"> YouTube,</i>
                    <i *ngIf="monitorDetails.instagramSource"> Instagram,</i>
                    <i *ngIf="monitorDetails.documentSource"> Documents</i>
                  </span>
                </li>
                <li>Search Terms:<span class="pull-right"><i *ngFor="let term of monitorDetails.searchTerms">{{term}},</i></span></li>
                <li>Result Counts:<span class="pull-right">{{totalMonitorResultsCount.counts}}</span></li>
              </ul>
            </div>
            <div class="html2pdf__page-break"></div>
              <div id="images" class="clearfix" style="clear: both; width: 100%">
                <h2>{{monitorName}}</h2>
                <!-- images come here and I add page breaks after six images  -->

              </div>
             <table>
                <!-- here comes the table. The data for this table comes from the database and it can be
 thousands of rows, so I cannot add page breaks inside the table (or maybe there is some way around that).
So when I generate the PDF using html2pdf, it goes blank even if the data is only about 400 rows. I tried
 to use the fromHTML method of jsPDF, but I could not get the desired layout for the images. If you can help me,
 that would be great. Thanks. :)  -->
            </table>

          </div>

and js code is:

var pages = document.getElementById('pdf-export')
const exportHTMLToPDF = async (pages, outputType='blob') => {
  console.log('checking now the pdf');
  const opt = {
    margin:       [0,0],
    filename:     'myfile.pdf',
    image:        { type: 'jpeg', quality: 0.98 },
    html2canvas:  { dpi: 192, letterRendering: true },
    jsPDF:        { unit: 'mm', format: 'a4', orientation: 'landscape' }
  };
  const doc = new jsPDF(opt.jsPDF);
  const pageSize = jsPDF.getPageSize(opt.jsPDF);
  for(let i = 0; i < pages.length; i++){
    const page = pages[i];
    const pageImage = await html2pdf().from(page).set(opt).outputImg();
    if(i != 0) {
      doc.addPage();
    }
    doc.addImage(pageImage.src, 'jpeg', opt.margin[0], opt.margin[1], pageSize.width, pageSize.height);
  }
  // This can be whatever output you want. I prefer blob.
  const pdf = doc.output(outputType);
  return pdf;
}

exportHTMLToPDF(element, 'outputPdf')

PS: I think my knowledge of js is very limited, sorry in advance if this is wrong in any way.

audra415 commented 6 years ago

Hi @GGross5213,

How do you create the array of pages that you pass to the exportHTMLToPDF function?

Thanks

audra415 commented 6 years ago

Hi @GGross5213,

I tried using html2pdf().from(element).toPdf().get('pdf').then(pdf => { $scope.pages = pdf.internal.pages; });

but the result is another blank pdf... I tried toContainer, toCanvas, toImg, but none of those produce an array of pages. Do I have to send the html for each page in an array somehow?

GGross5213 commented 6 years ago

@waqar-imtiaz Sorry for the delay. Life has been crazy lately. It looks like you don't need to loop through that HTML element; you should be able to pass it into the html2pdf function as is. If your table has so many rows that it hits the canvas height limit, you may need to split it up periodically with </div><div class="html2pdf__page-break"></div>. Once you've added the page breaks, create a list of HTML strings where each element in the list is a page ending with the html2pdf page break. Also, the output type needs to be one of save, blob, arraybuffer, bloburi, bloburl, datauristring, dataurlstring, dataurlnewwindow, datauri, or dataurl. This comes from jsPDF's documentation: https://rawgit.com/MrRio/jsPDF/master/docs/jspdf.js.html#line992
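A minimal sketch of that idea for a long table, assuming the rows have roughly equal height so that a fixed rowsPerPage (a number you would tune by hand) fits on one page. Each string is a complete table, so no tag is left open across pages, the header row is repeated on every page, and each string ends with the html2pdf__page-break div as described above. The function and parameter names are illustrative only.

```javascript
const PAGE_BREAK = '<div class="html2pdf__page-break"></div>';

// Splits table rows into one HTML string per page. Each string is a complete
// <table> (header repeated on every page) followed by the page-break div.
// Assumption: every row has roughly the same height, so rowsPerPage rows fit on a page.
function tableToPages(headerRowHtml, rowHtmlList, rowsPerPage) {
  if (!Number.isInteger(rowsPerPage) || rowsPerPage <= 0) {
    throw new RangeError('rowsPerPage must be a positive integer');
  }
  const pages = [];
  for (let i = 0; i < rowHtmlList.length; i += rowsPerPage) {
    const rows = rowHtmlList.slice(i, i + rowsPerPage).join('');
    pages.push(
      '<table><thead>' + headerRowHtml + '</thead><tbody>' + rows + '</tbody></table>' + PAGE_BREAK
    );
  }
  return pages;
}
```

The resulting array has the shape that exportHTMLToPDF expects for its pages argument.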

Let me know if you have any more questions. I hope this helps.

GGross5213 commented 6 years ago

@audra415 Sorry for the delay. Life has been crazy. The array of pages that I pass into the exportHTMLToPDF function is a mixture of HTML strings that I hard-coded (and fill with data at run time) and HTML elements that I selected using jQuery/JavaScript selectors.

What are you trying to do with the PDF after you generate it? It looks like you are using Angular, which I'm not too familiar with. It also looks like you should be using outputImg or outputPdf instead of toPdf or toImg.

Let me know if that helps.

audra415 commented 6 years ago

@GGross5213,

Ah ha, array of strings for each page... Got it. I'm building a report that will have different lengths depending on the data used to generate it. So I'll have to break up the pages, use toPdf, then stitch them together in one pdf.

GGross5213 commented 6 years ago

@eKoopmans Sounds great. I will say that I think you want to use outputImg then add the img to your own instance of jsPDF. I think that is the easiest way to stitch the pages together if your total page size exceeds the canvas height limit. (I believe that is how html2pdf.js generates the pdf underneath the hood)

a2zidxdotcom commented 6 years ago

Hi,

Reading through the comments I see it mentions canvas height limit.

If the problem is the canvas height limit, can that limit be increased?

If so, do you know how this can be done?

sohilfynd commented 6 years ago

@GGross5213 I am generating bulk invoices on the client side. Even with page breaks, more than 15 A4 pages produces blank pages. I also followed your earlier answer of adding one page at a time, but after generating around 200 to 250 pages, CPU utilisation is so high that the system hangs. Can you shed more light on this kind of use case?

a2zidxdotcom commented 6 years ago

Hi,

If you have a high demand that is too large for your server you can try

https://phantomjscloud.com

I have been looking at this for some of my pdf conversions. I am still evaluating this for my needs but have managed to use it if it is text only. Their support is currently looking at the issue I have with images. The reason this may solve your problem is that it is fast and takes the load off your server. Also free for up to 500 pages per day.

Jitesh-Tripathi commented 6 years ago

Hi @eKoopmans, I have very large tables and images that I need to convert into a PDF, with lots of table rows whose columns contain content of different heights. Maybe the attachment (page_break_table_rowcontent_breaking) explains my issue.

marbaez commented 5 years ago

@eKoopmans Would it be possible to generate one canvas per page break and then add them all to the PDF?

albertmat commented 5 years ago

@eKoopmans First of all, thank you for your library. We have been completely blocked for two days: the canvas size limitation is driving us crazy. We have tried many solutions, but none of them solves the problem reliably (one is too slow, another produces a "not nice" layout, and so on).

Do you plan to address this problem in the near future? It would be really really really appreciated :)

sohilfynd commented 5 years ago

@albertmat Try the pdfmake library. It's quite efficient and quick, and the PDFs it produces are very small.

albertmat commented 5 years ago

@sohilfynd really thanks... I will try it...

I haven't read anything about it yet. Is there an off-the-shelf utility for converting an existing HTML template to PDF?

charutiwari04 commented 5 years ago

Is this issue fixed? It generates blank pages when the number of pages exceeds 15 or 16.

albertmat commented 5 years ago

@charutiwari04 No news on that :(! Since the problem is not the library itself but the way it generates the PDF (see https://stackoverflow.com/a/11585939/4080966), I solved it by generating more than one PDF and then merging them server-side. It's certainly not the cleanest possible solution... but it works :)

eKoopmans commented 5 years ago

Hi all! First off, huge thanks to @GGross5213 for being all over the comments here, I really appreciate your help.

Second, the central issue here is that HTML canvases are just broken, and have a maximum size that I can't fix. I've investigated creating a "fake" canvas that stitches together multiple canvases, and it might be worth the effort to try to create such a thing, but I haven't had the chance to yet and I don't think it's a realistic goal.

However, there's a recent development that may fix this issue as well as a few other big problems. jsPDF has a built-in canvas "engine", which means we could draw straight into jsPDF! This would also mean vector graphics (highlightable text, small files, all that jazz). It's very promising and it's high on my to-do list for this project.

That said, I have very little time to maintain this project, so I can't make any promises. For now, this approach from @GGross5213 is your best bet (splitting your page into smaller chunks and individually adding them into the PDF). You could cut out a bit of the work with something like this:

// Assuming "pages" is an array of HTML elements or strings that are separate pages:
var worker = html2pdf().from(pages[0]).toPdf();
pages.slice(1).forEach(function (page) {
    worker = worker.get('pdf').then(function (pdf) {
        pdf.addPage();
    }).from(page).toContainer().toCanvas().toPdf();
});
worker = worker.save();

Notice you don't even need any async/await, the code above will build a "then" chain that should bring you right through to the end. Good luck!

Edit: Modified code snippet to remove the "extra" page at the end per @ScottStevenson's comment.
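The chain above adds a new page before every page except the first, which is why no blank page is left at the end. The same shape, reduced to plain Promises so the ordering is easy to check, is sketched below. doc and renderInto are hypothetical stand-ins for the jsPDF instance and the .from(page).toContainer().toCanvas().toPdf() steps; this is not html2pdf's own API.

```javascript
// Same shape as the html2pdf chain above, built with plain Promises.
// doc and renderInto(doc, page) are hypothetical stand-ins for the jsPDF instance
// and the rendering steps; renderInto may return a value or a Promise.
function buildChain(pages, doc, renderInto) {
  if (pages.length === 0) return Promise.resolve(doc);
  let chain = Promise.resolve().then(() => renderInto(doc, pages[0]));
  pages.slice(1).forEach((page) => {
    chain = chain
      .then(() => { doc.addPage(); }) // add a page only *between* pages
      .then(() => renderInto(doc, page));
  });
  return chain.then(() => doc);
}
```

Because each step is appended to the previous Promise, the pages are processed in order without any async/await.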

ScottStevenson commented 5 years ago

Thanks for the solution @eKoopmans. Just wanted to point out that this will leave a blank page at the end, so I used this:

// Assumes `worker` was created beforehand (e.g. let worker = html2pdf();) and `opt` holds your options.
for (let i = 0; i < pages.length; i++) {
  worker = worker.set(opt).from(pages[i]).toContainer().toCanvas().toPdf().get('pdf').then((pdf) => {
    if (i < pages.length - 1) { // Bump cursor ahead to new page until on last page
      pdf.addPage();
    }
  });
}

ScottStevenson commented 5 years ago

The only issue is that splitting it up like this can make the promise take a very long time to resolve, which freezes the UI. We added a progress indicator, but we would like to let the user cancel the process so they can get control of the UI back. Is there any way to cancel the Promise?

eKoopmans commented 5 years ago

Hi @ScottStevenson, the cancelling is a cool idea! There's nothing supported now but I could see incorporating a cancel feature into the .then() implementation. Could you open a separate issue with that suggestion? Thanks.
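Until something like that exists, one workaround (a sketch, not an html2pdf feature) is to render page by page and check a cancellation flag between pages. A Promise itself cannot be cancelled, so this only stops scheduling further pages; the page currently being drawn still finishes. The sketch uses the standard AbortController, and renderPage is a hypothetical callback wrapping the per-page html2pdf work. The short setTimeout between pages gives the browser a chance to handle the cancel click, although a single long html2canvas render can still block the UI on its own.

```javascript
// Renders pages one at a time and stops before the next page once `signal` is aborted.
// renderPage(page, index) is a hypothetical callback that returns a Promise.
function renderCancellable(pages, renderPage, signal) {
  const results = [];
  const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

  function step(i) {
    if (signal.aborted) {
      return Promise.reject(new Error('cancelled after ' + results.length + ' page(s)'));
    }
    if (i >= pages.length) {
      return Promise.resolve(results);
    }
    return renderPage(pages[i], i)
      .then((result) => { results.push(result); })
      .then(yieldToEventLoop) // let the UI process pending events, e.g. a Cancel click
      .then(() => step(i + 1));
  }
  return step(0);
}
```

In the browser, a Cancel button would call controller.abort() on the AbortController whose signal was passed in.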

frandiox commented 5 years ago

@eKoopmans Thanks everyone for the workarounds! It prints all the pages for me, but somehow there is a blank page between every meaningful page in my PDF. Before, I was using margin-top: -1px !important; to avoid this. However, with the new approach of adding pages one by one, it throws Uncaught (in promise) Error: Supplied Data is not a valid base64-String jsPDF.convertStringToImageData. Any ideas? Thanks!

rconstantine commented 5 years ago

@eKoopmans using your example here:

// Assuming "pages" is an array of HTML elements or strings that are separate pages:
var worker = html2pdf().from(pages[0]).toPdf();
pages.slice(1).forEach(function (page) {
    worker = worker.get('pdf').then(function (pdf) {
        pdf.addPage();
    }).from(page).toContainer().toCanvas().toPdf();
});
worker = worker.save();

Where should we pass the options, as in @GGross5213's example? I assume we use .set(opt) somewhere; I'm just confused as to where.

Also, can anyone tell me whether the fragments of HTML that constitute the pages need to each be a complete HTML page (open and close HTML, HEAD and BODY tags), or if they can simply be chunks from the original page? I assume you can't split open/close tags between pages, so how would one handle a large table that spans many pages? Or can I just shove what is between each page-break into each page? That would be ideal, I think.

houke commented 5 years ago

@eKoopmans I am using the above solutions to generate PDFs with a lot of pages. However, most of them contain links, and with that solution they're all drawn on the first page (because the page is always 1 in your hyperlink plugin).

Right now, I've 'fixed' this by altering Worker.prototype.toContainer by allowing to pass a pagenumber to the function based on the index of the loop. Worker.prototype.toContainer = function toContainer(page) {}.

var worker = html2pdf()
  .from(elements[0])
  .set(opt)
  .toContainer(pagenumber)
  .toPdf()

Is there a way to do this without altering some files?

rconstantine commented 5 years ago

To answer my own questions, above, for anyone else who might come across this...

// Assuming "pages" is an array of HTML elements or strings that are separate pages:
var worker = html2pdf().set(opt).from(pages[0]).toPdf();
pages.slice(1).forEach(function (page) {
    worker = worker.get('pdf').then(function (pdf) {
        pdf.addPage();
    }).from(page).toContainer().toCanvas().toPdf();
});
worker = worker.save();

As for the pages, they can be any HTML elements. No need to use whole pages.

Also, I ended up using html2pdf.js to generate the pages, rather than figure them out myself because I have data elements that vary in height.

I started with this:

let worker = this.$html2pdf().set(opt).from(element).toContainer()
worker.get('container').then((some) => {

Then I looped through the children. Since all of mine had innerHTML and the DIVs added by html2pdf.js didn't, I added my DIVs to a page (using appendChild) and started a new page whenever I encountered one of html2pdf.js's DIVs. This doesn't produce exact single pages: if you have elements that already break correctly, one 'page' might actually be more than one page. However, in my case I don't expect this ever to exceed the canvas size.
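The grouping step described above can be sketched as a plain function that takes a flat list of nodes and a predicate recognising the separators. In the browser, nodes might be Array.from(container.children), and the predicate might test for the html2pdf__page-break class or, as in the approach above, for an empty innerHTML; both choices are assumptions to adapt to your own markup, and splitAtBreaks is a hypothetical helper name.

```javascript
// Groups a flat list of nodes into pages, starting a new page at each break marker.
// Break markers themselves are dropped, and empty groups are skipped.
function splitAtBreaks(nodes, isBreak) {
  const pages = [];
  let current = [];
  for (const node of nodes) {
    if (isBreak(node)) {
      if (current.length > 0) pages.push(current);
      current = [];
    } else {
      current.push(node);
    }
  }
  if (current.length > 0) pages.push(current);
  return pages;
}
```

Each group's nodes could then be appended with appendChild to a fresh wrapper div and passed to html2pdf as one "page".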

I then take the total pages generated and add headers/footers.

I'm finding that html2canvas is VERY SLOW!!! All other steps are fast. Some of my reports will be 100+ pages. Big ones could be 300+ pages. It's too slow for that. I'll have to figure out something else.

I may scrap this work and go with something that generates the PDF on a server. But I do want it to be identical to the PDF on the page, and I don't know whether there are server-side options for this. Does anybody know of any?

412799755 commented 5 years ago

I hope html2pdf.js can detect automatically when the canvas limit is exceeded!

neomeric commented 5 years ago

Has anyone found a solution for empty pages? I'm randomly getting empty pages at the start, and then some of the content is printed in the PDF.

@eKoopmans Please Help !!

khaukheng commented 5 years ago

Hi, thanks for suggesting this workaround. However, the page-break settings are not working. Consider this:

// Page-break settings still work here
let worker = html2pdf().set(opt).from(pages[0]).toContainer().get('container')

// Page-break settings do not work at all here
pages.slice(1).forEach(function (page) {
   worker = worker.get('pdf').then(function (pdf) {
        pdf.addPage();
    }).from(page).toContainer().toCanvas().toPdf();
});

Does anyone have an idea that could help me with this? Thanks in advance.

moosetunes commented 5 years ago

In case someone needs an alternative workaround, here is what I did, since my page sizes are fixed. In the HTML file I sectioned out each page and gave each one a name (e.g. 'page1', 'page2', 'page3', etc.). Then I strung together the html2pdf promises to create a series of pages, which I stringified and sent via Ajax to a PHP script that stitches them back together into a single document.

Here is some pseudo code:

var element = document.getElementById("page1");
var pagearray = [];
html2pdf().set(opt).from(element).outputPdf('datauristring').then((result) => {
  pagearray[0] = result;
  // page2() repeats these steps for "page2" and passes the array on (not shown)
  return page2(pagearray);
});

Hope this helps someone.

Cheers, Steven

vulehai commented 5 years ago

Hi everyone! I have the same problem. In my case, I can't add a page break for each page; that is, I can't split my div into pages. So what should I do to resolve it? I need your help :( Please!

khaukheng commented 5 years ago

Hi, I have a page containing 170 images, and every 4 images are contained inside a div with the class image-with-4. I am trying to put 4 images on each page and stitch the pages together. So in TypeScript I am doing:

let pages = Array.from(window.document.getElementsByClassName('image-with-4'));
let worker = html2pdf().set(opt).from(pages[0]).toContainer().get('container');
pages.slice(1).forEach(page=>{
    worker = worker.get('pdf').then(pdf=>pdf.addPage())
                    .from(page).toContainer().get('container')
                    .toCanvas().toPdf();
})

When I inspect the network tab, it downloads all 170 images in every loop iteration, so a total of 170 images × 43 pages (170/4, rounded up) = 7310 images are downloaded. Given that each image is 2 MB, 2 MB × 7310 images = 14,620 MB (about 14.27 GB) of images. So I would need at least 14.27 GB of free RAM just to download the page as a PDF, not to mention the time it takes. I have 16 GB of RAM in my laptop, and Chrome crashed when it was 48% done.

I don't know why it keeps downloading images that aren't selected. Is there anything I did wrong? Please enlighten me! Thanks in advance.

arcturuscom commented 4 years ago

Hello, can anyone please describe how to split the HTML into page-sized chunks?

oskarleonard commented 4 years ago

Created a PR https://github.com/eKoopmans/html2pdf.js/pull/314

In the meantime I use this:

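  const saveMultiPagePdf = (opt) => {
    // To avoid blank pages in the PDF, render each page div separately and add it manually.
    // See issue https://github.com/eKoopmans/html2pdf.js/issues/19
    const html2pdf = require('html2pdf.js');

    // html2pdf's internal unitConvert helper is not exported, so define it here:
    // converts px values to jsPDF units (k is jsPDF's scale factor for the unit).
    const unitConvert = (obj, k) => {
      const newObj = {};
      for (const key in obj) {
        newObj[key] = (obj[key] * 72) / 96 / k;
      }
      return newObj;
    };

    var linkInfo = [];
    // Note: every call re-wraps these prototype methods, so calling
    // saveMultiPagePdf repeatedly stacks the patches.
    var orig = {
      toContainer: html2pdf.Worker.prototype.toContainer,
      toPdf: html2pdf.Worker.prototype.toPdf,
    };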

    // pageNumber (optional): the PDF page that this container's links belong to.
    html2pdf.Worker.prototype.toContainer = function toContainer(pageNumber) {
      return orig.toContainer.call(this).then(function toContainer_hyperlink() {
        // Retrieve hyperlink info if the option is enabled.
        if (this.opt.enableLinks) {
          // Find all anchor tags and get the container's bounds for reference.
          var container = this.prop.container;
          var links = container.querySelectorAll('a');
          var containerRect = unitConvert(
            container.getBoundingClientRect(),
            this.prop.pageSize.k
          );
          linkInfo = [];

          // Loop through each anchor tag.
          Array.prototype.forEach.call(
            links,
            function(link) {
              // Treat each client rect as a separate link (for text-wrapping).
              var clientRects = link.getClientRects();
              for (var i = 0; i < clientRects.length; i++) {
                var clientRect = unitConvert(
                  clientRects[i],
                  this.prop.pageSize.k
                );
                clientRect.left -= containerRect.left;
                clientRect.top -= containerRect.top;

                // Use the explicit page number when each chunk is one page;
                // otherwise derive it from the link's vertical offset.
                var page =
                  pageNumber ||
                  Math.floor(clientRect.top / this.prop.pageSize.inner.height) +
                    1;
                var top =
                  this.opt.margin[0] +
                  (clientRect.top % this.prop.pageSize.inner.height);
                var left = this.opt.margin[1] + clientRect.left;

                linkInfo.push({ page, top, left, clientRect, link });
              }
            },
            this
          );
        }
      });
    };

    html2pdf.Worker.prototype.toPdf = function toPdf() {
      return orig.toPdf.call(this).then(function toPdf_hyperlink() {
        // Add hyperlinks if the option is enabled.
        if (this.opt.enableLinks) {
          // Attach each anchor tag based on info from toContainer().
          linkInfo.forEach(function(l) {
            this.prop.pdf.setPage(l.page);
            this.prop.pdf.link(
              l.left,
              l.top,
              l.clientRect.width,
              l.clientRect.height,
              { url: l.link.href }
            );
          }, this);

          // Reset the active page of the PDF to the final page.
          var nPages = this.prop.pdf.internal.getNumberOfPages();
          this.prop.pdf.setPage(nPages);
        }
      });
    };

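    // pageClassNameId and pdfContent come from the surrounding component
    // (pdfContent is assumed to be a React ref to the PDF content wrapper).
    const domElementCollection = pageClassNameId
      ? document.getElementsByClassName(pageClassNameId)
      : pdfContent.current.children;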

    const domPages = [...domElementCollection].map((htmlElement) => {
      return htmlElement.outerHTML;
    });

    let worker = html2pdf().set(opt);

    domPages.forEach((page, index) => {
      worker = worker
        .from(page)
        .toContainer(index + 1)
        .toCanvas()
        .toPdf()
        .get('pdf')
        .then((pdf) => {
          if (index < domPages.length - 1) {
            // Don't add a blank page after the last one.
            pdf.addPage();
          }
        });
    });

    return worker.save();
  };
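
The link-placement arithmetic inside the toContainer override above can be checked on its own. This sketch pulls it out into hypothetical pure functions: pixel values are converted with the same 72 / 96 / k formula html2pdf's internal unitConvert helper applies (k is jsPDF's scale factor: 1 for "pt", 72 for "in"), the page is either the explicitly passed page number or derived from the vertical offset, and the position is shifted by the margins. The names are illustrative, not html2pdf API.

```javascript
// Hypothetical pure version of the link-placement math used in the override.
// px -> PDF units: px * 72 / 96 / k, where k is jsPDF's points-per-unit
// factor (k = 1 for "pt", 72 for "in").
function pxToUnits(px, k) {
  return (px * 72) / 96 / k;
}

// rect: link box relative to the container, already in PDF units.
// innerHeight: printable page height; margin: [top, left] in PDF units.
// pageNumber: explicit page when each chunk is rendered separately.
function placeLink(rect, innerHeight, margin, pageNumber) {
  const page = pageNumber || Math.floor(rect.top / innerHeight) + 1;
  const top = margin[0] + (rect.top % innerHeight);
  const left = margin[1] + rect.left;
  return { page, top, left };
}

console.log(pxToUnits(96, 72)); // 1 (96 CSS px is one inch)
console.log(placeLink({ top: 25, left: 2 }, 10, [1, 0.5])); // { page: 3, top: 6, left: 2.5 }
console.log(placeLink({ top: 4, left: 2 }, 10, [1, 0.5], 5)); // { page: 5, top: 5, left: 2.5 }
```

Without an explicit pageNumber, a link inside a one-page chunk always lands on page 1, which is why the override threads the loop index through.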

Valt-Fernando commented 4 years ago

I'm using html2pdf.js to generate PDFs from HTML produced by a WYSIWYG editor. When the content fit on a single page, the generated PDF still had a second, blank page. I found a way to remove the blank page during the conversion, as follows:

const opt = {
  margin: [0, 0],
  filename: myPdf + ".pdf",
  html2canvas: {
    scale: 4,
    dpi: 192,
    letterRendering: true,
    // Skip the CKEditor and html2pdf page-break marker elements when rendering.
    ignoreElements: (e) =>
      e.classList.contains("cke_pagebreak") ||
      e.classList.contains("html2pdf__page-break"),
  },
  jsPDF: {
    unit: "pt",
    format: "A4",
    orientation: "portrait",
    putOnlyUsedFonts: true,
    pagesplit: true,
  },
  pagebreak: { mode: ["avoid-all"], after: ".cke_pagebreak" },
};

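const worker = html2pdf().set(opt).from(document.body).toPdf().get("pdf").then(pdf => {
    // One image per page, read from jsPDF internals (this structure may differ between jsPDF versions).
    const images = pdf.internal.collections.addImage_images;
    // Delete from the last page backwards: deletePage() renumbers the pages after the
    // deleted one, so deleting in ascending order could remove the wrong pages.
    const indices = Object.keys(images).map(Number).sort((a, b) => b - a);
    for (const i of indices) {
        // Treat very short page slices (133 px or less) as blank.
        if (images[i].h <= 133) {
            pdf.deletePage(i + 1);
        }
    }
});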

/* download the PDF */
worker.save();

/* save to blob format */
worker.output('blob');
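
Why deletion order matters: jsPDF's deletePage(n) removes page n and shifts every later page down by one, so deleting in ascending order goes wrong as soon as more than one page is blank. This sketch simulates that with a plain array; deletePage and pagesToDelete here are hypothetical stand-ins, and 133 is simply the threshold from the snippet above.

```javascript
// Hypothetical simulation of 1-based page deletion, like jsPDF's deletePage(n).
function deletePage(pages, n) {
  pages.splice(n - 1, 1);
}

// Return the 1-based numbers of "blank" pages, highest first.
function pagesToDelete(imageHeights, threshold = 133) {
  return imageHeights
    .map((h, i) => (h <= threshold ? i + 1 : null))
    .filter((n) => n !== null)
    .sort((a, b) => b - a);
}

const heights = [1000, 50, 1000, 40]; // pages 2 and 4 are blank
const pages = ["p1", "blank2", "p3", "blank4"];
for (const n of pagesToDelete(heights)) deletePage(pages, n);
console.log(pages); // [ 'p1', 'p3' ]

// In ascending order, deleting page 2 first leaves only 3 pages,
// so the later deletePage(4) does nothing and blank4 survives.
```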

akshayamathi commented 3 years ago

Hello Everyone,

I have also used this package to generate a PDF, and I get all blank pages once the number of pages goes beyond the limit. Can anyone help me fix this?

mohd-e-mustafa commented 3 years ago

I ran into this problem as well, and @MaNuGitH's workaround didn't work for me. I wrote a function that might be helpful for someone. Luckily, the new branch new-api exposes more functions and breaks the process of turning HTML into a PDF into several steps. I noticed that I can output my HTML as an image drawn onto a canvas, which is pretty much the last step before turning it into a PDF. So I decided to create my own instance of jsPDF and pass only one page at a time to html2pdf. Then I add the image created by html2pdf to my instance of jsPDF. I hope this is helpful.
import html2pdf from 'html2pdf.js';
import jsPDF from 'jspdf';

const exportHTMLToPDF = async (pages, outputType = 'blob') => {
  const opt = {
    margin:       [0, 0],  // [vertical, horizontal], in opt.jsPDF.unit
    filename:     'myfile.pdf',
    image:        { type: 'jpeg', quality: 0.98 },
    html2canvas:  { dpi: 192, letterRendering: true },
    jsPDF:        { unit: 'in', format: 'letter', orientation: 'portrait' }
  };
  const doc = new jsPDF(opt.jsPDF);
  // Page dimensions in opt.jsPDF.unit.
  const pageSize = jsPDF.getPageSize(opt.jsPDF);
  for (let i = 0; i < pages.length; i++) {
    // Render one page-sized chunk at a time so no single canvas gets too large.
    const pageImage = await html2pdf().from(pages[i]).set(opt).outputImg();
    if (i !== 0) {
      doc.addPage();
    }
    // addImage(data, format, x, y, width, height): x uses the horizontal margin, y the vertical one.
    doc.addImage(pageImage.src, 'jpeg', opt.margin[1], opt.margin[0],
      pageSize.width - 2 * opt.margin[1], pageSize.height - 2 * opt.margin[0]);
  }
  // This can be whatever output you want. I prefer blob.
  return doc.output(outputType);
};
You can also write it using recursion if you can't use async-await.
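
One caveat with passing the full page width and height to addImage is that every chunk is stretched to fill the page, so a chunk shorter than a page gets distorted. A hypothetical helper like fitToBox computes the largest size that keeps the image's aspect ratio inside the printable area; you would pass the image's pixel size and the page's inner width and height in PDF units. This is an illustrative variation, not part of the workaround above.

```javascript
// Hypothetical helper: scale an image (imgW x imgH) to fit inside a box
// (boxW x boxH, PDF units) while preserving its aspect ratio.
function fitToBox(imgW, imgH, boxW, boxH) {
  const scale = Math.min(boxW / imgW, boxH / imgH);
  return { width: imgW * scale, height: imgH * scale };
}

// A 1024 x 512 px chunk on an 8 x 10 in printable area keeps its 2:1 shape.
console.log(fitToBox(1024, 512, 8, 10)); // { width: 8, height: 4 }
```

The resulting width and height would replace the fixed page size in the addImage call, leaving the rest of the page empty instead of stretching the chunk.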

Hi @GGross5213,

I've encountered the same issue in a project. My content is an innerHTML string, so how can I convert it into pages?

Here is my code:

const reportComponent = this.$refs.exportPdf.innerHTML;
const pdfOptions = {
  margin: 1,
  // JPEG quality must be between 0 and 1.
  image: { type: "jpeg", quality: 0.98 },
  html2canvas: {
    scale: 2,
  },
  jsPDF: { unit: "in", format: "letter", orientation: "portrait" },
  filename: file_name + ".pdf",
  pagebreak: {
    before: ".beforeClass",
    after: ["#after1", "#after2"],
    avoid: "img",
  },
};
html2pdf()
  .from(reportComponent)
  .set(pdfOptions)
  .toPdf()
  .get("pdf")
  .then(function (pdf) {
    // Drop the trailing blank page once, before numbering the remaining pages.
    const allPages = pdf.internal.getNumberOfPages();
    pdf.deletePage(allPages);
    const pdfPages = allPages - 1;

    const pageUrl = window.location.href;

    // getMonth() is zero-based, so add 1 for a human-readable date.
    const d = new Date();
    const pdfDate =
      (d.getMonth() + 1) + "/" + d.getDate() + "/" + d.getFullYear();

    for (let i = 1; i <= pdfPages; i++) {
      pdf.setPage(i);
      pdf.setFontSize(10);
      pdf.setTextColor("#000");
      pdf.text(
        i + "/" + pdfPages,
        pdf.internal.pageSize.getWidth() - 0.9,
        pdf.internal.pageSize.getHeight() - 0.3
      );
      pdf.text(
        pageUrl,
        pdf.internal.pageSize.getWidth() - 8.2,
        pdf.internal.pageSize.getHeight() - 0.3
      );
      pdf.text(
        file_name,
        pdf.internal.pageSize.getWidth() / 2.3,
        pdf.internal.pageSize.getHeight() - 10.3
      );
      pdf.text(
        pdfDate,
        pdf.internal.pageSize.getWidth() - 7.3,
        pdf.internal.pageSize.getHeight() - 10.3
      );
    }
  })
  .save();
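
Note that Date.prototype.getMonth() is zero-based (January is 0), so building a footer date from it directly prints the previous month. A hypothetical pair of footer helpers, shown with fixed dates so the output is predictable:

```javascript
// Hypothetical footer helpers. getMonth() returns 0-11, so add 1.
function formatFooterDate(d) {
  return `${d.getMonth() + 1}/${d.getDate()}/${d.getFullYear()}`;
}

// "current/total" page counter, 1-based.
function pageLabel(pageNumber, totalPages) {
  return `${pageNumber}/${totalPages}`;
}

console.log(formatFooterDate(new Date(2021, 0, 5))); // 1/5/2021
console.log(pageLabel(3, 12)); // 3/12
```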

helianthuswhite commented 3 years ago

Could it use multiple canvases, one for each section between page breaks? Then no single part would exceed the canvas size limit.
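
A rough sketch of the slicing such a multi-canvas approach would need: given the total rendered height, the page height and the tallest canvas you are willing to create, compute the vertical ranges to render separately. sliceRanges is hypothetical. Each range could then be rendered on its own (for example via html2canvas's y/height crop options, which is an assumption about how you would wire it up) and added to the PDF. Slices are rounded down to whole pages so no page is split across two canvases.

```javascript
// Hypothetical helper: split a tall render (totalHeight px) into consecutive
// [start, end) ranges no taller than maxSliceHeight, rounded down to a whole
// number of pages so no page is cut across two canvases.
function sliceRanges(totalHeight, pageHeight, maxSliceHeight) {
  const pagesPerSlice = Math.floor(maxSliceHeight / pageHeight);
  if (pagesPerSlice < 1) throw new Error("a single page exceeds maxSliceHeight");
  const step = pagesPerSlice * pageHeight;
  const ranges = [];
  for (let start = 0; start < totalHeight; start += step) {
    ranges.push([start, Math.min(start + step, totalHeight)]);
  }
  return ranges;
}

// 45 pages of 1920 px with an assumed 32767 px cap: slices of 17 pages (32640 px).
console.log(sliceRanges(45 * 1920, 1920, 32767));
// three ranges: [0, 32640], [32640, 65280], [65280, 86400]
```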

CrazyOvi commented 3 years ago

Could you help me understand how you get the "pages" from the HTML? I'm trying to understand how to take my page (HTML) and turn it into the array your solution requires. Many thanks!

wcomicho commented 2 years ago

Hello guys,

Facing the same issue. It's OK in Windows and Android browsers, but it doesn't work in Safari on iPhone. I will try some of the code here; hopefully it works.

BalajiArun004 commented 2 years ago

@eKoopmans I am using the solutions above to generate PDFs with a lot of pages. However, most pages contain links, and with that solution they're all drawn on the first page (because the page is always 1 in your hyperlink plugin).

Right now, I've 'fixed' this by altering Worker.prototype.toContainer so that it accepts a page number, which I pass based on the loop index: Worker.prototype.toContainer = function toContainer(page) {}.
var worker = html2pdf()
  .from(elements[0])
  .set(opt)
  .toContainer(pagenumber)
  .toPdf()
Is there a way to do this without modifying the library files?
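
One way to avoid editing library files is to wrap the prototype method at runtime, as the saveMultiPagePdf snippet above does, but to patch only once and keep a way to undo it. wrapMethod below is a hypothetical sketch of that pattern, demonstrated on a stand-in class; applying it to html2pdf.Worker.prototype and 'toContainer' is an assumption that the prototype is reachable as in that snippet, and a real wrapper would need to keep returning html2pdf's promise-like worker.

```javascript
// Hypothetical helper: wrap proto[name] at runtime without editing library
// source. Patching is idempotent (a second call is a no-op) and returns an
// undo function that restores the original method.
const PATCHED = Symbol("patched");

function wrapMethod(proto, name, makeWrapper) {
  const original = proto[name];
  if (original[PATCHED]) return () => {};
  const wrapped = makeWrapper(original);
  wrapped[PATCHED] = true;
  proto[name] = wrapped;
  return () => {
    proto[name] = original;
  };
}

// Demo on a stand-in class (not html2pdf): accept an extra page argument.
class DemoWorker {
  toContainer() {
    return "container";
  }
}

const undo = wrapMethod(DemoWorker.prototype, "toContainer", (original) =>
  function (pageNumber) {
    const result = original.call(this);
    return `${result} for page ${pageNumber || 1}`;
  }
);
wrapMethod(DemoWorker.prototype, "toContainer", () => () => "never used"); // no-op

console.log(new DemoWorker().toContainer(3)); // container for page 3
undo();
console.log(new DemoWorker().toContainer(3)); // container
```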

I'm also facing the same issue but didn't understand your workaround. Can you help with this?

Naushadalam22 commented 2 years ago

Hi @eKoopmans

I am using html2pdf ("html2pdf.js": "^0.10.1").

I'm generating a PDF of 40-50 pages from HTML. Pages 1 to 10 of the generated PDF are fine, but when there are more than 20 pages, the full page content is not shown and the colors disappear.

Here is my code:

htmlToPdfOptions: {
  margin: [1, 0.5],
  html2canvas: {
    scale: 2,
    dpi: 300,
    removeContainer: true,
    imageTimeout: 0,
    letterRendering: true,
    useCORS: true,
  },
  jsPDF: {
    unit: "in",
    format: "letter",
    orientation: "portrait",
    compress: true,
    pagesplit: true,
  },
  image: { type: "jpg", quality: 0.98 },
  pagebreak: { avoid: [".divsec", "img"] },
  enableLinks: false,
}

When we decrease the scale to 0.5, the PDF color issue is solved but the text becomes blurry; when we increase the scale to 1 or 2, the text is sharp but the colors disappear.

Could you please help?

Thank you!
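
The trade-off described here (a low scale loses sharpness, a high scale breaks the output) is consistent with the canvas size limit discussed earlier in this issue: each canvas side grows linearly with the scale and the area with its square. A hypothetical helper to pick the largest scale that keeps one canvas within a limit is sketched below. The limits are assumptions (mobile Safari is commonly reported to allow a much smaller total area, around 16,777,216 px, than desktop browsers), and html2pdf's actual canvas width may differ slightly from the 96 px-per-inch estimate, so treat the numbers as rough. Rendering in smaller chunks, as suggested above, lets each chunk keep a higher scale.

```javascript
// Hypothetical helper: the largest html2canvas-style scale at which a canvas of
// cssWidth x cssHeight CSS px stays within the given limits.
// The limits are assumptions; they differ across browsers and devices.
function largestSafeScale(cssWidth, cssHeight, limits) {
  // Each side grows linearly with the scale...
  const bySide = Math.min(limits.maxSide / cssWidth, limits.maxSide / cssHeight);
  // ...while the area grows with its square.
  const byArea = Math.sqrt(limits.maxArea / (cssWidth * cssHeight));
  return Math.min(bySide, byArea);
}

// Letter paper with [1, 0.5] in margins leaves 7.5 x 9 in, at 96 CSS px per inch.
const pageWidthPx = 7.5 * 96; // 720
const pageHeightPx = 9 * 96; // 864

// Assumed limits: roughly desktop Chromium, and an area-only limit often
// reported for mobile Safari.
const desktop = { maxSide: 32767, maxArea: 268435456 };
const mobileSafari = { maxSide: Infinity, maxArea: 16777216 };

// 40 pages rendered into one canvas:
console.log(largestSafeScale(pageWidthPx, 40 * pageHeightPx, desktop).toFixed(2)); // 0.95
console.log(largestSafeScale(pageWidthPx, 40 * pageHeightPx, mobileSafari).toFixed(2)); // 0.82
// One page per canvas leaves far more headroom:
console.log(largestSafeScale(pageWidthPx, pageHeightPx, mobileSafari).toFixed(2)); // 5.19
```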

eKoopmans / html2pdf.js

html2pdf failed with too many pages #19