bpampuch / pdfmake

Client/server side PDF printing in pure JavaScript
http://pdfmake.org

Support transformation of HTML content into PDF #205

Open jthoenes opened 9 years ago

jthoenes commented 9 years ago

Multiple users have requested, in one way or another, a way to transform HTML content into a PDF using pdfmake.

It is definitely not a simple task and I'm unsure of a good strategy. But I'm open to suggestions.

This is a common place to discuss ideas and solution strategies.

gnibeda commented 9 years ago

Here is a small demo: http://jsfiddle.net/mychn9bo/4/

Currently supported:

  1. Tags: u, i, b, div, span, p, table, br;
  2. Styles: font-size, text-align, font-weight, text-decoration, font-style;

HTML from the demo:

<div>
This is <u>simple</u> html parser demo.<br>
<p style='font-size:20px; text-align:center'>You can set font size and align from style</p>
<table border='1'>
   <tr>
      <td>you</td>
      <td>can</td>
   </tr>
   <tr>
      <td>use</td>
      <td>tables</td>
   </tr>
</table>
<table border='1' widths='30%,60%'>
   <tr>
      <td>or</td>
      <td>set</td>
   </tr>
   <tr>
      <td>table</td>
      <td>width from html</td>
   </tr>
</table>
<br>
<table border='1' widths='20%,50%'>
   <tr>
      <td>nested</td>
      <td>table</td>
   </tr>
   <tr>
      <td>
         <table border='1'>
            <tr>
               <td>1</td>
               <td>2</td>
            </tr>
            <tr>
               <td>3</td>
               <td>4</td>
            </tr>
         </table>
      </td>
      <td></td>
   </tr>
</table>
</div>

Known bugs: text flow after a table does not work.

P.S. This code is not complete, so it has bugs.
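For readers wondering how tags such as u, i and b can end up in a pdfmake document, here is a minimal sketch. It is not the code from the demo above: the names `INLINE_STYLES` and `toFragments` are hypothetical, it only handles b, i, u and br, and it assumes DOM-like nodes exposing `nodeType`, `nodeName`, `textContent` and `childNodes`. The output uses the pdfmake text properties `bold`, `italics` and `decoration`.

```javascript
// Inline tags mapped to pdfmake text properties (hypothetical table for this sketch).
var INLINE_STYLES = {
  B: { bold: true },
  I: { italics: true },
  U: { decoration: "underline" }
};

// Recursively flatten a DOM-like node into an array of pdfmake text fragments.
// `inherited` carries the properties collected from ancestor tags.
function toFragments(node, inherited) {
  inherited = inherited || {};
  if (node.nodeType === 3) { // text node
    return node.textContent
      ? [Object.assign({ text: node.textContent }, inherited)]
      : [];
  }
  if (node.nodeName === "BR") {
    return [{ text: "\n" }];
  }
  // Merge this tag's properties (if any) over the inherited ones.
  var style = Object.assign({}, inherited, INLINE_STYLES[node.nodeName]);
  var out = [];
  Array.prototype.forEach.call(node.childNodes || [], function (child) {
    out = out.concat(toFragments(child, style));
  });
  return out;
}

// Browser usage (assumes an element with id "test" exists):
//   var docDefinition = {
//     content: [{ text: toFragments(document.getElementById("test")) }]
//   };
```

Block-level tags such as table, div and p need their own handling (a table becomes a pdfmake `table` node, for example), which this sketch leaves out.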

tanzid203 commented 9 years ago

gnibeda, could you tell me how this works? How can I give the id or name of the html table, and get the table in my pdf file?

dohomi commented 9 years ago

+1 would love to see this feature

magnusburton commented 8 years ago

This is great and really needed!

DevQueen commented 8 years ago

This is great, except I can't figure out how to get it to work on an ng-click in my Angular controller. How does it know what content to handle? I don't see any way to indicate a div id or table id.

Dejab666 commented 8 years ago

Any news?

ccjmne commented 8 years ago

@tanzid203 @DevQueen : updated JSFiddle example

Give it the id attribute of the element you wanna parse.

marcusjwhelan commented 8 years ago

What if you are using a web page with many directives, some 10 custom directives, all pointing to their own view and controllers with possible child directives as well.

Is there a way to do this? @ccjmne

tsiegleauq commented 8 years ago

Hi. We made a HTML to PDF Parser for our project OpenSlides which is based on the logic @ccjmne examples provided here. Many thanks for that @ccjmne

It is already pretty complete and supports most HTML tags.

You can find a not-that-complete example in plain old JS in my OpenSlides repo https://github.com/tsiegleauq/OpenSlides/blob/pdfmakeplugin/openslides/motions/static/js/motions/site.js (line 622 for the first part and line 920 for the second)

And a far more complete one in "angular-logic", refactored and enhanced by @ThomasJunk, in the current OpenSlides master: https://github.com/OpenSlides/OpenSlides/blob/master/openslides/motions/static/js/motions/site.js (look for "$scope.makePDF = function(){") and https://github.com/OpenSlides/OpenSlides/blob/master/openslides/core/static/js/core/site.js (look for ".factory('PdfMakeDocumentProvider'")

(line numbers are likely to change a lot)

Maybe this helps someone or can be brought into pdfmake master.

MarioVanDenEijnde commented 7 years ago

I am trying to include a background style but do not succeed. Any ideas? Thanks.

case "background-color":
{
  switch (st[1]) {
    case "#efefef":
      o.fillColor = "#efefef";
      break;
  }
  break;
}

@tsiegleauq : your JS link is not working.

Kind regards, Mario

MarioVanDenEijnde commented 7 years ago

Hum. It looks like no style in the table is transferred.

MarioVanDenEijnde commented 7 years ago

OK. I found it. I needed to go up to the stack and set the style there.

Cheers. Mario
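As an illustration of the style handling discussed above, here is a minimal sketch that turns an inline `style` attribute into pdfmake properties. The function name `styleToPdfmake` is hypothetical and this is not the parser from the demo. It assumes that `fillColor` is set on the table cell object itself, which would fit the observation that the style had to be applied further up the stack, and it treats a `px` font size as a plain number, which is a simplification.

```javascript
// Convert an inline style string such as "font-size:20px; text-align:center"
// into an object of pdfmake properties. Unknown declarations are ignored.
function styleToPdfmake(styleAttr) {
  var props = {};
  (styleAttr || "").split(";").forEach(function (decl) {
    var st = decl.split(":");
    if (st.length < 2) return; // skip empty or malformed declarations
    var name = st[0].trim().toLowerCase();
    var value = st.slice(1).join(":").trim();
    switch (name) {
      case "font-size":
        props.fontSize = parseInt(value, 10); // "20px" -> 20 (simplification)
        break;
      case "text-align":
        props.alignment = value;
        break;
      case "font-weight":
        if (value === "bold") props.bold = true;
        break;
      case "background-color":
        props.fillColor = value; // meant for table cells in pdfmake
        break;
    }
  });
  return props;
}

// Example: build a table cell that carries its own background color.
var cell = Object.assign({ text: "total" }, styleToPdfmake("background-color:#efefef"));
```

Passing the color value through unchanged avoids a nested switch with one case per color.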

tsiegleauq commented 7 years ago

@tsiegleauq : your JS link is not working.

Yep. That's already outdated. We put the PDF generation in another file:

See the "convertHTML" function here: https://github.com/OpenSlides/OpenSlides/blob/master/openslides/core/static/js/core/pdf.js#L214

After my current work in OpenSlides is done, I'm going to try to merge the htmlConverter into the master branch of pdfmake/pdfmake. So this link is (again) likely to change at any time in the future.

silveur commented 7 years ago

Would love to see this feature implemented. I've got html content from Summernote that I'd like to display...

tsiegleauq commented 7 years ago

Would love to see this feature implemented.

Me too! :D :D I should find time to do this once my semester is over (around February)

Deilan commented 7 years ago

@tsiegleauq How is it going? =) We're all looking forward to it. :P

tsiegleauq commented 7 years ago

Yep... So am I. I am currently enhancing the parser for the OpenSlides 2.1 release. Unfortunately, merging our parser back into pdfmake is not a priority for OpenSlides for the time being T_T

I need some time to refurbish it and bring it upstream, but you can just grab the parser from OpenSlides and use it in your project.

Furthermore, I need to work on a Master's thesis soon and may not be able to invest more time in this. But I hope to get some stuff done.

jfoclpf commented 7 years ago

@gnibeda and @ccjmne Can you tell me why, when I apply your code strictly, this simple, well-formed HTML table gives several errors? Kindly check the jsfiddle.

<div id="test">
   <table>
         <tr>
            <td colspan="2"><b>Standing costs</b><br><i>Costs that don't depend on the travelled distance and one must pay even if the car is always stopped</i></td>
         </tr>
         <tr>
            <td><b>Costs</b></td>
            <td><b>Monthly amount</b></td>
         </tr>
         <tr>
            <td><b>Depreciation of the vehicle</b><br>Acquisition value: 20000£<br>Value today: 10000£<br>Period of possession: 200 months<br>(20000£-10000£)/200 months</td>
            <td>50.0</td>
         </tr>
         <tr>
            <td><b>Vehicle insurance and Breakdown cover</b><br>200 Pounds per semester</td>
            <td>33.3</td>
         </tr>
         <tr>
            <td><b>Loan interests</b></td>
            <td>0.0</td>
         </tr>
         <tr>
            <td><b>Vehicle inspection (MOT test)</b><br>0 times costing 36 £ each one during 200 months</td>
            <td>0.0</td>
         </tr>
         <tr>
            <td><b>Vehicle Excise Duty (Car tax)</b><br>50 Pounds per year</td>
            <td>4.2</td>
         </tr>
         <tr>
            <td><b>1/2 Maintenance</b><br>300 Pounds per year</td>
            <td>12.5</td>
         </tr>
         <tr>
            <td><b>TOTAL - Standing costs</b></td>
            <td><b>100/month</b></td>
         </tr>
   </table>
</div>

@tsiegleauq Any updates on the HTML parser? 👍 It would be valuable if images could also be parsed.

jfoclpf commented 7 years ago

@gnibeda and @ccjmne

I found the bug. It seems your code doesn't accept colspan without a superfluous td.

This gives an error, although it is valid according to the HTML standard:

<tr>
  <td></td>
  <td></td>
</tr>
<tr>
  <td colspan="2"></td>
</tr>

I solved the problem by adding an extra, superfluous td:

<tr>
  <td></td>
  <td></td>
</tr>
<tr>
  <td colspan="2"></td>
  <td></td>
</tr>

However, this workaround does not pass the W3C validator cleanly; it gives this warning: A table row was 2 columns wide, which is less than the column count established by the first row (3).

Just FYI :)
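A possible way to avoid touching the HTML: as far as I know, pdfmake expects every row of a table body to contain the same number of cells, with an empty placeholder cell following each cell that uses `colSpan`. A converter could add those placeholders itself. The sketch below shows the idea; `padRowForColSpan` is a hypothetical helper, not part of pdfmake or of the demo code.

```javascript
// Given the cells converted from one <tr>, insert an empty placeholder
// object after every cell for each extra column it spans.
function padRowForColSpan(cells) {
  var row = [];
  cells.forEach(function (cell) {
    row.push(cell);
    var span = cell.colSpan || 1; // plain strings have no colSpan, so 1
    for (var i = 1; i < span; i++) {
      row.push({});
    }
  });
  return row;
}

// Example: the two-row table from the comment above, valid HTML left as-is.
var body = [
  padRowForColSpan(["a", "b"]),
  padRowForColSpan([{ text: "spans both columns", colSpan: 2 }])
];
var docDefinition = { content: [{ table: { body: body } }] };
```

With this, both rows end up two cells wide, and the HTML no longer needs the extra td that the validator complains about.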

jfoclpf commented 7 years ago

@gnibeda Could you kindly fix this simple bug? I spent two hours on the code, but it's recursive and complex and it's been really laborious, and I think you might be able to solve this issue in a few minutes. Thank you so much in advance.

@tsiegleauq Your links to OpenSlides are broken! Could you kindly be more specific about which functions one should use to parse HTML to PDF using pdfmake.js? Thanks

tsiegleauq commented 7 years ago

@jfoclpf OpenSlides changes frequently. The HTML-to-pdfmake conversion has been working stably for us for quite some time now. We cannot find time to put it upstream, but you can just grab our code and use it in your project (bringing our parser upstream is on my todo list, though).

So where to find the interesting stuff currently: it should (hopefully) always be located somewhere around /openslides/core/static/js/core

Right now we have a "pdf.js" file where all the stuff, like my parser, is located (there is also pdfworker.js, which uses web workers).

For converting HTML strings into a PDF, we use the function "PdfMakeConverter" (it's an Angular factory).

In there you can find the recursive subfunction "convertHTML", where most of the stuff happens. The other functions are usually just definitions of our paper layout and some functions to ensure the HTML code is clean (in case you wonder why this exists: OpenSlides generates HTML code for motions. Because we support stuff like line numbers, our generated HTML code was sometimes a little "strange").

jfoclpf commented 7 years ago

Thanks a lot, Sean, but I think I'll skip it. I just wanted a simple-to-use function to parse my HTML :) And it's giving me more work than I expected. I think I'll use pdfmake.js the old-fashioned way, i.e., manually adding the values to the arrays and objects. Besides, it seems you use name functions, which I suppose is incompatible with IE. In any case, thanks a lot for your great work.

vinayak519 commented 6 years ago

@tsiegleauq can you make a super simple example with your parser? I have been searching for a solution to convert HTML and CSS to PDF. I tried almost every plugin (mpdf, pdfmake, jspdf, htmltocanvas and pdf, dompdf, kendo, etc.) but none of them produces perfect output. Some have CSS issues, some have quality issues, and so on. I'm trying to export my web pages as PDF and it gives me a lot of layout breaks. I do use a framework like Bootstrap for CSS.

sayjeyhi commented 6 years ago

+1 it will be a great update

isAdamBailey commented 6 years ago

I am trying to get htmlToPdfmake working by passing a string of HTML from a WYSIWYG editor (to be displayed as a note in the PDF), and cannot seem to get it to work at all. I would love to see this feature in pdfmake!

mitchdennett commented 6 years ago

+1

Mihaiii commented 6 years ago

+1

abhiranjankumar00 commented 6 years ago

+1

hansfelix commented 6 years ago

+1

sneu012 commented 6 years ago

+1

phablulo commented 6 years ago

+1

matheussouza9 commented 6 years ago

+1

ccjmne commented 6 years ago

Please, do not just post +1. It pings everybody every time and doesn't add much value to the conversation:
This thread's purpose is discussing ideas and solution strategies.

However, don't hesitate to add a thumbs up reaction to the comments you agree with, it's a better metric of how much we care about the functionality!

You can also directly subscribe to this issue to be informed of any progress 👍

Aymkdn commented 5 years ago

For the ones interested, I created a gist of the function created by @tsiegleauq and OpenSlides

It's ES5 and import compatible. I had an urgent need of it for a project so I did this quick conversion. It would be awesome to have it integrated to this project. If I have more time in the coming days/weeks I'll create a cleaner module for this function.

tsiegleauq commented 5 years ago

btw, the OpenSlides HTML(motion) to PDF Algorithm has been reworked to be a little cleaner. (just ignore everything that says "line number" and you are good to go) It's now TypeScript and ES6. The old code can still be found here

Html2PdfService

Aymkdn commented 5 years ago

FYI I've just created a module called html-to-pdfmake

liborm85 commented 5 years ago

I will explore how it could be integrated into the project.

tsiegleauq commented 5 years ago

@Aymkdn as far as I can see, that basically copies the TypeScript version from OpenSlides. Therefore, be aware that nested lists will probably not work.

Furthermore, most projects will require custom rules for certain elements, e.g. special parsing if your nodes have certain CSS classes.

Aymkdn commented 5 years ago

@tsiegleauq I started with the OpenSlides version, but in the end I rewrote it completely (except for the color part). So, no, it's not "a basic copy" of the TypeScript version.

As you can see in the example the nested lists look fine to me.

most projects will require custom rules for certain elements

That's why it's pretty customizable. I chose this way to do it. It's probably not perfect (the purpose is not to cover 100% of the possible cases, because that would take months of development), but it should be easy enough to quickly customize the result.

If you have questions, I invite you to go to https://github.com/Aymkdn/html-to-pdfmake/issues (I'm not sure this thread is the right place to discuss about the module I created ^^)

Lexiebkm commented 4 years ago

@Aymkdn Your gist doesn't exist anymore, whereas your html-to-pdfmake package seems to be actively maintained. So, can it be said that the package is a replacement for your initial gist? I think I will try it.

Aymkdn commented 4 years ago

Yes, the html-to-pdfmake package is the one to use.
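For those who want to try the package with an HTML string (from a WYSIWYG editor, for instance), here is a minimal sketch of how the pieces could fit together. It assumes that the package exposes a single function that takes an HTML string and returns pdfmake content, and that `pdfMake` and `htmlToPdfmake` are available as globals in the browser; check the package README for the exact setup. `buildDocDefinition` is a hypothetical helper that receives the converter as a parameter so that it does not depend on how the package is loaded.

```javascript
// Wrap the converted HTML in a pdfmake document definition.
// `convert` is expected to be the function provided by html-to-pdfmake
// (assumption: it takes an HTML string and returns pdfmake content).
function buildDocDefinition(convert, htmlString) {
  return {
    content: [convert(htmlString)]
  };
}

// Browser usage (assumes pdfMake and htmlToPdfmake are loaded globally):
//   var note = "<p>This is a <b>note</b> from the editor</p>";
//   pdfMake.createPdf(buildDocDefinition(htmlToPdfmake, note)).download();
```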

Lexiebkm commented 4 years ago

@Aymkdn Thanks for your confirmation.