DrupalSecurityTeam / drupalpcicompliance

Official github repo for the Drupal PCI compliance white paper.
http://drupalpcicompliance.org

Improve Markdown to PDF Conversion #19

Open rickmanelius opened 10 years ago

rickmanelius commented 10 years ago

Initially I attempted to use pandoc to convert the markdown version to pdf. I had only limited success, primarily because I had no control over the styling and the documentation was lacking.

I saw this article by Matt Farina and it looks like a promising alternative. http://java.dzone.com/articles/converting-markdown-pdf-php?mz=46483-html5

Having the ability to immediately convert to pdf (versus having to manually typeset and style in another program as I had to do for the version 1.0 release) would greatly speed up the release cycle of this report without sacrificing the design of the paper.

rcross commented 10 years ago

just saw this, and it might be of some assistance (or at least interest) - http://www.gitbook.io/

rcross commented 10 years ago

also, for reference, I'm pretty sure that pandoc uses LaTeX as its intermediary for PDF output, so you'd need to look at the LaTeX docs for formatting.

rcross commented 10 years ago

another possible approach - https://www.npmjs.org/package/markdown-pdf (markdown -> html (allows css styling) -> pdf)

rickmanelius commented 10 years ago

Hi Ryan. The annoying thing about pandoc is that, even though I used LaTeX extensively back in college, I was unsuccessful in getting any control over the style/style sheets. The output seemed to be very dependent on the particular version of LaTeX installed on the host machine and varied considerably between my work and personal laptops. Rather than fight it, I gave up.

The gitbook and markdown-pdf links look very interesting. I'll try them both and (if successful) use them for future conversions. Honestly, it's the pdf conversion that is slowing down incremental releases of this document because the current conversion is a PITA :)

rickmanelius commented 10 years ago

I was able to get both tools (gitbook and markdown-pdf) installed. I did an initial test with markdown-pdf and confirmed that the CSS is working, but the links are not. That would be a deal breaker, but there is more testing to do...

Here is the command I was using to generate the pdf.

markdown-pdf DrupalPCICompliance.md -s css/pdf.css -d 2000 -o DrupalPCICompliance-markdown-pdf.pdf

Note that the CSS path actually has to be an absolute path, not the relative path shown in the command above.
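For anyone scripting this, here is a minimal sketch of one way to handle the absolute-path requirement. It only uses the `-s`, `-d` and `-o` flags from the command above; the helper name `markdown_pdf_command` and its parameters are hypothetical, and nothing is executed until the returned list is passed to something like `subprocess.run`.

```python
from pathlib import Path


def markdown_pdf_command(source, css, output, delay_ms=2000, base_dir="."):
    """Build the markdown-pdf argument list with an absolute stylesheet path.

    Hypothetical helper: the flags mirror the command quoted in this thread.
    """
    css_path = Path(css)
    if not css_path.is_absolute():
        # Resolve the relative stylesheet path against the chosen base directory.
        css_path = Path(base_dir).resolve() / css_path
    return [
        "markdown-pdf", source,
        "-s", str(css_path),   # stylesheet, now absolute
        "-d", str(delay_ms),   # same delay value as in the original command
        "-o", output,
    ]
```

Calling `subprocess.run(markdown_pdf_command("DrupalPCICompliance.md", "css/pdf.css", "DrupalPCICompliance-markdown-pdf.pdf"), check=True)` from the repo root should then be equivalent to the command above, assuming markdown-pdf is installed and on the PATH.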

rcross commented 10 years ago

How are the links not working? Do you mean PDF internal/bookmark/toc links?

rickmanelius commented 10 years ago

Hi Ryan. Clicking on any of the links (external or internal/bookmark) doesn't result in a browser opening. And gitbook.io requires the structure of the markdown files to change, and it's still buggy. I think pandoc and markdown-pdf are the better options, but it will take a little more debugging at this time.

rickmanelius commented 10 years ago

I'm determined to figure out a solution for this. I may just have to attempt an intermediary like epub.

rcross commented 10 years ago

a little more research:

http://www.markdowntopdf.com/ <- converts links fine it seems, but no obvious way to apply styling. Perhaps contact the author for his approach. @philmmoore

http://www.htmldoc.org/ <- might provide a better html2pdf conversion than the markdown-pdf script.

http://www.docverter.com/ <- another possible approach (pandoc wrapper). An example: http://editorial-app.appspot.com/workflow/6394998534701056/p2vZ5Pj3570. docverter mentions using https://code.google.com/p/flying-saucer/ for html2pdf, which might mean it doesn't use the native LaTeX-based pdf generation from pandoc.

http://docraptor.com/ (a wrapper for http://www.princexml.com/) is a commercial option

https://github.com/walle/gimli http://blog.kushdilip.com/2014/02/convert-emberjs-online-guide-using.html

an example for formatting with LaTeX for use with pandoc http://stackoverflow.com/questions/17902290/using-css-when-converting-markdown-to-pdf-with-pandoc

Some additional templates for LaTeX/Pandoc: https://github.com/kjhealy/pandoc-templates https://github.com/claes/pandoc-templates

https://github.com/Dashed/pandoc-seed-project (might provide some additional scripting assistance)

After all this, I think there is real merit in converting first to html, then to pdf. While it might take a few more steps to complete, this should be the easiest approach for styling without significantly retooling the source doc.

rcross commented 10 years ago

apparently http://www.markdowntopdf.com/ uses http://www.mpdf1.com/ (REF: https://twitter.com/philmmoore/status/457898858669146114) which interestingly also has a drupal module https://drupal.org/project/PDF_using_mPDF and has a composer-compatible version at https://github.com/finwe/mpdf

also found out that the dzone article is from a well-known Drupal shop: http://engineeredweb.com/blog/2014/convert-markdown-pdf-using-php/

nvahalik commented 10 years ago

Have you looked at wkhtmltopdf? I've used it extensively and it is quite good. In the past, I've used a workflow that takes markdown to HTML via pandoc and then outputs the HTML to PDF using wkhtmltopdf.
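A rough sketch of that two-step workflow, for illustration only. The function names `build_commands` and `convert` are hypothetical, the output file names are simply derived from the input name, and this assumes both `pandoc` and `wkhtmltopdf` are installed and on the PATH. Depending on the wkhtmltopdf version, loading a local stylesheet may need additional options that are not shown here.

```python
import subprocess
from pathlib import Path


def build_commands(markdown_file, css_file=None):
    """Return the two commands: markdown -> HTML (pandoc), HTML -> PDF (wkhtmltopdf)."""
    md = Path(markdown_file)
    html = md.with_suffix(".html")
    pdf = md.with_suffix(".pdf")

    pandoc_cmd = [
        "pandoc", str(md),
        "--from", "markdown",
        "--to", "html",
        "--standalone",          # emit a complete HTML document, not a fragment
        "--output", str(html),
    ]
    if css_file:
        # pandoc links the stylesheet from the generated HTML.
        pandoc_cmd += ["--css", str(css_file)]

    wkhtmltopdf_cmd = ["wkhtmltopdf", str(html), str(pdf)]
    return [pandoc_cmd, wkhtmltopdf_cmd]


def convert(markdown_file, css_file=None):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(markdown_file, css_file):
        subprocess.run(cmd, check=True)
```

With this, `convert("DrupalPCICompliance.md", "css/pdf.css")` would write `DrupalPCICompliance.html` and then `DrupalPCICompliance.pdf` next to the source file. Keeping the intermediate HTML around makes it easy to debug styling in a browser before generating the PDF.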

mgifford commented 9 years ago

You can convert the markdown to restructured text http://docutils.sourceforge.net/rst.html

Then pretty easily convert the RST files to ePub http://pedrokroger.net/using-sphinx-write-technical-books/

We're still playing with this approach, but it seems that there are quite a few others.

chendachao commented 8 years ago

There is also an Atom plugin called markdown-pdf that allows you to convert markdown to pdf.

mgifford commented 8 years ago

That's pretty neat, thanks.