Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License

Math (MathML and/or TeX math) #59

Open SimonSapin opened 11 years ago

SimonSapin commented 11 years ago

Some way to include math equations in WeasyPrint documents would be nice, preferably in vector form.

Possible leads include:

yvess commented 10 years ago

I'm also looking for this feature, or at least inline SVG support, as mentioned in https://github.com/Kozea/WeasyPrint/issues/75

PhantomJS (http://phantomjs.org/) could be used to preprocess the HTML with JavaScript (needed for MathJax). You could then get the output with the SVG in it, but this would need proper inline SVG support.

I also tried the HTML+CSS renderer of MathML, but the output doesn't look good. Compare http://dl.yas.ch/mathml-nojs-css.html with http://dl.yas.ch/math-css.pdf (generated with WeasyPrint).

Still, this could be a starting point.

stroobandt commented 10 years ago

The lack of math support has also been a showstopper for me. Personally, I use MathJax to display math on my web pages. This was suggested and works great with Pandoc, the tool I use to generate XHTML from Markdown. As mentioned above, I could live with SVG math graphics but not with PNG, GIF or the like.

SimonSapin commented 10 years ago

@serge-stroobandt Thank you for informing us of this life-threatening showstopper. I will be looking forward to your contribution.

SimonSapin commented 10 years ago

Apparently Wikipedia wants to run MathJax server-side using PhantomJS. See MathJax’s wiki.

SimonSapin commented 9 years ago

The new kid on the block is https://github.com/Khan/KaTeX. It apparently supports "server-side" rendering in node.js.

cben commented 9 years ago

P.S. If all you need is convert markdown into pretty PDF with math, it begs the question why go through HTML and not LaTeX...

stroobandt commented 9 years ago

@cben

"Why go through HTML and not LaTeX...?"

That is a good question with a non-trivial answer. The short answer is that neither LaTeX nor ConTeXt is intended for unattended typesetting.

Allow me to elaborate on this. Like yourself, I also use Pandoc to generate XHTML web pages from a Markdown source document. This works great! To make the web pages more appealing, I illuminated my document with left floating miniatures with text wrapping around these. This renders fine with XHTML and CSS.

However, both LaTeX and ConTeXt fail miserably at dealing with these miniatures in the vicinity of page breaks. The Q&A site tex.stackexchange.com is riddled with my failed attempts, and no expert advice helped.

This is why I switched to unattended typesetting in CSS using the proprietary Prince XML software. The input is Pandoc-generated XHTML and the PDF output gets page breaking always right, both for Letter and A4 paper format. An unexpected additional advantage is the sheer speed of rendering with Prince XML in comparison to both LaTeX and ConTeXt.

However, one problem remains; math support. MathJax cannot be used with Prince XML because JavaScript support is incomplete. Up to now, I have been using Prince XML with MathML, but the quality of the output is lousy.

Thanks to your SVG comment, cben, I revisited the problem and came up with a proposal for SvgTex support in Pandoc. This would allow injecting SVG math into the Pandoc-generated HTML, making the HTML completely standalone for math visualisation. It would also solve the Prince XML math issue.

Please, express your support in favour of this proposal over at the Pandoc forum!

Thanks!

stroobandt commented 9 years ago

I wrote a Haskell Pandoc JSON filter for SvgTex myself. The code is up at https://groups.google.com/forum/#!msg/pandoc-discuss/MJggAXUmOII/PjkR8ILr_58J

mb21 commented 6 years ago

Has anyone tested Lasem?

liZe commented 6 years ago

Has anyone tested Lasem?

It's exactly what we need, thanks for the link. So sad it's a C library that's not widely included in Windows GTK+ bundles or in Linux default packages.

SimonSapin commented 6 years ago

Maybe compiling and distributing binary wheels is a viable approach: https://github.com/getsentry/milksnake

pothos commented 6 years ago

Math support via pandoc works with a mathjax → SVG img tag filter: https://github.com/lierdakil/mathjax-pandoc-filter

pandoc --filter ~/node_modules/.bin/mathjax-pandoc-filter -Mmathjax.centerDisplayMath -Mmathjax.noInlineSVG -f markdown+smart -t html5 -o test.html test.md
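
The command above could be chained with WeasyPrint from Python. This is a minimal sketch, not a tested pipeline: the function names `pandoc_command` and `markdown_to_pdf` are made up for illustration, the filter location is assumed to be the one used in the command above, and it assumes `pandoc`, the filter and WeasyPrint are all installed.

```python
import subprocess
from pathlib import Path

# Assumed filter location, copied from the command above.
DEFAULT_FILTER = Path.home() / "node_modules" / ".bin" / "mathjax-pandoc-filter"


def pandoc_command(source: Path, html_out: Path,
                   filter_path: Path = DEFAULT_FILTER) -> list[str]:
    """Build the pandoc invocation from the comment above as an argument list."""
    return [
        "pandoc",
        "--filter", str(filter_path),
        "-Mmathjax.centerDisplayMath",
        "-Mmathjax.noInlineSVG",
        "-f", "markdown+smart",
        "-t", "html5",
        "-o", str(html_out),
        str(source),
    ]


def markdown_to_pdf(source: Path, pdf_out: Path,
                    filter_path: Path = DEFAULT_FILTER) -> None:
    """Run pandoc, then hand the resulting HTML to WeasyPrint."""
    # Imported lazily so the command builder works without WeasyPrint.
    from weasyprint import HTML

    html_out = source.with_suffix(".html")
    subprocess.run(pandoc_command(source, html_out, filter_path), check=True)
    HTML(filename=str(html_out)).write_pdf(str(pdf_out))
```

Calling `markdown_to_pdf(Path("test.md"), Path("test.pdf"))` would write `test.html` next to the source and then render it. The formulas arrive as SVG images, so no JavaScript is needed at rendering time.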

ousia commented 5 years ago

@stroobandt,

I don’t know what you mean by unattended typesetting. ConTeXt deals with XML natively and it typesets XML sources. In fact, I generate XHTML output with pandoc and I typeset these XHTML files with ConTeXt directly (no pandoc to ConTeXt conversion).

Sorry for the question, but the issue is relevant to me. It seems almost no one knows that ConTeXt may be similar (or even superior) to Prince XML.

mb21 commented 5 years ago

@ousia But ConTeXt doesn't interpret CSS, right?

ousia commented 5 years ago

@mb21, ConTeXt doesn’t interpret CSS, only elements and attributes (pandoc does exactly the same).

Instead of CSS (such as Prince XML might use), an environment is needed. It might be considered similar to an XSL document.

A detailed explanation may be found at http://www.pragma-ade.com/general/manuals/xml-mkiv.

ousia commented 5 years ago

@mb21, I forgot to mention that https://www.speedata.de/en/product/ is able to generate PDF files from XML sources and it may be configured with CSS files (at least, partially).

Unfortunately, it doesn’t support MathML (see speedata/publisher#107).

stroobandt commented 5 years ago

@ousia Below is an answer I once posted on tex.stackexchange.com to a question about the typesetting limitations of LaTeX. However, this limitation equally applies to ConTeXt. There are even more answers to this very same question.

In a nutshell, unattended typesetting for me is hitting F5 on a very straightforward Markdown document and being presented shortly thereafter with perfectly laid out PDF documents in both A4 and Letter format.

I tried very hard achieving this with both LaTeX and ConTeXt, but results were slow to produce and not satisfactory for all but the most basic documents. However, PrinceXML allowed me to do so.

For example, have a look at the PDF versions of this Markdown document on my hobby website.


With markup converters like Pandoc it is now possible to generate LaTeX documents without ever touching any LaTeX code.

However, obtaining aesthetic page breaks for slightly complex documents, for example taking into account figures, widows and orphans may still require manual intervention in the LaTeX code.

Quoting Frank Mittelbach:

This issue describes the fundamental problem in TeX’s approach: the program builds optimized paragraph shapes without any knowledge about their final placement on a page. The result is a “galley” from which columns are cut to a specified vertical size. A consequence of this is that one can’t have the shape of a paragraph depend on its final position on the page when using TeX’s page builder algorithm.

In summary, it seems we are not quite yet at that utopian point where one can blindly write content without ever having to worry about what the output will look like in LaTeX. Anyhow, LaTeX is not really intended for unattended typesetting.

For this reason, I now resort to automatic CSS typesetting with PrinceXML for any content that is longer than a letter. The PDF printouts on my web site are generated this way without any user intervention. This was not possible with LaTeX 2ε for the reasons mentioned above, even though I tried hard!

If you think of it, HTML+CSS is exactly intended for that: unattended typesetting on screens of unpredictable dimensions. A printed page is merely another media viewport.

On print-css.rocks, one can follow the latest developments in unattended CSS paged media typesetting.

ousia commented 5 years ago

In a nutshell, unattended typesetting for me is hitting F5 on a very straightforward Markdown document and being presented shortly thereafter with perfectly laid out PDF documents in both A4 and Letter format.

Many thanks for your detailed reply, @stroobandt.

I press F9 to generate an XHTML document with pandoc, which is automatically typeset with ConTeXt (see https://github.com/ousia/from-pandoc-to-context/tree/master/doc).

The command that the shortcut triggers is similar to:

pandoc -t html -o file.xml file.md && context --environment=file.tex file.xml

From what I see in your documents, I think that “floats are a pain in TeX” might be a more accurate description. I use almost no float myself.

User intervention may not be required, but I’d say that sed shouldn’t be needed when typesetting from HTML+CSS.

I’m afraid there might be something wrong with your document, since numbered lists have an issue (https://hamwaves.com/cl-ocfd/en/cl-ocfd.a4.pdf#page=39).

stroobandt commented 5 years ago

@ousia Hey, thanks for pointing out that issue with the numbered list. I corrected its CSS now.

A couple of months ago, I also made a magazine by processing Pandoc Markdown. I started out with ConTeXt. However, at one point I had to switch over to CSS and PrinceXML, simply because background images and more stuff were quickly getting too complicated to do using ConTeXt.

Using CSS for this is a breeze, and compiling documents with PrinceXML is much faster.

[Image: magazine preview]

From what I see in your documents, I think that “floats are a pain in TeX” might be a more accurate description. I use almost no float myself.

Not only that. There is the speed of production, which I already mentioned. LaTeX & ConTeXt also have issues with widows and orphans in fully automated, unattended production.

Furthermore, with CSS one has way more control over where to allow page breaks. For example: not allowing page breaks right after a subtitle but one or two paragraphs down. This holds true not only for titles but any combination of paragraphs, figures, tables, formulas, etc.

User intervention may not be required, but I’d say that sed shouldn’t be needed when typesetting from HTML+CSS.

True. However, I could also have implemented this preprocessing step as a Pandoc filter written in Haskell. That would certainly handle edge cases better. I have done so once for inline math. However, so far, I have only made baby steps with Haskell. Getting it done in sed works quicker for me.

Anyhow, the sed preprocessing is only required for achieving a couple of niceties like not ending a line with a period followed by a single very short word; for example the article "A" or "The".

ninest commented 5 years ago

I'm not sure if this is still an issue. But to implement math, you can use this site and take the images. For example, if I want x/y, I will use the following HTML:

<img src="https://latex.codecogs.com/gif.latex?%5Cfrac{x}{y}">
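
The same tag can be generated for arbitrary formulas. The sketch below is only an illustration: `build_img_tag` is a made-up helper name, the endpoint is the one from the example above, and the set of characters left unescaped is chosen simply to reproduce that example URL. Whether the remote service accepts every formula encoded this way is an assumption.

```python
import html
from urllib.parse import quote

# Endpoint taken from the example above.
CODECOGS_ENDPOINT = "https://latex.codecogs.com/gif.latex"


def build_img_tag(latex: str, endpoint: str = CODECOGS_ENDPOINT) -> str:
    """Return an <img> tag whose src asks the remote service to render `latex`."""
    # Percent-encode the formula; braces are kept literal as in the example URL.
    src = f"{endpoint}?{quote(latex, safe='{}')}"
    # Escape both attributes so the result is valid HTML.
    return f'<img src="{html.escape(src)}" alt="{html.escape(latex)}">'


print(build_img_tag(r"\frac{x}{y}"))
```

Note that this approach needs network access at rendering time and, with the GIF endpoint, produces raster images rather than the vector output requested earlier in this thread.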

mbarkhau commented 5 years ago

I'm currently working on a Markdown extension that uses the offline rendering of KaTeX. I was hoping this would be a good candidate to use with WeasyPrint as it doesn't require JavaScript.

This is a test page I generated: https://gist.github.com/mbarkhau/ff263164cd162ff1fd734c2b0ce23241

The stylesheet uses some properties which are not supported by WeasyPrint

WARNING: Ignored `text-rendering: auto` at 126:3, unknown property.
WARNING: Ignored `width: min-content` at 153:3, invalid value.
WARNING: Ignored `fill: currentColor` at 902:3, unknown property.
WARNING: Ignored `stroke: currentColor` at 903:3, unknown property.
WARNING: Ignored `fill-rule: nonzero` at 904:3, unknown property.
WARNING: Ignored `fill-opacity: 1` at 905:3, unknown property.
WARNING: Ignored `stroke-width: 1` at 906:3, unknown property.
WARNING: Ignored `stroke-linecap: butt` at 907:3, unknown property.
WARNING: Ignored `stroke-linejoin: miter` at 908:3, unknown property.
WARNING: Ignored `stroke-miterlimit: 4` at 909:3, unknown property.
WARNING: Ignored `stroke-dasharray: none` at 910:3, unknown property.
WARNING: Ignored `stroke-dashoffset: 0` at 911:3, unknown property.
WARNING: Ignored `stroke-opacity: 1` at 912:3, unknown property.
WARNING: Ignored `stroke: none` at 915:3, unknown property.

The file being referred to is https://cdn.jsdelivr.net/npm/katex@0.10.2/dist/katex.css
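
To see at a glance which declarations are ignored and why, the log can be tallied with a few lines of Python. This is a rough sketch that assumes the log lines look exactly like the ones pasted above; `IgnoredDeclaration`, `parse_warnings` and `count_by_reason` are illustrative names, not part of WeasyPrint.

```python
import re
from collections import Counter
from typing import NamedTuple

# Pattern modelled on the warning lines pasted above.
WARNING_RE = re.compile(
    r"WARNING: Ignored `(?P<name>[^:`]+):\s*(?P<value>[^`]*)` "
    r"at (?P<line>\d+):(?P<column>\d+), (?P<reason>[^.]+)\."
)


class IgnoredDeclaration(NamedTuple):
    name: str
    value: str
    line: int
    column: int
    reason: str


def parse_warnings(log_text: str) -> list[IgnoredDeclaration]:
    """Extract every ignored declaration found in the log text."""
    return [
        IgnoredDeclaration(
            m["name"], m["value"], int(m["line"]), int(m["column"]), m["reason"]
        )
        for m in WARNING_RE.finditer(log_text)
    ]


def count_by_reason(entries: list[IgnoredDeclaration]) -> Counter:
    """Count how many declarations were ignored for each stated reason."""
    return Counter(entry.reason for entry in entries)
```

Applied to the log above, this separates the single "invalid value" case (`width: min-content`) from the "unknown property" cases, most of which are SVG presentation properties.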

Despite these warnings, the rendering is still quite good.

This is how it is rendered by WeasyPrint

[Image: katex_test_weasyprint]

Here in Chrome 74 and Firefox 66

[Image: katex_test_chrome]

[Image: katex_test_firefox]

Should I open a separate issue for supporting these properties (assuming that's the reason the rendering is not on par with the browsers)?

mbarkhau commented 5 years ago

I was able to comment out every line that generates a warning and the rendering remains fine in the browser. In other words, there is something else about the rendering that differs from what the browsers do.
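
For anyone who wants to repeat this experiment, the commenting-out can be scripted. The sketch below is an assumption about how one might do it, not the procedure actually used: `comment_out_lines` is a made-up helper, and it assumes an unminified stylesheet with one declaration per line, where the line numbers come from the warnings (126, 153, 902 and so on).

```python
def comment_out_lines(css_text: str, line_numbers: set[int]) -> str:
    """Wrap the given 1-based lines of a stylesheet in CSS comments."""
    result = []
    for number, line in enumerate(css_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and lines that already close a comment,
        # since CSS comments cannot be nested.
        if number in line_numbers and stripped and "*/" not in stripped:
            indent = line[: len(line) - len(line.lstrip())]
            result.append(f"{indent}/* {stripped} */")
        else:
            result.append(line)
    return "\n".join(result)
```

Rendering the patched stylesheet in a browser and comparing it with the original is then a quick way to check whether the ignored declarations matter for the layout.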

liZe commented 5 years ago

I'm currently working on a Markdown extension that uses the offline rendering of KaTeX.

That's a good idea! Thanks for sharing.

Despite these warnings, the rendering is still quite good.

It is, and that's good news as the HTML and CSS structures are quite complicated.

Should I open a separate issue for supporting these properties (assuming that's the reason the rendering is not on par with the browsers)?

You should. It's pretty hard to debug as the HTML structure is crazy, but we could at least find the reasons why it doesn't work.

The main problem in the whole document (apart from the ones in your screenshots) is the missing square root symbols. It's caused by #75.

stroobandt commented 3 years ago

This is to inform the community that math2svg is now available as a Lua filter officially adopted by Pandoc: https://github.com/pandoc/lua-filters/tree/master/math2svg

This Lua filter for Pandoc converts LaTeX math to MathJax generated scalable vector graphics (SVG) for insertion into the output document in a standalone manner. SVG output is in any of the available MathJax fonts.

This is useful when a CSS paged media engine (such as WeasyPrint) cannot process complex JavaScript as required by MathJax.

No Internet connection is required when generating or viewing SVG formulas, resulting in both absolute privacy and offline, standalone robustness.

Personally, I have been using it for quite some time to generate PDFs with MathJax generated formulas in an unattended typesetting workflow using Prince XML.

Here is a brief sample document: https://hamwaves.com/zc.measuring/en/zc.measuring.letter.pdf

More intricate documents with Markdown source, makefile and CSS are available from the same web site.

liZe commented 3 years ago

This is to inform the community that math2svg is now available as an officially Pandoc adopted Lua filter:

Good to know, thanks a lot for sharing this information!

grewn0uille commented 2 years ago

Hello!

As it’s soon our 2-year anniversary as CourtBouillon, we opened a short survey to know more about your expectations. Don’t hesitate to support this feature and give it a boost 🚀!

The survey will be open until October 10th.

Update: the survey is now closed. You can find the results here.

grewn0uille commented 10 months ago

Hello!

As you may know, two weeks ago was CourtBouillon's 3-year anniversary 🎉.

For this occasion, we prepared a short survey to have your opinion on this year’s features and to know what you’d like to see in the future! Don’t hesitate to give a boost to this feature ✨️

The survey is open until November 19.