cabo / kramdown-rfc

An XML2RFC (RFC799x) backend for Thomas Leitner's kramdown markdown parser
MIT License

Equation support #84

Open larseggert opened 3 years ago

larseggert commented 3 years ago

I wonder if it would be possible to add some sort of support for rendering equations in ASCII and SVG, with asciitex and tex2svg, respectively. They take (more or less) the same input format, and produce reasonable-looking outputs.

For example, this input

K = \sqrt[3]{W_{max} * \dfrac{1 - \beta_{cubic}}{C}}

renders to the following ASCII

           ________________________
          /         1  -  beta
      3  /                    cubic
K  =  | /  W     *  ---------------
      |/    max            C

and this SVG

[Image: screenshot of the SVG rendering, taken 2020-11-16]

cabo commented 3 years ago

kramdown-rfc currently uses tex2mail for the ASCII part (see, e.g., https://datatracker.ietf.org/doc/draft-ietf-core-senml-versions/). I'll check the tools you mentioned right after the IETF!

Regards, Carsten


larseggert commented 3 years ago

tex2mail produces uglier output:

                 1 - \beta
     +-+                  cubic
K = \|[ 3]W    * --------------
           max         C

cabo commented 3 years ago

asciitex is rather limited (it can't do tilde, cdot, ...). So in many cases you'd essentially have to provide two different inputs for asciitex and tex2svg. tex2svg's output also is super-ugly in the context of an RFC (only slightly less so with STIX fonts). Hmm. I've put up a prototype, subject to change, as 1.3.17, gem update as usual, and use as in

~~~ math
K = \sqrt[3]{W_{max} * \dfrac{1 - \beta_{cubic}}{C}}
~~~

(this will migrate into markdown equations, but putting this into the codeblocks was easier to prototype right now). Feedback welcome.

larseggert commented 3 years ago

Thanks, will test! (Will be in a week, vacation coming up.)

If asciitex is too limited, is it possible to separately specify different figure variants in the markdown, e.g., one that gets rendered via tex2svg for rendering in HTML and a fallback one that has handcrafted ASCII-art for rendering in text?

cabo commented 3 years ago

This is pretty much a specific version of the generalized question "what about <artset>". Brainstorming on what this could look like while still being markdown would be appreciated.

larseggert commented 3 years ago

Well, you could do something like

~~~ math:svg
~~~

~~~ math:txt
~~~

and basically only render the one that matches the desired output. But it's of course ugly.

cabo commented 3 years ago

kramdown-rfc doesn't know what the desired output is. It can only create an <artset>. Doing an artset with a type=svg and another type does what you want in this specific case, but this is hard to generalize. But maybe we don't need to generalize.

The SMOP (small matter of programming) is recognizing adjacent figures and munching them into an artset.

(Downside: I really would like to use

$$ version ~=~ \sum_{fc=0}^{52} ~~ present(fc) x 2^{fc} $$

style equations, but then we probably will need to support TeX, MathML, and AMS forms of math input anyway...)

larseggert commented 3 years ago

I'd need cases from amsmath. tex2svg supports it, but not asciitex.

larseggert commented 3 years ago

> The SMOP (small matter of programming) is recognizing adjacent figures and munching them into an artset.

Could you require them to have the same title, anchor or better yet some un-rendered attribute?
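
A minimal sketch of that grouping idea, in Python rather than kramdown-rfc's own Ruby, and not how kramdown-rfc actually implements anything. It assumes the adjacent variants appear as sibling <artwork> elements carrying a hypothetical un-rendered attribute, here called `artset-group`; runs of neighbours that share its value are wrapped in one <artset>, and the attribute is stripped so it never reaches the output.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical un-rendered marker attribute; not part of xml2rfc or kramdown-rfc.
GROUP_ATTR = "artset-group"


def merge_adjacent_artwork(parent):
    """Wrap runs of adjacent <artwork> siblings sharing GROUP_ATTR in one <artset>."""

    def run_key(el):
        # Only <artwork> elements take part in grouping; everything else keys to None.
        return el.get(GROUP_ATTR) if el.tag == "artwork" else None

    children = list(parent)
    for child in children:
        parent.remove(child)

    for key, run in itertools.groupby(children, key=run_key):
        run = list(run)
        for el in run:
            # The marker is un-rendered, so drop it from the result.
            el.attrib.pop(GROUP_ATTR, None)
        if key is not None and len(run) > 1:
            artset = ET.SubElement(parent, "artset")
            artset.extend(run)
        else:
            parent.extend(run)
```

Only directly adjacent elements are merged, so two variants separated by a paragraph stay separate figures.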

cabo commented 3 years ago

Yes, but the distance from just doing a <artset markdown="1"> is shrinking.

larseggert commented 3 years ago

I think this works well enough!

FYI, I have forked asciitex at https://github.com/larseggert/asciiTeX and am trying to make it support more of the same syntax as tex2svg. The code is crap though, so it's not easy to modify it.

larseggert commented 3 years ago

@cabo is there an inherent timeout somewhere when kramdown-rfc2629 processes its input?

I added the required software to the CI run for i-d-template, and am trying to make that build the editors' copies of the draft. This works fine locally and takes a few seconds.

It fails when running via GitHub Actions, see for example https://github.com/NTAP/rfc8312bis/runs/1511478161?check_suite_focus=true

The first thing to note is that for some reason, GitHub Actions is much slower - the run fails after ~17 minutes. Processing of each equation takes 1-2 minutes vs. 1-2 seconds locally. That's not great, but the larger issue is that kramdown-rfc2629 at some point seems to give up on processing the equations, leading to a corrupt XML that then fails to process further.

Any ideas?

cabo commented 3 years ago

> @cabo is there an inherent timeout somewhere when kramdown-rfc2629 processes its input?

Only for bibxml fetches.

> I added the required software to the CI run for i-d-template, and am trying to make that build the editors' copies of the draft. This works fine locally and takes a few seconds.
>
> It fails when running via GitHub Actions, see for example https://github.com/NTAP/rfc8312bis/runs/1511478161?check_suite_focus=true
>
> The first thing to note is that for some reason, GitHub Actions is much slower - the run fails after ~17 minutes. Processing of each equation takes 1-2 minutes vs. 1-2 seconds locally. That's not great, but the larger issue is that kramdown-rfc2629 at some point seems to give up on processing the equations, leading to a corrupt XML that then fails to process further.

I am looking at line 62:

ERROR: Unable to parse the XML document: /github/workspace/stdin
 stdin: Line 1: Document is empty
 stdin: Line 1: Start tag expected, '<' not found

that looks like an svgcheck failure:

$ svgcheck -qa </dev/null
ERROR: Unable to parse the XML document: /private/tmp/stdin
 stdin: Line 1: Document is empty

> Any ideas?

I have no idea why this takes so long, and I have no idea whether that fact is related to the svgcheck failures. Note that svgcheck mostly works (when it outputs

stdin:1: Style property 'vertical-align' removed
ERROR: File does not conform to SVG requirements

it actually works; no idea why this is called "ERROR" when the job of svgcheck was to fix the SVG requirements).

larseggert commented 3 years ago

I'm patching the toolchain to add more diagnostics.

Something else: you should pass --speech=false to tex2svg to avoid this issue.
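
A rough sketch of what such an invocation could look like when driven from a script. The `--speech=false` and `--font STIX` flags are the ones mentioned in this thread; the helper names are made up for illustration, and this is not how kramdown-rfc itself calls the tool.

```python
import subprocess


def tex2svg_command(tex, font=None):
    """Build the argument list for a tex2svg call (from mathjax-node-cli)."""
    cmd = ["tex2svg", "--speech=false"]  # flag suggested in this thread
    if font is not None:
        cmd += ["--font", font]  # e.g. "STIX", as discussed in this thread
    cmd.append(tex)
    return cmd


def render_svg(tex, font=None):
    """Run tex2svg and return the SVG text; requires tex2svg on PATH."""
    result = subprocess.run(
        tex2svg_command(tex, font),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `render_svg(r"K = \sqrt[3]{W_{max}}")` would return the SVG as a string, provided tex2svg is installed.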

cabo commented 3 years ago

It may be worth pointing out that kramdown-rfc memoizes svg tool invocations, so whether this runs through in milliseconds locally may not be a useful comparison with processing a fresh docker instance.

cabo commented 3 years ago

Oh.

I already get a ton of

draft-eggert-tcpm-rfc8312bis.xml(1960): Warning: Duplicate attribute id="E1-STIXWEBNORMALI-1D462" found after including svg from inline:b'<svg xmlns:xlink="http://www.w3' ....  This can cause problems with some browsers.
draft-eggert-tcpm-rfc8312bis.xml(1960): Warning: Duplicate attribute id="E1-STIXWEBNORMALI-1D445" found after including svg from inline:b'<svg xmlns:xlink="http://www.w3' ....  This can cause problems with some browsers.

Because the font included by tex2svg (here: STIX) has id values that repeat.

OBTW, could that font be a problem? (Is it in the docker image?)
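
To see how widespread the repeated id values are, something along these lines could be run over the generated XML. This is only a sketch using Python's standard library; the function name is made up, and it reports every id that occurs more than once anywhere in the document, not just those inside included SVGs.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(xml_text):
    """Return the sorted list of id attribute values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```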

larseggert commented 3 years ago

I install it with npm install -g mathjax-node-cli. Not sure if that is all that is needed?

larseggert commented 3 years ago

Removing --font STIX fixes things, but it's still very slow. Let me see if I can install those fonts somehow.

cabo commented 3 years ago

Hmm, these fonts seem to be installed with mathjax:

Asana-Math Gyre-Pagella Gyre-Termes Latin-Modern Neo-Euler STIX-Web TeX

(The argument --font STIX is translated to STIX-Web.)

(On my system it's under lib/node_modules/mathjax-node-cli/node_modules/mathjax/fonts/HTML-CSS -- you might want to check the docker.)

cabo commented 3 years ago

Can you save the XML?

larseggert commented 3 years ago

Those fonts are installed in my docker image as well.

Regarding the XML, not sure how to make @martinthomson's Makefile retain and save that?

larseggert commented 3 years ago

Do you cache the output of asciitex? When running under CI with i-d-template, not all equations get rendered, and I can't figure out why that is. Suspecting corrupt cache?

cabo commented 3 years ago

Everything in svg_tool_process is memoized. There should be nothing in that cache on a CI system, no? But on your local laptop, you'll need to remove files with names like kdrfc-1_3_10-svg_tool_process-1d98e03299fd6aae617a95e6b05284d134d5316a.cache from your .refcache (or ~/.cache/xml2rfc, if you have it set up that way). As you can see, the kramdown-rfc version is part of the file name, so a kramdown-rfc update starts with a fresh cache, but if you change asciitex, kramdown-rfc does not know that.
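
A small sketch for clearing just those memoized entries, e.g. after changing asciitex. The glob pattern is inferred from the single file name quoted above, so treat it as an assumption; the function name is made up.

```python
from pathlib import Path

# Pattern inferred from the example file name in this thread (an assumption).
SVG_CACHE_PATTERN = "kdrfc-*-svg_tool_process-*.cache"


def clear_svg_tool_cache(cache_dir):
    """Delete memoized svg_tool_process entries in cache_dir; return the removed paths."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    removed = []
    for path in sorted(cache_dir.glob(SVG_CACHE_PATTERN)):
        path.unlink()
        removed.append(path)
    return removed
```

Calling `clear_svg_tool_cache(".refcache")` or `clear_svg_tool_cache(Path.home() / ".cache" / "xml2rfc")` covers the two locations mentioned above and leaves the cached bibxml references alone.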

cabo commented 3 years ago

In principle, the XML should be preserved by making it .SECONDARY and checking in a version to gh-pages. I'm not a registered pilot in that Makefile jungle, though...

larseggert commented 3 years ago

There shouldn't be anything in the cache, no. And I also tried explicitly clearing it, which didn't help.

But the text version at https://ntap.github.io/rfc8312bis/pretty-math/draft-eggert-tcpm-rfc8312bis.txt contains quite a few blocks like

   (Artwork only available as ascii-art: No external link available, see
   draft-eggert-tcpm-rfc8312bis-latest.html for artwork.)

instead of the ASCII math from asciitex, so something is clearly not working in the CI runner. (When I run locally, all equations get created as ASCII art.)

(And shouldn't it say "Artwork not available as ascii-art"?)
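
A CI job could flag this kind of silent fallback by scanning the text output for that placeholder. A minimal sketch, assuming the placeholder always starts with the wording quoted above; the function name is made up.

```python
# Start of the placeholder text quoted in this thread.
PLACEHOLDER = "(Artwork only available as"


def missing_artwork_lines(text):
    """Return the 1-based line numbers where the fallback placeholder begins."""
    return [
        number
        for number, line in enumerate(text.splitlines(), 1)
        if PLACEHOLDER in line
    ]
```

An empty result means every figure made it into the text rendering.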

martinthomson commented 3 years ago

It looks like you got it working.

(BTW, I generally mark .xml as .INTERMEDIATE explicitly. They aren't that useful and I generally don't find having them lying around to be useful. But that's not a firmly held position. It's also a moderately easy patch if you want to propose a switch, though I guess there are a few tangles I'd have to clear out with respect to the pre-commit hook.)

cabo commented 3 years ago

Apart from the debugging support that would help here, the XML is very useful to have around if you have further processes running from that, e.g.,

xpath rfc8152.xml "//artwork[@type='CDDL']/text()" >cose-8152.cddl

cabo commented 3 years ago

BTW, just pushed 1.3.18 with v3 dl/dt/dd; this should fix some of the artifacts in 8312bis

cabo commented 3 years ago

... and of course runs into the contact bug.

You'll need a workaround for

{{{β}{}}}<sub>cubic</sub>:
: CUBIC multiplication decrease factor as described in {{mult-dec}}

until that is fixed, e.g.

{{{β}{}}}<sub>cubic</sub>:
 CUBIC multiplication decrease factor as described in {{mult-dec}}

cabo commented 3 years ago

Added some urgency to the contact bug: https://trac.tools.ietf.org/tools/xml2rfc/trac/ticket/519#comment:10

larseggert commented 3 years ago

You can get the XML now from the artifacts of the CI runs, e.g., https://github.com/NTAP/rfc8312bis/actions/runs/410326116

larseggert commented 3 years ago

By the way, it looks like the missing ascii-art equations are all the ones (and only the ones) that use the α and β Unicode characters. I don't quite understand why, given that they render fine locally under Ubuntu and Darwin.

larseggert commented 3 years ago

OK, figured it out, and https://ntap.github.io/rfc8312bis/pretty-math/draft-eggert-tcpm-rfc8312bis.txt now has the fancy ASCII-art math for all equations.

The problem was that whatever image i-d-template is using for doing the CI building of the editor's copy does not have a UTF-8 capable locale by default. That breaks all kinds of stuff. @martinthomson might want to fix that.
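
A quick way to check what locale a build environment would hand to the tools is to look at the usual POSIX variables in their precedence order (LC_ALL, then LC_CTYPE, then LANG). The sketch below does only that; the function names are made up, and the actual fix (presumably exporting a UTF-8 locale such as C.UTF-8 in the CI setup) is an assumption, since the thread only links to it.

```python
import os


def effective_ctype_locale(env):
    """Return the locale name that applies to character handling under POSIX rules."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # empty values count as unset
            return value
    return "C"


def is_utf8_locale(name):
    """True if the locale name carries a UTF-8 codeset, e.g. 'en_US.UTF-8' or 'C.utf8'."""
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


def environment_is_utf8(env=None):
    """Check the given environment mapping (default: os.environ)."""
    return is_utf8_locale(effective_ctype_locale(os.environ if env is None else env))
```

With no locale variables set, the effective locale is plain C, which is not UTF-8 capable and matches the symptom described here.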

martinthomson commented 3 years ago

https://github.com/martinthomson/i-d-template/issues/254

I know how to do this, but will need a little time to sort it out.

cabo commented 3 years ago

When you do this, you might want to update kramdown-rfc2629 as well.

martinthomson commented 3 years ago

That part is automatic.

larseggert commented 3 years ago

You do this https://github.com/NTAP/rfc8312bis/blob/pretty-math/Makefile#L16-L17 and then https://github.com/NTAP/rfc8312bis/blob/pretty-math/.github/workflows/ghpages.yml#L17-L18

At least that worked for me.

larseggert commented 3 years ago

@martinthomson would you be open to a PR for i-d-template to add svgcheck, tex2svg, asciitex, etc. to the docker image? Or should we install them manually as needed?

larseggert commented 3 years ago

> You'll need a workaround for ... until that is fixed, e.g.
>
> {{{β}{}}}<sub>cubic</sub>:
>  CUBIC multiplication decrease factor as described in {{mult-dec}}

@cabo that workaround doesn't seem to do anything?