Closed j2kun closed 10 months ago
Great start! I've got 26 test cases. All of these pass when using a shortcode[^1], but 10 of them fail when using the passthrough approach.
[^1]: Tested using both KaTeX and MathJax.
" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample2(t *testing.T) { input := `x = {-b \pm \sqrt{b^2-4ac} \over 2a} n$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample3(t *testing.T) { input := `Inline $x = {-b \pm \sqrt{b^2-4ac} \over 2a} $ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample4(t *testing.T) { input := `Inline $ x = {-b \pm \sqrt{b^2-4ac} \over 2a}$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample5(t *testing.T) { input := `Block $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample6(t *testing.T) { input := `Block $$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample7(t *testing.T) { input := `Block $$x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample8(t *testing.T) { input := `Block $$ x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample9(t *testing.T) { input := `Inline $a^*=x-b^*$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample10(t *testing.T) { input := `Inline $ a^*=x-b^* $ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample11(t *testing.T) { input := `Inline $a^*=x-b^* $ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample12(t *testing.T) { input := `Inline $ a^*=x-b^*$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample13(t *testing.T) { input := `Block $$a^*=x-b^*$$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample14(t *testing.T) { input := `Block $$ a^*=x-b^* $$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample15(t *testing.T) { input := `Block $$a^*=x-b^* $$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample16(t *testing.T) { input := `Block $$ a^*=x-b^*$$ equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample17(t *testing.T) { input := `Inline \(a^*=x-b^*\) equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample18(t *testing.T) { input := `Inline \( a^*=x-b^* \) equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample19(t *testing.T) { input := `Inline \(a^*=x-b^* \) equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample20(t *testing.T) { input := `Inline \( a^*=x-b^*\) equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample21(t *testing.T) { input := `Block \[a^*=x-b^*\] equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample22(t *testing.T) { input := `Block \[ a^*=x-b^* \] equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample23(t *testing.T) { input := `Block \[a^*=x-b^* \] equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample24(t *testing.T) { input := `Block \[ a^*=x-b^*\] equation` expected := "" + input + "
" actual := Parse(t, input) c := qt.New(t) c.Skip() // TODO failing test c.Assert(actual, qt.Equals, expected) } func TestExample25(t *testing.T) { input := `$$ \begin{array} {lcl} L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2 \rightarrow x_1 \rightarrow x_0)G(x_1 \longleftrightarrow x_2)f_r(x_3 \rightarrow x_2 \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2} \\\\\\ &=& \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1} \rightarrow x_i \rightarrow x_{i-1})G(x_i \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k \longleftrightarrow x_{k-1})L_e(x_k \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)}) \end{array} $$` expected := input actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } func TestExample26(t *testing.T) { input := `\[ \begin{array} {lcl} L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2 \rightarrow x_1 \rightarrow x_0)G(x_1 \longleftrightarrow x_2)f_r(x_3 \rightarrow x_2 \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2} \\\\\\ &=& \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1} \rightarrow x_i \rightarrow x_{i-1})G(x_i \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k \longleftrightarrow x_{k-1})L_e(x_k \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)}) \end{array} \]` expected := input actual := Parse(t, input) c := qt.New(t) c.Assert(actual, qt.Equals, expected) } ```I've added c.Skip() // TODO failing test
to each of the 10 failing tests.
To me, the gold standard for testing is the method that Hugo users have been using for years: wrap the LaTeX in a shortcode that passes .Inner
through without modification.
So I think we need all 26 test cases to pass.
I've update the test repo, splitting the passthrough and shortcode tests into separate pages. Each page performs the same tests, in the same sequence, with the same name... so it's easy to compare results.
At some point we'll need some false positive tests as well, if for no other reason than to provide guidance to content authors (i.e., what not to do, when to escape delimiters, etc.).
As to the failing CI build, we check that the go fmt
is OK. I recommend setting up that in the editor you choose, if possible, then you don't have to think about it.
Sorry for the delays, inclement weather and all. I have some simplifications and fixes incoming that handles all of @jmooring's tests, with the main remaining obstacle being that goldmark's "raw text" renderer still does character substitutions like $ -> $amp;
for inline, so I have to add a custom inline renderer. Should be updated here tonight or tomorrow.
All the tests are now passing, with one caveat. I was struggling to get a proper inline and block parser working separately with some of the test cases (e.g., Example6 would trigger a new block on the third line, and I couldn't get goldmark to treat it as a continuation of the existing block).
My workaround was to implement the entire extension using only inline parsing/rendering. As a result:
PassthroughInline
children of a Paragraph
block.$$
vs $$ $$
), the more specific delimiter pair needs to come first in the configured list. I left a comment in passthrough_test::buildTestParser
to this effect, but we'll need to make that clear in the hugo documentation for this feature.implement the entire extension using only inline parsing/rendering
If I understand this correctly, it will prohibit us from creating future, separate render hooks for inline and block pass-through as mentioned here. The inline folks will say, "I want to be able to wrap this with a span so I can mouseover...". The block folks will say, "I want all blocks to be wrapped in a figure element." So, we'd need separate render hooks.
There is only one set of delimiters
Can you briefly describe this?
implement the entire extension using only inline parsing/rendering
If I understand this correctly, it will prohibit us from creating future, separate render hooks for inline and block pass-through as mentioned here. The inline folks will say, "I want to be able to wrap this with a span so I can mouseover...". The block folks will say, "I want all blocks to be wrapped in a figure element." So, we'd need separate render hooks.
I don't recall seeing this contention anywhere, but one possibility would be to add a configuration option to render the passthrough with any user-configured fences (e.g., a hugo shortcode), which could then be transformed by other hugo mechanisms (sorry, not super familiar with what that might be; are shortcodes processed before or after goldmark?). That option could be specific to each delimiter pair, so block delimiters could get a different rendered fence than inline ones. In that direction, would the hugo team prefer for the render hooks to be hugo-based or goldmark-based? But I can see the issue that if these are nested within a paragraph tag, it might be tricky to lift out.
I could give the block+inline approach another go, but another problem was that I couldn't even get the block parser to trigger in the case that the block delimiters were used inline. From what I recall, there was something weird about how goldmark handles lines starting with punctuation in re separating blocks, and I couldn't quite get to the bottom of it.
There is only one set of delimiters
Can you briefly describe this?
Instead of having two configuration options, one to specify block delimiters and one to specify inline delimiters, there is just one option to configure all delimiters, and they are all implicitly inline.
One other test I just thought of was to handle $
delimiter escapes. I'll have to think about how to handle that generically. Maybe: ensure \
is always a trigger, then if Parse is triggered by a \
, check to see if any single-byte delimiter follows the backslash, and if so advance the reader and return a text segment for the escaped delimiter. (Multi-byte delimiters would not be escapable)
render hooks
This is a Hugo thing, not a Goldmark thing. We currently have render hooks for images and links (both are inline elements), and headings and fenced code blocks (both are block elements). But the Goldmark extension should be structured such that, in the future, we can easily hook into both inline pass-throughs and block pass-throughs. Shortcodes are not relevant to the either the Goldmark extension or render hooks.
Instead of having two configuration options, one to specify block delimiters and one to specify inline delimiters, there is just one option to configure all delimiters
I think this is consistent with the Hugo configuration described here:
markup:
goldmark:
extensions:
passthrough: # or a better name
enable: false
delimiters:
- ['$','$'] # inline equations
- ['$$','$$'] # block equations
- ['\(','\)'] # inline equations
- ['\[','\]'] # block equations
and they are all implicitly inline.
Not sure I understand what you mean by this.
I had to modify two of @jmooring's tests
OK, but then we're not testing against what users currently do/get (shortcode).
Not sure I understand what you mean by this.
I don't think it's possible to separate the block/inline parsers and have a single set of configured delimiters. You have to separate them to say which to trigger parsing a block and which to trigger parsing inline, unless the two sets are identical.
But the Goldmark extension should be structured such that, in the future, we can easily hook into both inline pass-throughs and block pass-throughs.
I think you're saying one would (during hugo's configuration of the goldmark parser) override the renderer I wrote in this PR, and use some other renderer for the parsed nodes, right? As is, such a renderer would emit HTML that is nested inside a paragraph tag. Am I understanding you that this is not acceptable? I could also attach the parsed delimiters to the ast node, so that the behavior can branch based on the matched delimiters.
In any case, a lot of this nuance focuses on how Foo $$x$$ bar
should be treated, since the display-mode ("block") equation is parsed inline. Would the users complain that this is not handled by a block renderer? My main issues with the block renderer were in trying to get it to emit the above–and its split-across-lines variations–as a block node. If we're happy to emit this as an inline node, but still emit Foo\n\n$$x$$\n\nbar
as a block node, then I can add back the block renderer quite easily. But it might confuse users expecting a custom block rendering hook to work on both the inline and block uses of the same delimiters.
Instead of having two configuration options, one to specify block delimiters and one to specify inline delimiters, there is just one option to configure all delimiters
OK, I think I get this now. The original (my) configuration model was inadequate because it would require delimiter hardcoding in the extension. And we don't want to hardcode to LaTeX delims; this is a generic passthrough extension. (e.g., "I want to do inline passthrough of text that starts with "xxx" and ends with "yyy").
So, the proposed Hugo config should have been:
markup:
goldmark:
extensions:
passthrough:
enable: false
delimitersInline:
- ['$','$']
- ['\(','\)']
delimitersBlock:
- ['$$','$$']
- ['\[','\]']
or maybe this is better (we can figure this out on the Hugo side later)
markup:
goldmark:
extensions:
passthrough:
enable: false
delimiters:
inline:
- ['$','$']
- ['\(','\)']
block:
- ['$$','$$']
- ['\[','\]']
Would the users complain that this is not handled by a block renderer?
I would, for the following reasons:
$$
is a block delimiter, it should be... a block delimiterRegarding the first reason... when this is released content authors might ask, "Can I remove all the {{< math >}}
and {{< /math >}}
stuff in my markdown?" Answering with "it depends" isn't great.
If you haven't used them, I recommend trying Hugo's render hooks to see how they behave. What they render completely replaces the markdown without wrapping them in something else. That's true for both inline and block hooks.
Are there any standard markdown syntax forms that allow an entity specified inline to split into a new block?
Code fences do not: https://spec.commonmark.org/dingus/?text=Inline%20%60%60%60code%60%60%60%20equation
I worry this may be a limitation of goldmark, though I would have to pinpoint the specific reason. If the same fences render differently in the two contexts for other markdown concepts, it might make more sense to handle the two cases separately, and only emit a passthrough block if the delimiters wrap the content on separate lines, as in the case of code fences.
Are there any standard markdown syntax forms that allow an entity specified inline to split into a new block?
No.
only emit a passthrough block if the delimiters wrap the content on separate lines
This would not be compatible with:
And the last two are about portability, the argument/justification that turned this effort from a proposal into an accepted enhancement.
In your example, GitHub renders the block equations inside a <p>
tag, suggesting it is parsing it as an inline node, while GitLab renders it inside a <div>
. Moreover, GitHub doesn't support the examples that are confounding my work in Goldmark: https://gist.github.com/j2kun/d2b02918441bd255de07ae711e05047a
I don't have a GitLab account so I can't check if that example also fails on GitLab, but based on your inline examples on GitLab it likely fails.
I think we're going to have to compromise on something here. Which of the block equation examples are we willing to give up? I think it would be important to keep:
$$
equation
$$
and
This is an $$equation$$ sentence.
But not support split-line block equations happening inline.
This is an $$equation
foo
bar$$ baz.
I think I could make that work with a block parser, but if the priority is, "Can I remove all the {{< math >}}
and {{< /math >}}
stuff in my markdown?" then the current implementation will get us closest to answering "yes" because it supports all the syntax forms, and they don't care about custom renderers because the JS libraries are doing that. If the priority is to support stricter consistency in future potential overrides to the passthrough renderer, then I can try to make that work.
I think we're going to have to compromise on something here... I think I could make that work with a block parser.
OK, we can document that split-line block equations are not supported, as long as the other two block forms work.
If the priority is to support stricter consistency in future potential overrides to the passthrough renderer
Yes, that one.
Question: will inline support all three forms (same line, separate line, split line)?
Inline supports everything, that's what I pushed to the PR. I will give it a go to see if I can get block parsing support for the non-split-line block forms.
On Thu, Jan 18, 2024, 4:55 PM Joe Mooring @.***> wrote:
I think we're going to have to compromise on something here... I think I could make that work with a block parser.
OK, we can document that split-line block equations are not supported, as long as the other two block forms work.
If the priority is to support stricter consistency in future potential overrides to the passthrough renderer
Yes, that one.
Question: will inline support all three forms (same line, separate line, split line)?
— Reply to this email directly, view it on GitHub https://github.com/gohugoio/hugo-goldmark-extensions/pull/2#issuecomment-1899461001, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAS2PKX4EAXLAOYS7EJO44TYPHABLAVCNFSM6AAAAABB2ZITMOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJZGQ3DCMBQGE . You are receiving this because you authored the thread.Message ID: @.***>
Good news, after quite a bit more digging I found a workable solution to parse all desired syntax forms and support block delimiters expressed inline with a block renderer.
The solution was to use Goldmark's concept of an ASTTransformer
, which is a hook to arbitrarily post-processes the parsed AST before moving on. In my case, I had the parsing stage parse all the passthroughs as inline, and then I search for inline passthrough nodes whose delimiters are block delimiters, and then I split the paragraph at that node to emit a passthrough block node and two paragraphs on each side.
However, this does change some of the test cases, since if we're properly emitting a block node, then we have to have a paragraph break in the rendered HTML. I.e., in the tests that @jmooring defined like
Block $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$ equation
The result is
<p>Block </p>
$$x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$
<p> equation</p>
Which differs from the originally written expected output of
<p>Block $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$ equation</p>
It also made me realize that some strange edge cases, such as using a block delimiter inside a list item, should probably remain as inline nodes.
I still have to implement the escape idea described in https://github.com/gohugoio/hugo-goldmark-extensions/pull/2#issuecomment-1899008617, but in the meantime please let me know if the above tweak to the test cases is acceptable.
@j2kun
Looking good!
First, a couple of inconsequential edits to the test cases... I missed a line when creating example 2, and the last example has some leading spaces where it shouldn't.
" + input + "
" actual := Parse(t, input) @@ -482,22 +483,22 @@ $$` func TestExample26(t *testing.T) { input := `\[ - \begin{array} {lcl} - L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2 - \rightarrow x_1 - \rightarrow x_0)G(x_1 - \longleftrightarrow x_2)f_r(x_3 - \rightarrow x_2 - \rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2} - \\\\\\ &=& - \prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1} - \rightarrow x_i - \rightarrow x_{i-1})G(x_i - \longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k - \longleftrightarrow x_{k-1})L_e(x_k - \rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)}) - \end{array} - \]` +\begin{array} {lcl} +L(p,w_i) &=& \dfrac{1}{N}\Sigma_{i=1}^N(\underbrace{f_r(x_2 +\rightarrow x_1 +\rightarrow x_0)G(x_1 +\longleftrightarrow x_2)f_r(x_3 +\rightarrow x_2 +\rightarrow x_1)}_{sample\, radiance\, evaluation\, in\, stage2} +\\\\\\ &=& +\prod_{i=3}^{k-1}(\underbrace{\dfrac{f_r(x_{i+1} +\rightarrow x_i +\rightarrow x_{i-1})G(x_i +\longleftrightarrow x_{i-1})}{p_a(x_{i-1})}}_{stored\,in\,vertex\, during\,light\, path\, tracing\, in\, stage1})\dfrac{G(x_k +\longleftrightarrow x_{k-1})L_e(x_k +\rightarrow x_{k-1})}{p_a(x_{k-1})p_a(x_k)}) +\end{array} +\]` expected := input actual := Parse(t, input) ```Second, I don't think we're going to know the impact, if any, of splitting one paragraph into two paragraphs until we get this into the field. I suspect it will be fine.
Third, can you re-purpose https://github.com/gohugoio/hugo/pull/11866 for the Hugo integration bit? I think we should defer asking for bep's review until both pieces are working.
Again, this looks great!
golang newbie question: how should I test it if the test code is in my repository? go get
gives:
module declares its path as: github.com/gohugoio/hugo-goldmark-extensions/passthrough
but was required as: github.com/j2kun/hugo-goldmark-extensions/passthrough
I could update https://github.com/gohugoio/hugo/pull/11866 to use my forks in their go.mod
, and then change them back before merging. Is there a different recommended way that keeps the code identical, but silently points to my fork (or my local git clone)?
https://github.com/gohugoio/hugo/pull/11866 is working, pointing to my local extension in b5ea64e10
Note:
v0.0.1
for the version in go.mod
Another test to add:
I am finding some issues with the integration, will move the discussion to https://github.com/gohugoio/hugo/pull/11866
@jmooring As an practical note from the side: Assuming you got something that works OK (and the tests are green in this repo), I wouldn't mind just merging this PR into the main branch. This should make integration testing with Hugo a lot easier. This repo is unversioned and in progress, so people should know that it's not done yet.
Implementation is ready for review with one caveat: I can't get goldmark to trigger the block parser for inputs where the block fences are placed inline:
Despite having
CanInterruptParagraph
to true in the parser, the block parser is never triggered and I'm not sure why.One workaround for LaTeX is to duplicate all the configured block fences in the config for the inline fences, which actually seems like the more appropriate general behavior: a configured block fence only triggers when it's the start of a new markdown paragraph.
Other than that, all the tests in
https://github.com/jmooring/hugo-testing/blob/hugo-github-issue-10894
seem to pass.