Open sandhose opened 7 years ago
@sandhose in https://github.com/Khan/react-components/blob/master/js/tex.jsx we use dangerouslySetInnerHTML to sidestep this issue in React. I'm not sure if Vue.js has a similar mechanism or not.
I'm not using Vue.js in this case, but the issue is much the same. You could avoid dangerouslySetInnerHTML by passing React.createElement (with a small wrapper) as a hyperscript function to the renderer. By doing this, you create real React elements and take advantage of React's virtual DOM.
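To make the "small wrapper" idea concrete, here is a minimal sketch. The names `makeHyperscript` and `fakeCreateElement` are hypothetical and only serve the illustration; in a real application you would pass `React.createElement` instead of the stand-in. The sketch only renames `class` to `className`; React also expects `style` as an object rather than a string, which this wrapper does not handle.

```javascript
// Hypothetical adapter: wraps a React-style createElement so it can be used
// as a hyperscript function that receives HTML-style attributes.
function makeHyperscript(createElement) {
  return function h(tagName, attributes, children) {
    const props = {};
    for (const [name, value] of Object.entries(attributes || {})) {
      if (name === "class") {
        props.className = value; // React spells the class attribute "className"
      } else {
        props[name] = value;
      }
    }
    // Spread the children so they are passed as positional arguments.
    return createElement(tagName, props, ...(children || []));
  };
}

// Stand-in for React.createElement so the sketch runs without React.
function fakeCreateElement(type, props, ...children) {
  return { type, props, children };
}

const h = makeHyperscript(fakeCreateElement);
const element = h("span", { class: "katex" }, [
  h("span", { class: "mord" }, ["x"]),
]);
```

With React available, `makeHyperscript(React.createElement)` would return real React elements instead of the plain objects produced by the stand-in.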
In my case, the output of the markdown engine is at one stage represented by a HAST node (HAST is an HTML AST).
The plugin that parses the math elements in the markdown passes the raw math input to KaTeX through the renderToString function, and then uses rehype-parse to parse the HTML string into a HAST node.
This parse is heavy, and the parser itself adds ~250Kb to the bundle.
It could be avoided by directly transforming the KaTeX tree into a HAST tree, but buildTree doesn't seem to be publicly exported, and thus has no stability guarantee.
I'm proposing here to provide a renderHyperscript function that renders using the provided hyperscript function. This would totally cover my case (because there's hastscript, which provides a hyperscript function to create a HAST node), and would be beneficial in frameworks with some kind of virtual DOM like React or Vue.js (they all provide a hyperscript-like API, either officially or through the community).
@sandhose I think you should be able to use the output of buildHTML to generate a HAST tree. I had a look at https://github.com/syntax-tree/hast and it may make sense to modify buildHTML to produce a HAST tree. I don't see any reason for us to have our own HTML tree structure when a more standard one exists.
It might be a good idea (and @wooorm would be pleased). I'll try to do something.
FYI I have a branch that implements hyperscript rendering, with a react example here (built version here)
Well, it just got a lot harder with #807 because it means I have to re-parse the innerHTML in spans to render them (and it is expensive).
@sandhose could we not add an innerHTML property which is a string? https://github.com/syntax-tree/hast#properties seems to indicate that both attributes and properties are supported.
> it just got a lot harder with #807
Would it help if we wrapped every <svg> with a span with a descriptive class, as in:
<span class="rightarrow"><svg>...</svg></span>
Or perhaps write the class into the SVG?
<svg class="rightarrow">...</svg>
@sandhose if you're not modifying the nodes within the SVG is there really any benefit from the SVG being described as virtual DOM nodes as opposed to a string?
@kevinbarabash Hi! 👋
> could we not add an innerHTML property which is a string? https://github.com/syntax-tree/hast#properties seems to indicate that both attributes and properties are supported.
HAST is for HTML, so think of it as only “attributes” being supported, not DOM properties like the innerHTML setter.
I suggest against using raw HTML inside a virtual DOM, for the same reasons that React uses the name dangerouslySetInnerHTML: it’s dangerous and slow.
I know it’s not always possible, but using an object structure (like HAST, or your own) instead of building strings makes things great for non-server-side rendering!
@wooorm good point about perf during client-side renders. @ronkok what are your thoughts on building an AST for the inline SVG bits and then using createElement and appendChild to render them?
I'd want to try the HAST format for the non-SVG parts of the tree first and see how that goes before putting in the effort to convert the SVG parts.
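For illustration, rendering such an SVG AST with DOM calls could look roughly like the sketch below. The AST shape (`tagName`, `attributes`, `children`) and the path data are placeholders, not KaTeX's actual output. Note that SVG elements need `createElementNS` with the SVG namespace rather than plain `createElement`. The `fakeDocument` object is only a stand-in so the sketch runs outside a browser.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Assumed AST shape: { tagName, attributes, children }.
// `doc` is the document to create elements with (pass `document` in a browser).
function renderSvgAst(node, doc) {
  const el = doc.createElementNS(SVG_NS, node.tagName);
  for (const [name, value] of Object.entries(node.attributes || {})) {
    el.setAttribute(name, value);
  }
  for (const child of node.children || []) {
    el.appendChild(renderSvgAst(child, doc));
  }
  return el;
}

// Minimal stand-in for `document`, recording what would be created.
const fakeDocument = {
  createElementNS(ns, tagName) {
    return {
      ns,
      tagName,
      attributes: {},
      childNodes: [],
      setAttribute(name, value) {
        this.attributes[name] = value;
      },
      appendChild(child) {
        this.childNodes.push(child);
        return child;
      },
    };
  },
};

// Placeholder AST; the attribute values are made up for the example.
const arrow = {
  tagName: "svg",
  attributes: { class: "rightarrow", viewBox: "0 0 400000 522" },
  children: [{ tagName: "path", attributes: { d: "M0 241v40h399891" } }],
};

const svgElement = renderSvgAst(arrow, fakeDocument);
```

In a browser, `renderSvgAst(arrow, document)` would return a real SVG element that can be appended to the surrounding span.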
@kevinbarabash I'm all in favor of what you suggest. I may not be the best person to implement it. Let me look into it and get back to you.
@sandhose some of our nodes output document fragments. Is there a way to model that with HAST? Would it just be an array of HAST nodes?
@kevinbarabash Yup! You can opt for an array of nodes.
Or if you’d like, returning a root node ({type: 'root', children: [...]}) is also fine, but root nodes shouldn’t be inserted somewhere else in a tree (only the top node may be a root).
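The two options could be sketched as follows. The helper names `toRoot` and `appendFragment` are made up for this example; the node shapes follow the `{type: 'root', children: [...]}` form quoted above. When a fragment is inserted into an existing tree, the sketch splices in the root's children instead of nesting the root itself, in line with the rule that only the top node may be a root.

```javascript
// Wrap an array of nodes in a root node.
function toRoot(nodes) {
  return { type: "root", children: nodes };
}

// Append a fragment (an array, a root node, or a single node) to a parent
// without ever nesting a root inside the tree.
function appendFragment(parent, fragment) {
  let nodes;
  if (Array.isArray(fragment)) {
    nodes = fragment;
  } else if (fragment.type === "root") {
    nodes = fragment.children;
  } else {
    nodes = [fragment];
  }
  parent.children.push(...nodes);
  return parent;
}

const fragment = toRoot([
  { type: "text", value: "a" },
  { type: "text", value: "b" },
]);

const parent = { type: "element", tagName: "span", properties: {}, children: [] };
appendFragment(parent, fragment);
```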
It would be awesome to have some kind of public API to build vnode trees. Right now I'm doing:
KaTeX to string => fast HTML parser => build vnode tree in MithrilJS with hyperscript calls (recursively walking the tree).
Is there a way I could skip the parsing? Should I look internally for buildTree and modify the KaTeX source, or is this feature coming soon?
@cjh9 If you don't need MathML, you could probably call buildHTML directly; if you want both, buildTree would be good. If you can successfully import them (#954 might get in your way, but we'd appreciate a fix to that), then they should just work, and return the existing internal node tree data structures.
If they're helpful, I don't see any reason not to expose buildTree, buildHTML, and buildMathML in katex.js's module.exports, presumably prefixed with __ to make it just as scary/unsupported as __parse (though these methods are currently probably more stable than __parse). Any objections?
@edemaine Sorry for the late response, yes, it would be awesome if they could be exposed in the distribution! In what format would __buildHTML return the tree: real HTML nodes or a more lightweight JSON format? Only the latter seems to integrate well with web workers.
@cjh9 This is now available in the master branch, thanks to #1017. The nodes are returned in a custom nested JavaScript data structure (objects containing children array fields). Hope that helps!
@edemaine Super great! 🎉 And it is also serializable to JSON :) Would it also be possible to expose __buildTreeHTML if I don't need MathML? Not super important though, I can work around it :)
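As a sketch of the JSON round trip mentioned here (for example, to hand a tree to a web worker), the snippet below copies only the fields a renderer might need. The field names (`text`, `classes`, `style`, `children`) and the `FakeSpan` class are assumptions made for the example; the real KaTeX nodes may carry more fields or name them differently.

```javascript
// Copy a node tree into plain objects, keeping only a whitelist of fields.
// Field names are assumptions for this sketch.
function toPlain(node) {
  const plain = {};
  if (node.text !== undefined) plain.text = node.text;
  if (node.classes) plain.classes = [...node.classes];
  if (node.style) plain.style = { ...node.style };
  if (node.children) plain.children = node.children.map(toPlain);
  return plain;
}

// Stand-in for a class-based node with methods, which JSON cannot carry.
class FakeSpan {
  constructor(classes, children) {
    this.classes = classes;
    this.children = children;
    this.style = {};
  }
  toMarkup() {
    return "<span></span>";
  }
}

const tree = new FakeSpan(["katex"], [{ text: "x", classes: ["mord"] }]);

// Serialize on one side, parse on the other (e.g. across postMessage).
const json = JSON.stringify(toPlain(tree));
const restored = JSON.parse(json);
```

The restored value is a plain object tree: methods such as `toMarkup` do not survive the round trip, so the receiving side has to walk the data itself.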
@cjh9 I'm guessing you already know this, but you should be able to leverage buildTreeHTML via the buildTree. Would that work? Exposing just the HTML would probably require extracting some of the default options (i.e. https://github.com/Khan/KaTeX/blob/master/src/buildTree.js#L19)
I think pure HTML export makes sense when you're rendering it in a custom way (e.g. SVG), as you're unlikely to also be able to include MathML in that setting. I could see either
1. extracting the Settings to Options conversion in buildTree so that we can write a new buildHTMLTree, or
2. adding an option to Settings to prevent MathML creation.
Thoughts?
Hmm, I think 1. sounds good. My concern with 2. is how that would affect items downstream (i.e. the buildTree). Perhaps I don't understand it well enough though.
@cjh9 The master branch now has (via #1022) __renderToHTMLTree, which outputs just the HTML part. Also, we renamed the method you were using to __renderToDomTree for more consistent naming. Hope this helps!
@edemaine Sorry for late reply, super great! You guys are awesome :D
After investigating the hast format some more I've concluded that it's not appropriate for our use case, in particular because it stores class and style as simple string attributes.
I would like to simplify our current in-memory HTML objects to be plain objects instead of classes, but I think that storing classes as an array and styles as an object is superior, especially for checking for the presence of particular styles or CSS classes or for modifying those.
After we refactor those objects (and extract non-HTML props into an intermediate representation) they should be stable enough (and simple enough) that writing a translator from our HTML objects to hast should be trivial.
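A translator along these lines might look like the sketch below. The input shape (`tagName`, `classes` as an array, `style` as an object with camelCase keys, `children`, and `text` for text nodes) is an assumption about the refactored objects, not the current KaTeX structure. The output follows the hast element form, with `className` as an array and `style` as a string.

```javascript
// Turn a style object with camelCase keys into a CSS declaration string.
function styleToString(style) {
  return Object.entries(style)
    .map(([name, value]) => {
      const cssName = name.replace(/[A-Z]/g, (m) => "-" + m.toLowerCase());
      return `${cssName}:${value}`;
    })
    .join(";");
}

// Assumed input: { tagName, classes, style, children } or { text }.
function toHast(node) {
  if (typeof node.text === "string") {
    return { type: "text", value: node.text };
  }
  const properties = {};
  if (node.classes && node.classes.length > 0) {
    properties.className = node.classes; // hast keeps className as an array
  }
  const style = styleToString(node.style || {});
  if (style) {
    properties.style = style;
  }
  return {
    type: "element",
    tagName: node.tagName || "span",
    properties,
    children: (node.children || []).map(toHast),
  };
}

const htmlObject = {
  tagName: "span",
  classes: ["mord", "mathit"],
  style: { marginRight: "0.2em" },
  children: [{ text: "x" }],
};

const hastNode = toHast(htmlObject);
```

Keeping classes as an array and style as an object on the KaTeX side makes checks such as `node.classes.includes("mord")` cheap, and the conversion to hast only has to stringify the style at the very end.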
This is not entirely true: fragments can be stored in a root too, and className is an array! Finally, style could be discussed. It used to be an object in fact, and could be mapped to that again, pending further discussion.
@wooorm thanks for pointing out className. I should've read the section on "Property values" more closely.
> There’s no special format for style.
What does that mean? Is it a string or an object?
Are there any examples of how fragments are dealt with? We'll never return a fragment so fragment support isn't a deal breaker.
> There’s no special format for style.
> What does that mean? Is it a string or an object?
It used to be under discussion, but it was removed in 2016. There are downsides to representing style as an object: in some cases you need to parse styles, which means pulling in quite a large library. In other cases you need to stringify it, which is less of a problem.
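The asymmetry can be illustrated with a small sketch. Both functions are written for this example only: stringifying a style object is a one-liner, while the naive parser shown here (split on `;`, then on the first `:`) gives wrong results as soon as a value itself contains a semicolon, which is why robust parsing needs a proper CSS parser.

```javascript
// Object -> string: straightforward.
function stringifyStyle(style) {
  return Object.entries(style)
    .map(([name, value]) => `${name}: ${value}`)
    .join("; ");
}

// String -> object: a naive approach that is NOT a correct CSS parser.
function naiveParseStyle(css) {
  const style = {};
  for (const declaration of css.split(";")) {
    const colon = declaration.indexOf(":");
    if (colon === -1) continue; // skip empty or malformed pieces
    const name = declaration.slice(0, colon).trim();
    style[name] = declaration.slice(colon + 1).trim();
  }
  return style;
}

// Works for simple declarations.
const simple = naiveParseStyle("height: 1em; margin-right: 0.2em");
const roundTrip = stringifyStyle(simple);

// The semicolon inside url(...) is treated as a separator,
// so the value is cut off after "image/png".
const broken = naiveParseStyle("background: url(data:image/png;base64,AAAA)");
```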
> Are there any examples of how fragments are dealt with? We'll never return a fragment so fragment support isn't a deal breaker.
Any document, whether it’s a complete one or a fragment, is stored in a root node. There’s no other handling for it. To be honest, now I’m not entirely sure what your use case is!
@kevinbarabash Is there still interest in doing this? Are there reservations?
If so, I may be able to work on it in the coming weeks. Could you estimate the time involved in changing the underlying objects to a different format?
Hi,
We're building a markdown engine using remark with the rehype-katex plugin. The thing is, rehype-katex uses KaTeX's renderToString function and parses the output using rehype-parse (which uses parse5 to parse the HTML string into HAST, rehype's AST).
This is really inefficient (I haven't done benchmarks, but it adds ~250Kb to the bundle), and could be improved by allowing rendering through a hyperscript-like function. That would make it possible to render KaTeX far more efficiently into frameworks like React or Vue.js.
I'll write a proof of concept today; I might not have the time to write the docs or tests though.
I know there are a lot of links up here because of our use case, but TL;DR: it would be nice to be able to render KaTeX directly through a hyperscript-like function.