AST for the content - Githubissues

Andrew-Morozko commented 4 weeks ago

Right now each content block produces markdown strings, which requires a lot of string manipulation. For example, the title block replaces newlines in its value, otherwise, the title would be broken:

# Tiltle with
newline is half-broken

We should use an AST structure closely resembling markdown and use it instead:

func Title(size int, text string) ContentNode
func Text(text string) ContentNode
func List(kind ListKind, items [] ContentNode) ContentNode
// And so on

And use it to avoid repeated validation:

List(Unordered, []ContentNode{Text("list item, but it can contain\n * and not be split into two")})

This also decouples us from markdown, making it possible for data sinks to render a document in a custom format (html, latex, or, of course, markdown ...) from our ContentNode structure. Also, this makes the content more easily inspectable and traversable for meta-content providers such as toc.

I considered using goldmark's AST instead of creating our own, but their AST types are assumed to be valid markdown (since they usually come out of their markdown parser), while ours need to be validated/adjusted before we can pass them as a valid markdown.

Since our AST would be based on markdown it would be straightforward to implement ContentNode -> goldmark/ast.Node converter, after which we can use goldmark-markdown to render goldmark AST back into markdown, built-in renderer to produce HTML or goldmark-latex to produce LaTeX code.

traut commented 3 weeks ago

+1 on using goldmark's AST, seems like a nice data model (better than plain Markdown strings) we can build upon

dobarx commented 3 weeks ago

@traut -1 on using goldmark's AST. Like @Andrew-Morozko mentioned their AST is strictly based on markdown. If we used their AST directly it's even more complicated than using markdown strings all together and would be harder to extend with nodes that are specific for other outputs, like html or pdf.

traut commented 3 weeks ago

Hmm, okay. Do you have any suggestions on what data model to use here for the content trees? I would love to use an existing one instead of building one, and if it comes from the lib we'll be using -- it's a bonus.

Markdown strings on their own are quite versatile, being a serialized version of a data model -- whatever the plugin returns, we can parse and adjust.

dobarx commented 3 weeks ago

Yes, markdown strings are versatile but it creates a problem that we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

Using existing library AST for this might be also tricky since we would need to encode/decode this tree using protobufs and it would be pretty hard to extend it since the focus for that tree is just one output - Markdown. But that doesn't work well since we also have a concept of section which doesn't render into a specific markdown element but can effect markdown content inside.

If we want to extend it then it's best just to make a copy of their AST tree nodes that we could convert to goldmark's AST when rendering markdown. We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

traut commented 3 weeks ago

we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

I like that this is a plugin's problem — Fabric expects a valid Markdown string, so the plugin can do whatever it wants to produce it. It does limit us, though, and is a ticking trouble :) It's too flexible, so the validation of the content is difficult (what if the plugin returns a huge, complex Markdown that will mess up the whole document?) but I'm not too worried about that just yet.

We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

I agree, I think you are right. Our model can be more straightforward and should not be tied to Markdown explicitly -- we're going to support other serializations anyway. On that note, we can use our template data model as an inspiration for our content data model, adopting a concept of block with a content type. Even if it's just to wrap a markdown strings into nodes with metadata - it will still be very useful, I think

blackstork-io / fabric

AST for the content #159