dillonkearns / elm-markdown

Extensible markdown parser with custom rendering, in pure Elm.
https://package.elm-lang.org/packages/dillonkearns/elm-markdown/latest/
BSD 3-Clause "New" or "Revised" License

Render to markdown #48

Open hoichi opened 4 years ago

hoichi commented 4 years ago

Suppose I want to manipulate markdown AST and export the result as markdown. Am I correct to understand there’s no way to convert AST to markdown other than writing a custom renderer?

Do you, by any chance, plan to include rendering to markdown in your package? As stable as the GFM spec is, it would be nice to have deserialization and serialization knowledge in the same package.

matheus23 commented 4 years ago

Hi! I'm interested in such things. Here are some questions I have:

I'd be happy to explore this design space with some implementation, too.

hoichi commented 4 years ago

My use case: I want WYSIWYG Markdown editing with elm-rte-toolkit, plus search (and probably replace) across notes/files and maybe some other manipulations. The audience is, first and foremost, non-technical.

I haven’t thought about the diff, but just off the top of my head, some thoughts:

So at this point, my guess is, no, as long as the output is not complete garbage, minimal diffs are not that important.


But speaking of two-way conversions, it just occurred to me that the ability to convert both ways inside the same package makes it possible to write some tests that convert to AST and back and check for losses. I wonder if you can generate the full range of markdown AST with fuzzers. 🤔
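A rough sketch of what such a round-trip check could look like. Since many different markdown strings parse to the same AST, the direction that can be lossless is AST to markdown and back to AST. Both `parse` and `render` are passed in as parameters here, so nothing below assumes a particular elm-markdown API; `roundTrips` is a hypothetical name.

```elm
module RoundTripSketch exposing (roundTrips)

-- Hypothetical helper: not part of elm-markdown.
-- True when rendering an AST and parsing the result gives back the same AST.


roundTrips : (String -> Result error ast) -> (ast -> String) -> ast -> Bool
roundTrips parse render ast =
    parse (render ast) == Ok ast
```

With elm-test, a property like this could be handed to `Test.fuzz` together with a fuzzer that generates blocks; writing that fuzzer is the part that would still need to be worked out.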

dillonkearns commented 4 years ago

Hello @hoichi, thank you for the conversation, and thanks for sharing some background!

Yes, having some built-in functions to convert the Markdown blocks into markdown (without a custom renderer) is very much in line with the project goals. It's been on my mind, and I think it's a no-brainer to include it.

It's more a question of how we want to implement it, and then taking the time to do it. @matheus23, if you're interested in working on this, that would be incredible! Thanks for that, and thanks for chiming in on the conversation. I'm here if you want to pair on this or discuss it at any point!

A few thoughts. I think the minimal version of this is fairly simple. It's very helpful to get the context of your use case, though, because having more complicated needs is what would make this very complicated.

Right now, we could pretty easily turn something like this:

[ Block.Heading Block.H1 ([ Block.Text "Heading" ])
, Block.Paragraph [ Block.Text "Paragraph 1" ]
, Block.ThematicBreak
, Block.Paragraph [ Block.Text "Paragraph 2" ]
]

Into something like this:

# Heading
Paragraph 1

---

Paragraph 2
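A minimal sketch of that conversion, for illustration only. The types below are simplified stand-ins defined locally, not the actual Markdown.Block definitions (which have many more variants), and `blocksToMarkdown` is a hypothetical name. It also does no escaping of special characters.

```elm
module MarkdownSketch exposing (Block(..), HeadingLevel(..), Inline(..), blocksToMarkdown)

-- Simplified stand-ins for the AST types (assumption: the real types also
-- cover lists, tables, links, code blocks, HTML, and more heading levels).


type HeadingLevel
    = H1
    | H2
    | H3


type Inline
    = Text String
    | Emphasis (List Inline)
    | Strong (List Inline)


type Block
    = Heading HeadingLevel (List Inline)
    | Paragraph (List Inline)
    | ThematicBreak


inlineToMarkdown : Inline -> String
inlineToMarkdown inline =
    case inline of
        Text text ->
            text

        Emphasis children ->
            "*" ++ inlinesToMarkdown children ++ "*"

        Strong children ->
            "**" ++ inlinesToMarkdown children ++ "**"


inlinesToMarkdown : List Inline -> String
inlinesToMarkdown inlines =
    String.concat (List.map inlineToMarkdown inlines)


headingPrefix : HeadingLevel -> String
headingPrefix level =
    case level of
        H1 ->
            "#"

        H2 ->
            "##"

        H3 ->
            "###"


blockToMarkdown : Block -> String
blockToMarkdown block =
    case block of
        Heading level inlines ->
            headingPrefix level ++ " " ++ inlinesToMarkdown inlines

        Paragraph inlines ->
            inlinesToMarkdown inlines

        ThematicBreak ->
            "---"


-- Blocks are separated by a blank line.
blocksToMarkdown : List Block -> String
blocksToMarkdown blocks =
    String.join "\n\n" (List.map blockToMarkdown blocks)
```

This version puts a blank line between all blocks, including after the heading. The blank line before `---` matters: without it, the preceding paragraph line would be read as a setext heading rather than a paragraph followed by a thematic break.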

The complication would come from needing to preserve the original, concrete syntax. But we're storing an Abstract Syntax Tree (AST), not a Concrete Syntax Tree. So we lose information about things like whitespace, alternative syntax, etc.

For example, the input text could actually have been:

# Heading #
Paragraph  1

 **  * ** * ** * **

Paragraph          2

That's a bit of an arbitrary example. But there are quite a few examples in markdown where very different raw input text results in the exact same AST. So there are a lot of cases like this.

That said, sometimes it's a feature to render it back to equivalent markdown (depending on the use case). It can act as a sort of formatter, as it gives you a "canonical" representation of the AST.

On the other hand, if you prefer to represent lists with -'s rather than *'s, but our canonical representation differs from that preference, then in some use cases a user could be fighting against the tool.

Hybrid Approaches

It's not necessarily as simple as CSTs vs. ASTs. We can store some reasonable data in the AST; for example, we could keep track of which character is used to build up a list. The question then becomes: do we want to allow elm-markdown users to access that information about the AST? There are benefits to simply representing the semantics of the markdown, without the syntax. It means that you can't have strange behavior based on syntax that GFM, and elm-markdown, consider to be equivalent (unless you go out of your way and build your own pre-processing code).
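To make the hybrid idea concrete, here is a small sketch of what remembering the list character could look like. `ListMarker` is hypothetical and not part of elm-markdown's AST; the three variants are the bullet characters markdown accepts for unordered lists.

```elm
module ListMarkerSketch exposing (ListMarker(..), renderUnorderedList)

-- Hypothetical: a piece of concrete syntax that a parser could record
-- alongside an otherwise abstract list node.


type ListMarker
    = Dash
    | Asterisk
    | Plus


markerToString : ListMarker -> String
markerToString marker =
    case marker of
        Dash ->
            "-"

        Asterisk ->
            "*"

        Plus ->
            "+"


-- Render already-serialized single-line items with the remembered marker.
renderUnorderedList : ListMarker -> List String -> String
renderUnorderedList marker items =
    items
        |> List.map (\item -> markerToString marker ++ " " ++ item)
        |> String.join "\n"
```

The trade-off described above shows up directly: once the marker is part of the data, user code can pattern match on it and behave differently for lists that markdown treats as equivalent.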

Where does that leave us?

Given the use case (storing a format that @hoichi has control over), I think it's best to stick with a strictly abstract syntax tree for now, and not add any additional information to it. Then let's see what we can do to build a nice function (or set of functions) to convert Markdown.Block.Blocks into a markdown String! 😁

hoichi commented 4 years ago

Hello @dillonkearns. Thanks to you for all the insights and, of course, thanks for your work.

Again, for my use case, preserving the markdown code style is not really important, especially on a per-file, let alone per-element, basis. First of all, my project is WYSIWYG-first, with a markdown backend for interoperability and to prevent vendor lock-in. I can relate to being particular about how you format your unordered lists or italics when you use a code editor, but my users shouldn’t normally see the markdown.

Per-file code style

Secondly, I’m not sure style consistency should be preserved on a per-element basis. The lowest level that makes sense to me is a file (why would you want _italic_ and *italic* in the same file?).

Per-project code style

Moreover, my app should have multiple files per project (think Evernote, for example), so maybe the markdown style should be consistent across all of those files. And since the content should normally be created by my app, divining the markdown code style from the source is not that useful. What might make sense is setting the markdown code style in the app/project settings, but those would be some very advanced settings. I don’t think the majority of the users should see those settings at all.

So, my priorities

  1. Serializing AST to markdown: crucial.
  2. Configuring the output code style: nice, but can totally wait.
  3. Detecting/preserving code style for single files or elements: non-goal.

So yeah, I totally agree with the strictly abstract approach 😁

P.S. Speaking of hybrid approaches

Apps like Typora have a hybrid approach of their own: they're mostly WYSIWYG, but you can input, or even edit, the current element as markdown (you focus on the italic text, and underscores or asterisks appear). That might mean having to rely on a CST, but nice as it would be for some power users, it may add a lot of complexity (both for my app and probably for elm-markdown), so I’m not really sure about this feature so far 😬

dillonkearns commented 4 years ago

Great background, thanks @hoichi! Yeah, having a limited configuration in a future iteration could be a way to allow for custom formatting without having a CST. Sticking with a more abstract representation of the block structure definitely feels like the right approach to me.

If you consider the experience with elm-format, too, it's quite nice in some cases to have a canonical representation that you don't have to think about.

I think for the first pass, it would be nice to look at a few markdown linters, prettier rules, etc., and see what their opinions are on some of these things, and just start with those in an opinionated way.

For example, I think that a lot of linters/prettifying tools prefer - over * syntax for lists.

matheus23 commented 4 years ago

There are many ways of doing this. Anyhow, I implemented one of them for fun: https://5ebaeb423003da0006240b8a--elm-markdown-transforms.netlify.app/format-markdown.html

I've used my elm-markdown-transforms library, but it's pretty easy to translate this to the basic elm-markdown Markdown.Renderer.Renderer. You can see the code for it here: https://github.com/matheus23/elm-markdown-transforms/blob/format-markdown/src/Markdown/Scaffolded.elm#L916-L1055

dillonkearns commented 4 years ago

That's fantastic, love it! ❤️

I definitely think this makes sense to include in the core package.

Do you think it makes sense to expose this as a Renderer, like Markdown.Renderer.defaultHtmlRenderer? Or as something like this:

Markdown.Block.blocksToMarkdownString : List Block -> String

There are benefits to both, I suppose. One thought is that we probably don't ever want it to fail, we want you to always be able to take List Block -> String. And for the HTML, it should just be a passthrough. So maybe the latter is better (a function in the Block module rather than a Renderer). Thoughts?

hoichi commented 4 years ago

I’m pretty sure I don’t understand all the design decisions behind Renderer*, but it seems a little too low-level for the task. So, yeah, blocksToMarkdown probably shouldn’t be more complicated than Config -> List Block -> String, if that.
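A sketch of what a Config -> List Block -> String shape might look like. Everything here is hypothetical: the `Config` fields, the `defaultConfig` values, and the cut-down local `Block`/`Inline` types are assumptions for illustration, not elm-markdown's API.

```elm
module ConfigSketch exposing (Block(..), Config, Inline(..), blocksToMarkdown, defaultConfig)

-- Hypothetical style settings; a real version would need more options
-- and some validation of the delimiter strings.


type alias Config =
    { emphasisDelimiter : String
    , bulletMarker : String
    }


defaultConfig : Config
defaultConfig =
    { emphasisDelimiter = "*"
    , bulletMarker = "-"
    }


type Inline
    = Text String
    | Emphasis (List Inline)


type Block
    = Paragraph (List Inline)
    | UnorderedList (List (List Inline))


inlineToMarkdown : Config -> Inline -> String
inlineToMarkdown config inline =
    case inline of
        Text text ->
            text

        Emphasis children ->
            config.emphasisDelimiter
                ++ inlinesToMarkdown config children
                ++ config.emphasisDelimiter


inlinesToMarkdown : Config -> List Inline -> String
inlinesToMarkdown config inlines =
    String.concat (List.map (inlineToMarkdown config) inlines)


blockToMarkdown : Config -> Block -> String
blockToMarkdown config block =
    case block of
        Paragraph inlines ->
            inlinesToMarkdown config inlines

        UnorderedList items ->
            items
                |> List.map (\item -> config.bulletMarker ++ " " ++ inlinesToMarkdown config item)
                |> String.join "\n"


blocksToMarkdown : Config -> List Block -> String
blocksToMarkdown config blocks =
    String.join "\n\n" (List.map (blockToMarkdown config) blocks)
```

Passing `{ defaultConfig | emphasisDelimiter = "_" }` would switch the italics style for a whole file or project without the AST having to remember anything about the original syntax.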

@matheus23 Seems like you have it nailed already. Looks nice, or maybe I can’t get over how nice Elm is 😀

Edit: * In the broadest terms though, I guess the raison d’être of Renderers is to give consumers the ability to render to anything, without presuming any knowledge of what that might be. But as for markdown, elm-markdown should know a thing or two about that.

matheus23 commented 4 years ago

Rendering markdown to a string made it into elm-markdown-transforms. I recommend taking a look at the example source code to find out how to use elm-markdown-transforms with elm-markdown.


> Do you think it makes sense to expose this as a Renderer, like Markdown.Renderer.defaultHtmlRenderer?

Yeah, maybe? Right now, the prettyprinter is fairly arbitrary though. It might be worth adapting it to a formatting standard first, similar to defaultHtmlRenderer.

> One thought is that we probably don't ever want it to fail, we want you to always be able to take List Block -> String.

Yeah, absolutely. It's just really convenient for the person writing the prettyprinter to use the Renderer API, but it's unfortunate that the pretty printer returns a Result.

hoichi commented 4 years ago

@matheus23 Terrific, thanks a ton!

As for the Renderer/Result things, maybe it’s worth considering using the renderer interface internally if that simplifies code reuse, but expose a List Block -> String interface to consumers. As I said, I think converting to markdown is a special case, because markdown is a backend format, as opposed to HTML and whatnot, and the lib should probably encapsulate the knowledge of it.

Or do you expect consumers to want to customize the markdown rendering process somehow, so the renderer should be pluggable?

matheus23 commented 4 years ago

> Or do you expect consumers to want to customize the markdown rendering process somehow, so the renderer should be pluggable?

Yeah, I do that a lot. Things like custom rendering of code blocks, images or HTML within markdown.

> expose a List Block -> String interface to consumers.

I'm holding back on doing that, because I'm experimenting with a data structure that could replace the List Block and the Renderer at the same time. That might take a while, though.

alltonp commented 3 years ago

Hi,

I'm really interested in rendering into markdown too!

Lots of interesting discussion above, just wondering, did this ever get added to the core code?

My use case: I'm running 'commands' against a piece of markdown and want to have the commands change the markdown. For example a command might be to add a line to a table, or to move the value one cell left/right.
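As an illustration of that kind of command, here is a small self-contained sketch. The `Table` record is a simplified stand-in with plain string cells (an assumption, not elm-markdown's table representation), and `addRow` and `tableToMarkdown` are hypothetical names. Pipe characters inside cells are not escaped.

```elm
module TableCommandSketch exposing (Table, addRow, tableToMarkdown)

-- Simplified stand-in for a parsed table: plain strings instead of inlines,
-- and no column alignment.


type alias Table =
    { header : List String
    , rows : List (List String)
    }


-- The "command": append one row at the end of the table.
addRow : List String -> Table -> Table
addRow row table =
    { table | rows = table.rows ++ [ row ] }


rowToMarkdown : List String -> String
rowToMarkdown cells =
    "| " ++ String.join " | " cells ++ " |"


-- Serialize as a GFM table: header row, delimiter row, then body rows.
tableToMarkdown : Table -> String
tableToMarkdown table =
    let
        delimiterRow =
            List.map (\_ -> "---") table.header

        allRows =
            table.header :: delimiterRow :: table.rows
    in
    allRows
        |> List.map rowToMarkdown
        |> String.join "\n"
```

The command works on the data, and serialization happens once at the end, so the original spacing is not preserved but no rows or cells are lost.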

FWIW, I'm slightly less worried about preserving the original spacing etc, just as long as all blocks remain and nothing is lost.

Many thanks, Paul