gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org

Rendering markdown (as a way to mutate markdown documents) #777

Closed synthead closed 1 year ago

synthead commented 1 year ago

I am looking to read markdown, mutate the document by working with objects, and render the updated markdown. This will let me update a markdown document without resorting to methods with less introspection, like using regular expressions.

Is this possible with Kramdown?

gettalong commented 1 year ago

Yes, this is possible. kramdown builds an internal AST structure from the source document that is then used by the converters to produce the output. You can manipulate the AST any way you like as long as you don't make it invalid, e.g. by using invalidly nested elements. See https://kramdown.gettalong.org/rdoc/Kramdown/Element.html for details.

synthead commented 1 year ago

@gettalong, thank you for your reply! Would you be so kind as to provide a small code example of how this would be done? From your response, my instinct would be to create a converter plugin that basically renders like-for-like. Is there a simpler way?

gettalong commented 1 year ago

Have a look at the HTML, TOC, and HashAST converters.

They all work with the created AST. The HTML converter just converts each AST element into its HTML representation, the TOC converter iterates the AST and only works on the :header elements to create another AST with just the table of contents elements, and the HashAST converter creates a nested hash from the AST.

So it depends on what you want to achieve. From what you wrote, my guess is that you need to iterate the AST and modify it. That could be done in a simple recursive method or in a dedicated class.

synthead commented 1 year ago

@gettalong, thank you for the examples! Just to start, I'm trying to read Markdown and write Markdown (i.e. not HTML via #to_html) with no changes. This converter could be called Markdown and could be consumed via #to_markdown, for example.

Following the TOC example you mentioned here, it looks like the AST is returned here, which doesn't seem to provide a useful string output on its own. I assume this is so that it can be chained to another converter, i.e. .to_toc.to_html?

If I wanted to create a "no-op" converter, it looks like I can simply do this:

# frozen_string_literal: true

require "kramdown"

module Kramdown
  module Converter
    # No-op converter: returns the root element it is given, unchanged.
    class Markdown < Base
      def convert(el)
        el
      end
    end
  end
end

However, it appears that when the Markdown gets loaded into the AST, the original Markdown formatting is lost. To produce Markdown output, I would have to turn parsed links into [markdown](links) from scratch, for example. This could be a pretty hefty converter if I would have to restore all the original Markdown formatting.

This is kind of what I suspected, but I thought I would ask just in case :slightly_smiling_face: Does this sound right?

gettalong commented 1 year ago

Yes, the char-by-char original formatting is not preserved, just the information on the elements and their attributes.

The TOC converter returns a special AST that is not directly usable by a standard converter. It is meant to be post-processed by another tool to the required output AST or format.

Your no-op converter will really do nothing, just returning the input root element.

By the way, if you want to read Markdown and write Markdown and you are fine with kramdown's dialect of Markdown, you can just use the built-in kramdown parser and converter. Try kramdown -o kramdown my_kramdown.md, which parses the input document into the AST and then writes a kramdown document from it.

synthead commented 1 year ago

Great! Thank you so much for all the help, @gettalong!