RobertDober / earmark_parser

The Markdown to AST part of Earmark.
Apache License 2.0

Replacing node bodies help #104

Closed: crertel closed this issue 2 years ago

crertel commented 2 years ago

Hello! I've got a bit of code like this:

```elixir
# Function to read in a post and convert its markdown;
# it should also apply a transform to convert all non-inline code blocks to "xxx"
def parse_post(post_path) do
  raw_post_text = pull_in_markdown(post_path)

  options = [
    registered_processors: [
      {"code", &render_code_node/1}
    ]
  ]

  {:ok, post_html, _} = Earmark.as_html(raw_post_text, options)

  post_html
end

# our toy code block redaction function
def render_code_node({"code", attrs, _body, meta} = node) do
  classes = Earmark.AstTools.find_att_in_node(node, "class") || ""

  cond do
    classes == "" -> node
    classes =~ "inline" -> node
    true -> {"code", attrs, "xxx", meta}
  end
end
```

Now, when I run this, it correctly runs the transformer (at least as indicated by some IO.inspect debugging), but there seems to be no way to modify the body of the node.

I suspect this is because these transformers only support changing attributes (and maybe tags), but I don't know that for sure.

So, how could I accomplish this? I already do something similar using Floki on the generated HTML, but I was really hoping the AST/transformer APIs would let me drop that kludge.

Thanks in advance!

RobertDober commented 2 years ago

You are in the wrong repo; this concerns https://github.com/pragdave/earmark. But no big deal, I manage both :stuck_out_tongue:

Hopefully I can check this during the weekend.

RobertDober commented 2 years ago

And, with the little time I have, I might be wrong, but isn't this related to the following thread? https://elixirforum.com/t/how-to-extend-earmark/47406

RobertDober commented 2 years ago

And yet another quick remark: shouldn't that be

```elixir
true -> {"code", attrs, ["xxx"], meta}
```

The body is a list, not a string, IIRC ;)
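A self-contained sketch of the corrected redaction function, assuming Earmark's AST node shape `{tag, attributes, content, meta}` with `content` as a list. To keep it runnable without Earmark installed, the hypothetical helper `class_of/1` stands in for `Earmark.AstTools.find_att_in_node/2`, and the module name `RedactSketch` is illustrative only.

```elixir
defmodule RedactSketch do
  # Hypothetical stand-in for Earmark.AstTools.find_att_in_node(node, "class") || ""
  defp class_of({_tag, attrs, _content, _meta}) do
    case List.keyfind(attrs, "class", 0) do
      {"class", value} -> value
      nil -> ""
    end
  end

  # Redact fenced code blocks; leave inline code and class-less code alone.
  def render_code_node({"code", attrs, _body, meta} = node) do
    classes = class_of(node)

    cond do
      classes == "" -> node
      classes =~ "inline" -> node
      # The replacement body is a list of children, not a bare string
      true -> {"code", attrs, ["xxx"], meta}
    end
  end

  # Any other node passes through unchanged.
  def render_code_node(node), do: node
end
```

Note that this only fixes the shape of the returned node; whether the new body actually survives depends on how the surrounding traversal treats it, which is what the rest of this thread is about.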

crertel commented 2 years ago

Thank you for your assistance! I looked at that forum entry; it wasn't quite what I needed, but it was helpful.

So, I tried with:

```elixir
{:ok, ast, _} = EarmarkParser.as_ast(raw_post_text)
transformed_ast = Earmark.Transform.map_ast(ast, &render_code_node/1)
post_html = Earmark.Transform.transform(transformed_ast)
```

And then defined `render_code_node/1` as:

```elixir
def render_code_node({"code", attrs, _body, meta} = node) do
  classes = Earmark.AstTools.find_att_in_node(node, "class") || ""

  cond do
    classes == "" ->
      {"p", [], ["no code"], meta}

    classes =~ "inline" ->
      {"p", [], ["no code"], meta}

    true ->
      {"p", [{"we", "tried"}], ["no code"], meta}
  end
  |> IO.inspect(label: "CODE")
end

def render_code_node(node), do: node
```

This works just fine, except that the contents (bodies) of the code blocks aren't being replaced. I see things like:

```elixir
def clean_html(html) do
  html
  |> String.replace(~r/<p>/i, "")
  |> String.replace(~r/<\/p>/i, "")
  |> String.replace(~r/<dt>/i, "")
  |> String.replace(~r/<hr>/i, "")
  |> String.replace(~r/icon=".*?"/i, "")
  |> String.replace(~r/icon_uri=".*?"/i, "")
  |> String.replace(~r/add_date=".*?"/i, "")
  |> String.replace(~r/last_modified=".*?"/i, "")
end
```

Changing the element from `<code>` to `<p>` works, and the attribute `we="tried"` is set correctly, but somehow the body of the code block is not changed (it should be `no code` instead of the actual code). By extension, it looks like I wouldn't be able to replace the body with custom HTML from a syntax highlighter either.

I *think* the tree traversal may still be walking the original child nodes of the body instead of accepting whatever children the transformer returns for the parent node?

Let me know if that helps at all.

RobertDober commented 2 years ago

I see, this is by design, as indicated here:

https://github.com/pragdave/earmark#map_ast

And, as explained in the Elixir Forum, structural changes are not supported yet, as documented here:

https://github.com/pragdave/earmark#structure-modifying-transformers

I completely see why this would be interesting, and I opened a ticket to implement your simple use case: https://github.com/pragdave/earmark/issues/446
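The by-design behaviour discussed here, where the transformer may change the tag and attributes but the traversal still walks the node's original children, can be modeled in a few lines. This is a simplified, hypothetical model (`KeepChildrenWalk` is not Earmark's source code); it simply reproduces the symptom from the report: the tag and attributes change, but the body keeps the original code.

```elixir
defmodule KeepChildrenWalk do
  # Hypothetical model of a pre-order AST map; NOT Earmark's implementation.
  def map_ast(nodes, fun) when is_list(nodes), do: Enum.map(nodes, &map_node(&1, fun))

  # Text leaves are passed to the transformer as well.
  defp map_node(text, fun) when is_binary(text), do: fun.(text)

  defp map_node({_tag, _attrs, content, _meta} = node, fun) do
    # Tag, attributes and meta are taken from the transformer's result...
    {new_tag, new_attrs, _ignored_content, new_meta} = fun.(node)
    # ...but the children come from the ORIGINAL node, so a new body is dropped.
    {new_tag, new_attrs, map_ast(content, fun), new_meta}
  end
end
```

Running the transformer from the report through this model turns `code` into `p` with `we="tried"`, while the original `def x, do: 1` text survives as the body.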

RobertDober commented 2 years ago

@crertel Release 1.4.25 will take some more testing, polishing, and updating of the docs. However, feel free to install Earmark from this tag: https://github.com/pragdave/earmark/tree/v1.4.25-pre and use this test as a guideline: https://github.com/pragdave/earmark/blob/v1.4.25-pre/test/acceptance/earmark/postprocessor_test.exs
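A minimal sketch of a `mix.exs` dependency entry pointing at that pre-release tag, using Mix's standard `github:` and `tag:` options for Git dependencies; the rest of the project file is assumed.

```elixir
# mix.exs (excerpt): fetch Earmark from the v1.4.25-pre Git tag instead of Hex
defp deps do
  [
    {:earmark, github: "pragdave/earmark", tag: "v1.4.25-pre"}
  ]
end
```

After editing, `mix deps.get` fetches the tagged revision.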

NOTE

Using `{:replace, ...}` in the transformer avoids recursive transformation of the replacement AST. Without it, endless recursion becomes a headache in transformer implementations, and that is why the content was never replaced. This needs a prominent explanation in the docs, but I am not the best documentator in the world (and, as so often, we are not very good at what we like to do: I really love to document :sob:)
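To illustrate why `{:replace, ...}` helps, here is a simplified, hypothetical model (`ReplaceWalk` is not Earmark's actual implementation) of a walk that uses a `{:replace, node}` return value verbatim and does not descend into it. If the walk did descend into a returned node, a transformer that wraps a `code` node in a `div` would meet its own `code` node again on the way down and recurse forever.

```elixir
defmodule ReplaceWalk do
  # Hypothetical model of the {:replace, node} escape hatch; NOT Earmark's source.
  def map_ast(nodes, fun) when is_list(nodes), do: Enum.map(nodes, &map_node(&1, fun))

  defp map_node(text, fun) when is_binary(text) do
    case fun.(text) do
      {:replace, replacement} -> replacement
      other -> other
    end
  end

  defp map_node({_tag, _attrs, content, _meta} = node, fun) do
    case fun.(node) do
      # Replacement is used as-is: no recursion into it, so no endless loop
      {:replace, replacement} ->
        replacement

      # Otherwise keep the new tag/attrs/meta and walk the original children
      {new_tag, new_attrs, _content, new_meta} ->
        {new_tag, new_attrs, map_ast(content, fun), new_meta}
    end
  end
end
```

With the real library, the thread indicates that the function passed to `Earmark.Transform.map_ast` returns `{:replace, node}`; the linked `postprocessor_test.exs` for v1.4.25-pre is the authoritative reference for the exact semantics.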

crertel commented 2 years ago

This trick seems to work; your `{:replace, ...}` solution was honestly exactly what I would have suggested if we'd gone another round on this. Great work!