Pretty HTML output - Githubissues

sgraaf commented 1 month ago

Hi there!

I was wondering if there is any way to make Python-Markdown output a "prettified" HTML string (i.e. with proper indentation)?

I have searched the docs (and Google) extensively, but I was unable to find an answer.

For reference, the following markdown string...:

# This is a markdown rendering test

With some _emphasis_ and **boldness**.

We have tables:

| Column 1     | Column 2     |
| ------------ | ------------ |
| Row 1, Col 1 | Row 1, Col 2 |
| Row 2, Col 1 | Row 2, Col 2 |

And lists:

1. wdwd
2. e2e2
3. wdwd

... is converted into the following HTML string...:

<h1>This is a markdown rendering test</h1>
<p>With some <em>emphasis</em> and <strong>boldness</strong>.</p>
<p>We have tables:</p>
<table>
<thead>
<tr>
<th>Column 1</th>
<th>Column 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Row 1, Col 1</td>
<td>Row 1, Col 2</td>
</tr>
<tr>
<td>Row 2, Col 1</td>
<td>Row 2, Col 2</td>
</tr>
</tbody>
</table>
<p>And lists:</p>
<ol>
<li>wdwd</li>
<li>e2e2</li>
<li>wdwd</li>
</ol>

... with:

markdown.markdown(
    markdown_string, extensions=["extra"], output_format="html"
)

maleemjaved commented 1 month ago

Hi,

You can write an extension that does this by registering a custom Postprocessor.

from xml.etree import ElementTree as ET

from markdown import Markdown
from markdown.extensions import Extension
from markdown.postprocessors import Postprocessor

class PrettifyHTMLPostprocessor(Postprocessor):
    def run(self, text: str) -> str:
        # Wrap the fragment in a single root element; ET.fromstring raises ParseError otherwise.
        modified_text = f"<div>{text}</div>"

        tree = ET.fromstring(text=modified_text)
        ET.indent(tree)
        return ET.tostring(tree, encoding="unicode")

# Register it when converting, e.g. `extensions=["tables", PrettifyHTML()]`.
class PrettifyHTML(Extension):
    def extendMarkdown(self, md: Markdown) -> None:
        md.registerExtension(self)
        md.postprocessors.register(PrettifyHTMLPostprocessor(), "html_prettify_postprocessor", 15)

Output:

<div>
  <p>We have tables:</p>
  <table>
    <thead>
      <tr>
        <th>Column 1</th>
        <th>Column 2</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Row 1, Col 1</td>
        <td>Row 1, Col 2</td>
      </tr>
      <tr>
        <td>Row 2, Col 1</td>
        <td>Row 2, Col 2</td>
      </tr>
    </tbody>
  </table>
  <p>And lists:</p>
  <ol>
    <li>wdwd</li>
    <li>e2e2</li>
    <li>wdwd</li>
  </ol>
</div>
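
One caveat with this approach: `xml.etree.ElementTree` is an XML parser, so it raises `ParseError` on HTML that is not well-formed XML. That can happen with void tags such as `<br>`, which `output_format="html"` may emit without a closing slash. The sketch below uses only the standard library and shows the wrap, parse, and indent step on its own, with a fallback that returns the input unchanged when parsing fails. The function name `prettify_or_passthrough` is made up for illustration, and `ET.indent` requires Python 3.9 or later.

```python
from xml.etree import ElementTree as ET


def prettify_or_passthrough(html: str) -> str:
    """Indent an HTML fragment, or return it unchanged if it is not well-formed XML."""
    try:
        # Wrap in a single root so fragments with several top-level tags parse.
        root = ET.fromstring(f"<div>{html}</div>")
    except ET.ParseError:
        # Hypothetical fallback policy: keep the original text rather than fail.
        return html
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Well-formed fragment: gets indented inside the wrapper <div>.
print(prettify_or_passthrough("<ol><li>wdwd</li><li>e2e2</li></ol>"))

# Unclosed <br>: not valid XML, so the text comes back unmodified.
print(prettify_or_passthrough("<p>line one<br>line two</p>"))
```

Inside a Postprocessor, the same `try`/`except` could wrap the body of `run` so that a single unparseable document does not abort the conversion.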

sgraaf commented 1 month ago

Thanks, that worked like a charm!

I have adapted the Postprocessor slightly so that the enclosing <div> wrapper is removed:

from xml.etree import ElementTree as ET

from markdown import Markdown
from markdown.extensions import Extension
from markdown.postprocessors import Postprocessor

class PrettifyHTMLPostprocessor(Postprocessor):
    def run(self, text: str) -> str:
        # Wrap the fragment in a single root element; ET.fromstring raises ParseError otherwise.
        modified_text = f"<div>{text}</div>"

        tree = ET.fromstring(text=modified_text)

        ET.indent(tree)

        indented_text = ET.tostring(tree, encoding="unicode")

        return "\n".join(indented_text.splitlines()[1:-1])  # drop the wrapper's opening and closing lines

# Register it when converting, e.g. `extensions=["tables", PrettifyHTML()]`.
class PrettifyHTML(Extension):
    def extendMarkdown(self, md: Markdown) -> None:
        md.registerExtension(self)
        md.postprocessors.register(
            PrettifyHTMLPostprocessor(), "html_prettify_postprocessor", 15
        )
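
Note that slicing off the first and last lines leaves every remaining line indented by the two spaces that belonged to the wrapper level. A possible alternative, sketched below with the standard library only, is to indent and serialize each top-level element separately so that the wrapper never appears in the output. The function name `prettify_without_wrapper` is hypothetical, and the sketch assumes the top level contains only elements: any loose text directly inside the wrapper would be dropped.

```python
from xml.etree import ElementTree as ET


def prettify_without_wrapper(html: str) -> str:
    """Indent each top-level element on its own, so no wrapper indentation remains."""
    # The temporary <div> only exists to give the parser a single root.
    root = ET.fromstring(f"<div>{html}</div>")
    blocks = []
    for child in root:
        # Assumption: tails between top-level elements are just whitespace.
        child.tail = None
        ET.indent(child)  # Python 3.9+
        blocks.append(ET.tostring(child, encoding="unicode"))
    return "\n".join(blocks)


sample = "<p>And lists:</p>\n<ol>\n<li>wdwd</li>\n<li>e2e2</li>\n</ol>"
print(prettify_without_wrapper(sample))
```

In the Postprocessor above, the body of `run` could return the result of such a helper instead of joining the sliced lines.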

Python-Markdown / markdown

Pretty HTML output #1470