executablebooks / markdown-it-py

Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed. Now in Python!
https://markdown-it-py.readthedocs.io
MIT License

Option to skip adding `<pre><code>` to highlighted code #256

Open andersk opened 1 year ago

andersk commented 1 year ago

Context

When using the highlight option to provide a custom syntax highlighter, markdown-it-py wraps the HTML output of the highlighter in <pre><code> unless it already starts with <pre:

https://github.com/executablebooks/markdown-it-py/blob/73a01479212bfe2aea0b995b4d13c8ddca2e4265/markdown_it/renderer.py#L270

But that heuristic fails for pygments.highlight, whose output does not begin with <pre:

>>> import pygments, pygments.formatters, pygments.lexers
>>> pygments.highlight('print("hello")', pygments.lexers.get_lexer_by_name("python"), pygments.formatters.HtmlFormatter())
'<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">&quot;hello&quot;</span><span class="p">)</span>\n</pre></div>\n'

So markdown-it-py turns this into <pre><code><div class="highlight"><pre>…</pre></div></code></pre>, and existing CSS themes for Pygments need to be rewritten to account for the unnecessarily duplicated <pre>.
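The heuristic described above can be modeled in a few lines of plain Python. This is a simplified sketch, not the library's actual code: `wrap_fence` and its parameters are hypothetical names, and the Pygments output is the string quoted above.

```python
def wrap_fence(highlighted: str, lang: str = "", lang_prefix: str = "language-") -> str:
    """Simplified model of the wrapping heuristic (hypothetical helper)."""
    # Output that already starts with "<pre" is passed through untouched.
    if highlighted.startswith("<pre"):
        return highlighted + "\n"
    # Anything else is wrapped, even if it contains its own <pre> further in.
    class_attr = f' class="{lang_prefix}{lang}"' if lang else ""
    return f"<pre><code{class_attr}>{highlighted}</code></pre>\n"


# The Pygments output from the example above: it starts with <div, not <pre.
pygments_output = (
    '<div class="highlight"><pre><span></span>'
    '<span class="nb">print</span><span class="p">(</span>'
    '<span class="s2">&quot;hello&quot;</span><span class="p">)</span>\n'
    "</pre></div>\n"
)

wrapped = wrap_fence(pygments_output, "python")
passthrough = wrap_fence("<pre>already wrapped</pre>")
print(wrapped)
```

Because the check only looks at the first characters of the string, the `<div class="highlight">` wrapper is enough to trigger the second, outer `<pre>`.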

Proposal

Can we have an option to skip adding <pre><code> that’s not subject to the heuristic?

pydsigner commented 1 year ago

An alternative could be to subclass RendererHTML and override fence():

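from collections.abc import Sequence

from markdown_it.common.utils import escapeHtml, unescapeAll
from markdown_it.renderer import RendererHTML
from markdown_it.token import Token
from markdown_it.utils import EnvType, OptionsDict

class CustomRendererHTML(RendererHTML):
    def fence(self, tokens: Sequence[Token], idx: int, options: OptionsDict, env: EnvType) -> str:
        token = tokens[idx]
        info = unescapeAll(token.info).strip() if token.info else ''
        # The first word of the info string is the language name.
        langName = info.split(maxsplit=1)[0] if info else ''

        if options.highlight:
            # Return the highlighter's output as-is, without a <pre><code> wrapper.
            return options.highlight(
                token.content, langName, ''
            ) or escapeHtml(token.content)

        return escapeHtml(token.content)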

Then this class can be selected to get the desired behavior: markdown_it.MarkdownIt(renderer_cls=CustomRendererHTML)
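A variant of the subclass approach might delegate to the stock renderer whenever the highlighter returns nothing, so that unhighlighted fences keep their usual `<pre><code>` wrapper. This is only a sketch: `UnwrappedFenceRenderer` and `fake_highlight` are hypothetical names, and the stub highlighter stands in for Pygments so the example depends on markdown-it-py alone.

```python
from markdown_it import MarkdownIt
from markdown_it.common.utils import escapeHtml, unescapeAll
from markdown_it.renderer import RendererHTML


class UnwrappedFenceRenderer(RendererHTML):
    """Hypothetical renderer that trusts the highlighter's output as-is."""

    def fence(self, tokens, idx, options, env) -> str:
        token = tokens[idx]
        info = unescapeAll(token.info).strip() if token.info else ""
        lang = info.split(maxsplit=1)[0] if info else ""
        if options.highlight:
            highlighted = options.highlight(token.content, lang, "")
            if highlighted:
                # The highlighter supplied its own wrappers; add nothing.
                return highlighted
        # No highlighter output: fall back to the default <pre><code> wrapping.
        return super().fence(tokens, idx, options, env)


def fake_highlight(code: str, lang: str, attrs: str) -> str:
    """Stub highlighter (assumption): mimics Pygments' outer wrappers."""
    if lang != "python":
        return ""
    return f'<div class="highlight"><pre>{escapeHtml(code)}</pre></div>\n'


md = MarkdownIt(
    "commonmark", {"highlight": fake_highlight}, renderer_cls=UnwrappedFenceRenderer
)
html_py = md.render("```python\nprint(1)\n```\n")
html_other = md.render("```text\na < b\n```\n")
print(html_py)
print(html_other)
```

The trade-off is that the highlighter is called a second time inside the default `fence()` when it returns an empty result the first time, which should be harmless for a pure function but is worth knowing.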

ZeroAurora commented 10 months ago

Upvoting this, since I ran into the same problem. Also, the "highlight" option is almost undocumented. Maybe it needs more attention.

dimitrilarue commented 2 months ago

I just implemented this, and it seems to work:

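from markdown_it import MarkdownIt
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.styles import get_style_by_name
from pygments.util import ClassNotFound

pygments_style = get_style_by_name('catppuccin-mocha')

def highlight_func(code: str, lang: str, _) -> str | None:
    """Highlight function using pygments."""
    if not lang:
        return None

    try:
        lexer = get_lexer_by_name(lang)
    except ClassNotFound:
        # Unknown language: fall back to markdown-it-py's default escaping.
        return None
    formatter = HtmlFormatter(style=pygments_style, noclasses=True, nowrap=True)
    return highlight(code, lexer, formatter)

md = MarkdownIt('js-default', {'highlight': highlight_func}).enable('table')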

nowrap=True tells Pygments not to add any <div> or <pre> wrapper, so it lets markdown-it-py add its own.

andersk commented 2 months ago

@dimitrilarue No, that’s the opposite of what I need. The extra classes from Pygments such as <div class="highlight"> are important for styling, so I need to be able to tell markdown-it-py not to generate its own wrappers.