baynezy / Html2Markdown

A library for converting HTML to markdown syntax in C#
Apache License 2.0
273 stars 51 forks

Support Syntax Highlighting for code blocks #331

Closed YoussefAzaroual closed 9 months ago

YoussefAzaroual commented 9 months ago

Report issue

When an HTML `<code>` tag is converted to a markdown code block and a language is specified to enable syntax highlighting, the language is not placed on the opening fence. It is put on its own line instead and becomes part of the code block's content.

## Expected behavior

Given the following HTML input:

```html
The resulting code should look similar to this:
   <code>HTML
   &lt;html&gt;
    &lt;body&gt;
     ...
    &lt;/body&gt;
   &lt;/html&gt;
   </code>
```

The converter should return:

The resulting code should look similar to this:

```HTML
   <html>
    <body>
     ...
    </body>
   </html>
```

## Actual behavior

But instead it returns:

The resulting code should look similar to this:

```
HTML
   <html>
    <body>
     ...
    </body>
   </html>
```

## Steps to reproduce the problem

Code sample:
```csharp
using System;

// Plain text followed by a <code> element whose first line names the language.
string html = "The resulting code should look similar to this:\n   <code>HTML\n   &lt;html&gt;\n    &lt;body&gt;\n     ...\n    &lt;/body&gt;\n   &lt;/html&gt;\n   </code>";
string markdown = new Html2Markdown.Converter().Convert(html);
Console.WriteLine(markdown);
```

This seems to be caused by the line at HtmlParser.cs#L173, where the whole inner HTML of the code element is enclosed in triple backticks followed by a new line, so the language name ends up on the first line inside the block.
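To illustrate the idea, here is a rough sketch of how the language could be moved onto the opening fence. It is not the library's actual code: `CodeFenceSketch`, `ToFencedBlock` and the `LanguageHint` pattern are hypothetical names, and the rule "a single short token alone on the first line is a language" is only an assumption.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Html2Markdown.
public static class CodeFenceSketch
{
    // Assumption: a language hint is one short token (e.g. HTML, csharp, c++)
    // that sits alone on the first line of the <code> content.
    private static readonly Regex LanguageHint =
        new Regex(@"^[A-Za-z][A-Za-z0-9+#.-]{0,19}$");

    public static string ToFencedBlock(string codeInnerHtml)
    {
        // Decode entities such as &lt; and &gt;, and normalize line endings.
        string text = WebUtility.HtmlDecode(codeInnerHtml).Replace("\r\n", "\n");

        string language = string.Empty;
        int firstBreak = text.IndexOf('\n');
        if (firstBreak > 0)
        {
            string firstLine = text.Substring(0, firstBreak).Trim();
            if (LanguageHint.IsMatch(firstLine))
            {
                // Move the hint onto the fence and drop it from the body.
                language = firstLine;
                text = text.Substring(firstBreak + 1);
            }
        }

        // Opening fence (with optional language), body, closing fence.
        return "```" + language + "\n" + text.TrimEnd() + "\n```";
    }
}
```

Calling `CodeFenceSketch.ToFencedBlock` with the inner HTML of the `<code>` element from the sample above would put `HTML` on the opening fence instead of inside the block. The heuristic can misfire when the first line of real code is a single word, so an explicit marker such as a `class="language-..."` attribute might be a safer signal.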

Do you think this markdown feature could be supported by Html2Markdown?

Thanks