As a result of #109, character and entity references are unconditionally dereferenced. In html2text 2017.10.4 and later, this causes HTML containing character references that represent HTML-like text to be converted to Markdown containing raw HTML:
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown
Horizontal rule is <hr>
To make the problem clearer, consider a round trip from HTML to Markdown and back to HTML:
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown | cmark
<p>Horizontal rule is <!-- raw HTML omitted --></p>
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown | cmark --unsafe
<p>Horizontal rule is <hr></p>
The conversion to Markdown changes the meaning of the content by dereferencing the character references: the literal text "<hr>" in the original HTML becomes a raw <hr> element after the round trip.
To satisfy the request in #109, I suggest preserving character and entity references that would be interpreted as raw HTML if dereferenced. That would avoid producing unnecessary character references (as requested in #109) while also preserving the meaning of content that contains HTML-like text.
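A minimal Python sketch of the suggested behavior, assuming the text of an HTML text node (still containing its character references) is available as a string. The helper name `decode_text_preserving_markup` is hypothetical and not part of html2text's API. Rather than deciding reference by reference, it dereferences everything with the standard library's `html.unescape` and then re-escapes only a `<` that could begin CommonMark raw HTML (an open or closing tag, comment, processing instruction, declaration, or CDATA section) or an autolink. The check deliberately over-approximates the CommonMark grammar: an unnecessary `&lt;` still renders as `<`, so erring on that side is harmless.

```python
import html
import re

# Conservative over-approximation of what may follow "<" at the start of
# CommonMark raw HTML or an autolink: an ASCII letter (open tag, autolink),
# "/" plus a letter (closing tag), "!" (comment, declaration, CDATA) or
# "?" (processing instruction).
_RAW_HTML_START = re.compile(r"<(?=[A-Za-z]|/[A-Za-z]|[!?])")


def decode_text_preserving_markup(text: str) -> str:
    """Hypothetical helper: dereference all character and entity references,
    then re-escape any "<" that a Markdown renderer could read as raw HTML."""
    decoded = html.unescape(text)
    return _RAW_HTML_START.sub("&lt;", decoded)


if __name__ == "__main__":
    # HTML-like text keeps its "<" escaped, so it stays literal text.
    print(decode_text_preserving_markup("Horizontal rule is &lt;hr&gt;"))
    # Other references are still dereferenced, as requested in #109.
    print(decode_text_preserving_markup("5 &lt; 6 &amp;&amp; &copy; 2017"))
```

With this approach, `5 &lt; 6` still becomes `5 < 6`, while `&lt;hr&gt;` becomes `&lt;hr>`, which CommonMark renders as the literal text `<hr>`.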
Thanks for considering, Kevin