gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org

Unable to decode HTML entities #734

Open asbjornu opened 3 years ago

asbjornu commented 3 years ago

I want to convert HTML to plain text within a Kramdown plugin I'm making and I'm unable to get HTML entities decoded no matter what I do. Here's one of many things I've tried:

require "kramdown"

html = "<h1>&amp; &gt; &lt;</h1>"
doc = Kramdown::Document.new(html, input: :html, entity_output: :as_char)
puts doc.to_kramdown # Outputs: # &amp; &gt; &lt;

I expected the output to be # & > < and not # &amp; &gt; &lt;. What am I doing wrong here?

gettalong commented 3 years ago

You are doing nothing wrong; this is just how the conversion is done. The entities for < > & " are not converted to characters.

asbjornu commented 3 years ago

Thanks for the quick reply! Would it be possible to change that behavior, somehow?

gettalong commented 3 years ago

I'm open to pull requests that adjust this behaviour in the kramdown converter. The utility function it uses should not be changed, though, because it is used in several places.

asbjornu commented 3 years ago

Thanks, @gettalong. I may have a look at this when time allows. For now, I've circumvented the issue by using REXML directly.
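
The REXML workaround mentioned above was not posted in the thread. A minimal sketch of what it might look like follows, assuming the HTML fragment is well-formed XML. The helper names `collect_text` and `html_to_plain_text` and the `<root>` wrapper element are hypothetical and are not part of kramdown or REXML. The sketch relies on `REXML::Text#value`, which returns the text with entities such as `&amp;`, `&gt;` and `&lt;` decoded.

```ruby
require "rexml/document"

# Hypothetical helper: walk the parsed tree and concatenate the decoded
# text of every text node. REXML::Text#value returns the unescaped string.
def collect_text(node)
  case node
  when REXML::Text
    node.value
  when REXML::Parent
    node.children.map { |child| collect_text(child) }.join
  else
    "" # comments, processing instructions, etc. contribute no text
  end
end

# Hypothetical helper: wrap the fragment in a single root element so that
# fragments with several top-level elements still parse as one document.
def html_to_plain_text(html)
  doc = REXML::Document.new("<root>#{html}</root>")
  collect_text(doc)
end

puts html_to_plain_text("<h1>&amp; &gt; &lt;</h1>") # prints "& > <"
```

Because REXML is an XML parser, this approach probably only suits well-formed input. HTML that is not valid XML (unclosed tags, for example) would raise a parse error, and named HTML entities beyond the five predefined XML ones may not be decoded.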