aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0
2.58k stars, 410 forks

<script> and <iframe> tags should be returned as-is #45

Open Quantisan opened 11 years ago

bitboxer commented 11 years ago

:+1: for this one.

mcepl commented 10 years ago

-1 from me ... html2text IMHO should be kept to the minimum. If you need anything more complicated, go and pre-/post-process its input/output.

bitboxer commented 10 years ago

The problem is that if you want to convert HTML from WordPress to Jekyll Markdown, you want to preserve script and iframe tags. As it stands, they are lost in the conversion. You could write a parser that replaces them with a marker string and then swaps the marker back after the conversion, but it would be way nicer, and less error-prone, if this lib had an option for it.
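For reference, the marker-string workaround described above could look roughly like the sketch below. It is not part of html2text: `convert_preserving`, `_RAW_TAGS`, and the `RAWHTML` marker prefix are hypothetical names chosen for illustration, the converter is passed in as a plain callable, and the regex assumes well-formed, non-nested `<script>` and `<iframe>` elements.

```python
import re
import uuid
from typing import Callable

# Matches whole <script>...</script> and <iframe>...</iframe> elements.
# Assumption: the markup is well formed; a regex will not cope with
# nested or broken tags, for which a real HTML parser would be needed.
_RAW_TAGS = re.compile(
    r"<(script|iframe)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def convert_preserving(html: str, convert: Callable[[str], str]) -> str:
    """Run `convert` on `html`, keeping script/iframe elements verbatim."""
    saved: dict[str, str] = {}

    def stash(match: re.Match) -> str:
        # Purely alphanumeric marker, so Markdown escaping has nothing to touch.
        marker = f"RAWHTML{uuid.uuid4().hex}"
        saved[marker] = match.group(0)
        return marker

    # Pre-process: swap each element for its marker, then convert.
    text = convert(_RAW_TAGS.sub(stash, html))

    # Post-process: put the original elements back in place of the markers.
    for marker, original in saved.items():
        text = text.replace(marker, original)
    return text
```

With the library installed, `convert` could be `html2text.html2text`; any other function that maps an HTML string to text works the same way.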

mcepl commented 10 years ago

What in the world is the point of storing iframes in Jekyll? Anyway, some escaping of HTML elements ('<' => '&lt;') should be sufficient, shouldn't it? That's what I meant by pre-/post-processing.
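A minimal sketch of the escaping idea, covering only the pre-processing step: each `<script>` or `<iframe>` element is rewritten with `&lt;`, `&gt;`, and `&amp;` entities before the HTML is handed to the converter. The names `escape_raw_tags` and `_RAW_TAGS` are hypothetical, the regex assumes well-formed, non-nested elements, and whether the escaped text comes out of the conversion unchanged depends on how the converter handles entities, which is not verified here.

```python
import html
import re

# Matches whole <script>...</script> and <iframe>...</iframe> elements.
# Assumption: well-formed, non-nested markup.
_RAW_TAGS = re.compile(
    r"<(script|iframe)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def escape_raw_tags(source: str) -> str:
    """Replace each script/iframe element with its entity-escaped form."""
    # quote=False leaves attribute quotes alone; only &, < and > are escaped.
    return _RAW_TAGS.sub(
        lambda m: html.escape(m.group(0), quote=False),
        source,
    )
```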

bitboxer commented 10 years ago

What is the point? Maybe I just want to preserve YouTube iframes when converting my blog :wink: . Escaping the HTML elements is really bad and very error-prone. Why use all these ugly workarounds when html2text could do this easily?

Alir3z4 commented 10 years ago

Currently html2text does everything in one place, so I guess @mcepl is right about pre-/post-processing. We need to implement that kind of functionality so that others can control this behavior and do whatever they want without touching html2text directly and making a mess of it.

Of course we could accept a list of tags to keep from being removed and add an option to html2text for it, but all of that would make it as ugly as possible.

So in the end, it's a -1 vote from me on this issue.