Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

<abbr> #176

Open jonathan-s opened 7 years ago

jonathan-s commented 7 years ago

html2text generates Markdown for <abbr> tags, but to the best of my knowledge Markdown has no official syntax for <abbr>. I tried the Markdown generated by html2text in StackEdit and it doesn't render there.

I would suggest turning off Markdown generation for <abbr> by default, with a setting to turn it back on. How does that sound, @Alir3z4?

Alir3z4 commented 7 years ago

So <abbr> is not officially supported by Markdown, and if the text contains such an element, the Markdown renderer won't render it.

html2text is not trying to convert HTML to Markdown exactly; instead, it claims:

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

That means the converted text should, in practice, be convertible back to HTML as well. I don't think turning off <abbr> by default would be a good idea, because the converted text would lose that formatting when it gets converted back to HTML. Instead, it would be great to provide better support for it.

The Python Markdown library has an extension that supports abbreviations (https://pythonhosted.org/Markdown/extensions/abbreviations.html), which looks great. See also https://github.com/markdown-it/markdown-it-abbr and https://michelf.ca/projects/php-markdown/extra/#abbr
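To make the round trip concrete: the extensions linked above share a definition syntax of the form `*[ABBR]: expansion`, each on its own line. The following is only a rough stdlib sketch of how such definitions could be turned back into <abbr> tags. It is not the implementation used by any of those libraries, and `render_abbr` is a hypothetical helper name chosen for illustration.

```python
import html
import re

# Matches definition lines such as "*[HTML]: Hyper Text Markup Language",
# allowing up to three leading spaces (an assumption of this sketch).
ABBR_DEF = re.compile(r"^ {0,3}\*\[(?P<abbr>[^\]]+)\]:\s*(?P<title>.+?)\s*$")


def render_abbr(markdown_text):
    """Wrap defined abbreviations in <abbr> tags and drop the definition lines."""
    definitions = {}
    body_lines = []
    for line in markdown_text.splitlines():
        match = ABBR_DEF.match(line)
        if match:
            definitions[match.group("abbr")] = match.group("title")
        else:
            body_lines.append(line)

    body = "\n".join(body_lines).strip()
    if not definitions:
        return body

    # One pass over the text, longest abbreviation first, whole words only,
    # so "HTML" is not matched inside "XHTML".
    names = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def wrap(match):
        name = match.group(1)
        title = html.escape(definitions[name])
        return '<abbr title="{}">{}</abbr>'.format(title, name)

    return pattern.sub(wrap, body)


text = "The HTML spec is long.\n\n  *[HTML]: Hyper Text Markup Language\n"
print(render_abbr(text))
```

A renderer without such an extension would instead show the `*[HTML]: ...` line as literal text, which is the behaviour reported in StackEdit.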

As long as <abbr> doesn't make the rendered text go crazy in some editors (like StackEdit), I think it's fine to keep it by default, because those editors simply won't support or render it. However, when it becomes a problem in the converted text (making it ugly and non-Markdown-looking), it would be good to be able to turn it off with an option, but not by default.

What do you think @jonathan-s ?

jonathan-s commented 7 years ago

Fair enough :). But I think there should at least be an option to turn off the default Markdown generation for <abbr> if your renderer is unable to support it.

Alir3z4 commented 7 years ago

Correct, I agree with making it possible to turn it off via an option, while keeping it on by default.
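The agreed behaviour (on by default, off via an option) could be approximated today as a post-processing step on the converted text. This is a minimal sketch, assuming abbreviation definitions appear as `*[ABBR]: expansion` lines. `finalize_markdown` and its `include_abbr` flag are hypothetical names, not an existing html2text setting.

```python
import re

# Assumed definition syntax: "*[ABBR]: expansion", optionally indented.
ABBR_DEFINITION = re.compile(r"^\s*\*\[[^\]]+\]:")


def finalize_markdown(markdown_text, include_abbr=True):
    """Return the text unchanged by default; drop abbreviation definitions on request."""
    if include_abbr:
        return markdown_text
    kept = [
        line for line in markdown_text.splitlines()
        if not ABBR_DEFINITION.match(line)
    ]
    # Trim the blank lines left behind and end with a single newline.
    return "\n".join(kept).rstrip() + "\n"


converted = "Uses HTML.\n\n  *[HTML]: Hyper Text Markup Language\n"
print(finalize_markdown(converted, include_abbr=False))
```

Keeping the default at `include_abbr=True` preserves the round trip to HTML for renderers that do understand abbreviations, while callers targeting other renderers can opt out.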
