aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0
2.61k stars 412 forks

Keep <del> and <strike> tag #27

Closed yorkxin closed 12 years ago

yorkxin commented 12 years ago

I was converting my WordPress posts to Markdown and found that all <del> and <strike> tags are removed (only their text content is kept).

So here is a patch that outputs <del> and <strike> tags. Attributes are not kept, since I don't know how to access the full tag with its angle brackets and attributes.

It would be better if there were an option for whether to keep unknown tags or not.
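To illustrate the idea (this is not the patch itself, and not html2text's code), here is a minimal sketch using Python's standard `html.parser`. The names `KeepTagsParser`, `strip_except`, and `keep_tags` are hypothetical, made up for this example. It drops every tag except those in a configurable set, which it writes back out without attributes:

```python
from html.parser import HTMLParser


class KeepTagsParser(HTMLParser):
    """Hypothetical sketch: keep text, re-emit only selected tags (no attributes)."""

    def __init__(self, keep_tags=("del", "strike")):
        super().__init__(convert_charrefs=True)
        self.keep_tags = set(keep_tags)  # assumption: caller chooses which tags survive
        self.out = []

    def handle_starttag(self, tag, attrs):
        # attrs is ignored on purpose: only the bare tag is written out
        if tag in self.keep_tags:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in self.keep_tags:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text content is always kept, whatever tag it was inside
        self.out.append(data)


def strip_except(html, keep_tags=("del", "strike")):
    """Return html with all tags removed except those named in keep_tags."""
    parser = KeepTagsParser(keep_tags)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(strip_except('<p>old <del class="x">price</del> <b>new</b></p>'))
```

Passing a different `keep_tags` value is one way the "option" could look; a real implementation would still need to decide how kept tags interact with the Markdown output around them.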

P.S. I'm not a Python programmer, so I can't help much further. I only implemented what I needed and am sending it back to you.

aaronsw commented 12 years ago

This seems pretty reasonable. Is there a reason you didn't include <ins> as well?

If anyone complains, we can make it more configurable.

yorkxin commented 12 years ago

Oh, I forgot the <ins> tag.

The reason I added <del> and <strike> is that my original WordPress posts have a lot of <del> and <strike> text. I want to keep it, but Markdown has no syntax for strike-through text.

I think it should be more configurable, too. But I'm actually not very good at Python (I speak Ruby), so I don't think I can help very much.