aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

Google docs #19

Closed nushoin closed 12 years ago

nushoin commented 12 years ago

I have added an option to convert Google Docs documents that were exported as HTML to Markdown.

Google uses CSS, rather than pure HTML, to differentiate ordered lists from unordered ones. They employ similar trickery for other components as well, probably to get good cross-browser support. However, these tricks make it difficult to convert these documents to text.
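To illustrate the list problem, here is a minimal sketch of how a converter might decide the list type from the stylesheet instead of the tag. It assumes the exported HTML marks lists with a class whose rule sets `list-style-type`; the helper names and the class names `c1`/`c2` are hypothetical and are not taken from the patch itself.

```python
import re

# Assumption: these list-style-type values indicate a bulleted (unordered) list.
BULLET_STYLES = {"disc", "circle", "square", "none"}


def parse_declarations(css_text):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    props = {}
    for decl in css_text.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def parse_stylesheet(css_text):
    """Map each selector to its declarations (flat, non-nested rules only)."""
    rules = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css_text):
        rules[selector.strip()] = parse_declarations(body)
    return rules


def list_kind(tag, class_name, rules):
    """Return 'ul' or 'ol', preferring the CSS hint over the HTML tag."""
    style = rules.get("." + class_name, {})
    list_style = style.get("list-style-type")
    if list_style is None:
        return tag  # no CSS hint, so trust the tag
    return "ul" if list_style in BULLET_STYLES else "ol"


# Hypothetical stylesheet, loosely modelled on an exported document.
rules = parse_stylesheet(".c1{list-style-type:disc} .c2{list-style-type:decimal}")
print(list_kind("ol", "c1", rules))
print(list_kind("ol", "c2", rules))
```

In this sketch, a list tagged `<ol>` but styled with `disc` is treated as unordered, which is the kind of mismatch described above.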

Here I tried to handle Google's peculiarities. Support is not yet complete, but it is already quite usable. All the additional code is gated behind command-line options.

Please let me know if there is any problem with the code.

Thanks, Yariv

aaronsw commented 12 years ago

Wow, thanks for the huge patch.