aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

Optparse #3

Closed stefanor closed 13 years ago

stefanor commented 13 years ago

Provide help via optparse, and check arguments for validity.

I'm adding this patch in Debian, as we are providing an "html2markdown" program in my html2text package. I wanted -h to print help, and it seems sensible to take advantage of optparse.
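For illustration, a minimal sketch of the kind of optparse handling described here, not the patch itself. The function name `parse_cli` and the usage string are assumptions; optparse supplies `-h`/`--help` automatically, and `parser.error` reports invalid arguments.

```python
from optparse import OptionParser


def parse_cli(argv):
    """Return (source, encoding) from argv, which excludes the program name.

    Hypothetical helper: the names and the usage string are illustrative only.
    """
    # OptionParser adds -h/--help on its own; the usage text is what it prints.
    parser = OptionParser(usage="%prog [(filename|URL) [encoding]]")
    options, args = parser.parse_args(argv)

    # Validity check: at most a source and an encoding are accepted.
    if len(args) > 2:
        parser.error("too many arguments")  # prints usage, exits with status 2

    source = args[0] if len(args) >= 1 else None  # None: read from stdin
    encoding = args[1] if len(args) == 2 else None
    return source, encoding
```

Calling `parse_cli(sys.argv[1:])` from `main()` would then give `-h` output and argument checking without any hand-written parsing.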

BTW: The main() section is rather naïve about encodings. With Python 2, it will throw an exception on non-ASCII characters.

aaronsw commented 13 years ago

What do you think about allowing an optional encoding for URLs too? It would override what's in the headers.

Also, I don't understand your "BTW" comment. Can you file an example as a separate issue?

stefanor commented 13 years ago

Re: encoding, sounds decent.

Also, what about running chardet on local files, if no encoding is provided?

As to the BTW, filed.

aaronsw commented 13 years ago

That sounds like a great plan, obviously with a fallback if chardet can't be found.

stefanor commented 13 years ago

Should I move encoding to an option (-e / --encoding) or do you want to keep it as a second argument?

aaronsw commented 13 years ago

I'd prefer it as a second argument.

stefanor commented 13 years ago

Right: user-specifiable encoding for files and URLs, with a fallback to chardet.
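A rough sketch of the behaviour agreed on in this thread, under stated assumptions rather than as the merged code: an explicit encoding wins, chardet is consulted only when none is given, and a default is used when chardet is not installed or cannot decide. The name `decode_bytes`, the UTF-8 default, and the "replace" error handling are all assumptions.

```python
try:
    import chardet  # optional third-party dependency
except ImportError:
    chardet = None  # fall back to the default encoding below


def decode_bytes(data, encoding=None, default="utf-8"):
    """Decode raw bytes read from a file or URL into text.

    Hypothetical helper; the default and error policy are illustrative choices.
    """
    if encoding is None and chardet is not None:
        # chardet.detect returns a dict; "encoding" may be None if undecided.
        encoding = chardet.detect(data).get("encoding")
    # Undecodable bytes are replaced rather than raising an exception.
    return data.decode(encoding or default, "replace")
```

For a URL, the same function could be passed the user-supplied encoding so that it overrides whatever the response headers declare.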

aaronsw commented 13 years ago

Merged.