aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

Non-breaking spaces processing #43

Closed dreikanter closed 12 years ago

dreikanter commented 12 years ago

Hi Aaron, I've fixed a bug with `&nbsp;` placeholder replacement and added a new option to keep `&nbsp;` if required (could be helpful for non-Google-Docs input). I've also updated the test script so that it works correctly on Windows. All tests passed.

aaronsw commented 12 years ago

Can you add a test to make sure this works properly with unicode_snob?

dreikanter commented 12 years ago

I've updated the test as you asked (it passed).

I've also examined the unicode_snob parameter more closely and found that my original reason for keeping `&nbsp;` in the output is not relevant. I assumed html2text always converts `&nbsp;` to an ordinary space, but it can actually be converted to a Unicode non-breaking space, which is a suitable solution for me. So I've rolled back the --keep-nbsp parameter. To summarize, my patch includes the `&nbsp_place_holder;` fix and, as I mentioned, the updated test.
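To illustrate the behaviour described in this comment, here is a minimal standalone sketch of the placeholder idea. It is not the code from the patch and does not call html2text. The function name `render_nbsp` and its `unicode_snob` argument are hypothetical stand-ins, and only the placeholder string is taken from the discussion above.

```python
import re

# Placeholder string as named in the comment above; everything else here is
# a hypothetical illustration, not html2text's actual internals.
PLACEHOLDER = "&nbsp_place_holder;"


def render_nbsp(fragment: str, unicode_snob: bool = False) -> str:
    """Collapse ordinary whitespace while keeping each &nbsp; as one character."""
    # 1. Protect every non-breaking space so whitespace cleanup cannot touch it.
    protected = fragment.replace("&nbsp;", PLACEHOLDER)
    # 2. Collapse runs of ordinary whitespace into a single space.
    #    An explicit character class is used because \s would also match U+00A0.
    collapsed = re.sub(r"[ \t\r\n]+", " ", protected).strip()
    # 3. Swap the placeholder for the final character: U+00A0 when Unicode
    #    output is wanted, otherwise a plain ASCII space.
    nbsp = "\u00a0" if unicode_snob else " "
    return collapsed.replace(PLACEHOLDER, nbsp)


if __name__ == "__main__":
    print(repr(render_nbsp("a&nbsp;&nbsp;b   c")))
    print(repr(render_nbsp("a&nbsp;&nbsp;b   c", unicode_snob=True)))
```

In this sketch the two `&nbsp;` entities survive whitespace collapsing in both modes. Only the character that finally replaces the placeholder differs, which is why a separate option for keeping `&nbsp;` turned out to be unnecessary.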

On Thu, Jul 19, 2012 at 5:53 PM, aaronsw <reply@reply.github.com> wrote:

> Can you add a test to make sure this works properly with unicode_snob?



aaronsw commented 12 years ago

Thanks so much for the patch and putting up with my nitpicking!

dreikanter commented 12 years ago

You're welcome :)

On Fri, Jul 20, 2012 at 6:52 PM, aaronsw <reply@reply.github.com> wrote:

> Thanks so much for the patch and putting up with my nitpicking!

