aaronsw / html2text

Convert HTML to Markdown-formatted text.
http://www.aaronsw.com/2002/html2text/
GNU General Public License v3.0

Extra "\" slashes before specific numeric #133

Open SubhamDyno opened 1 year ago

SubhamDyno commented 1 year ago

Hello Team,

When we pass the input `<p>1. Hello My name is Subham</p>` to html2text, the output is `1\. Hello My name is Subham`.

The extra "\" after the digit is not needed. It appears only after a number that is followed by a "." (dot) and then whitespace.

Could you please help us avoid this escaping?
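Until the library behavior changes, one option is to post-process the converted text. The sketch below is a minimal example using only the standard library; `unescape_numbered_dots` and `ESCAPED_DOT` are hypothetical names, not part of html2text, and the pattern assumes the escape only appears after a number at the start of a line, as described above.

```python
import re

# Hypothetical helper, not part of html2text.
# Matches a backslash-escaped dot that follows a number at the start of a
# line and is itself followed by whitespace, e.g. "1\. Hello".
ESCAPED_DOT = re.compile(r"^(\s*\d+)\\\.(?=\s)", re.MULTILINE)

def unescape_numbered_dots(markdown: str) -> str:
    """Turn a leading '1\\.' back into '1.' on every line of the text."""
    return ESCAPED_DOT.sub(r"\1.", markdown)

cleaned = unescape_numbered_dots("1\\. Hello My name is Subham")
print(cleaned)
```

Note that a Markdown renderer treats a line starting with `1. ` as an ordered list item, so this cleanup is only appropriate when the output is consumed as plain text.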

rajkumar-jangid-macmillan commented 5 months ago

The issue is in the package file utils.py (Python37\Lib\site-packages\html2text\utils.py), at line 210. This replacement works: text = config.RE_MD_DOT_MATCHER.sub(r"\1\2", text)

The original line has two extra backslashes in the replacement string, so replacing just this one line should fix the issue. I am not sure whether it could break something else.
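To see what the two extra backslashes do, here is a small standalone sketch. `DOT_MATCHER` is an assumed stand-in for `config.RE_MD_DOT_MATCHER` (the real pattern in the installed version may differ), and `r"\1\\\2"` is the replacement string implied by "two extra backslashes", not a quote from the library source.

```python
import re

# Assumed stand-in for config.RE_MD_DOT_MATCHER; the real pattern may differ.
# Group 1: optional indentation plus digits. Group 2: the dot.
DOT_MATCHER = re.compile(r"^(\s*\d+)(\.)(?=\s)", re.MULTILINE)

def escape_dot(text: str) -> str:
    # Replacement with the two extra backslashes: in a replacement
    # template, "\\" emits one literal backslash before the dot.
    return DOT_MATCHER.sub(r"\1\\\2", text)

def keep_dot(text: str) -> str:
    # Replacement proposed in this comment: both groups are put back unchanged.
    return DOT_MATCHER.sub(r"\1\2", text)

sample = "1. Hello My name is Subham"
escaped = escape_dot(sample)
unescaped = keep_dot(sample)
print(escaped)
print(unescaped)
```

With the shorter replacement the substitution becomes a no-op. The trade-off is that the text `1. Hello ...` reads as an ordered list item to a Markdown renderer, which is presumably why the dot is escaped in the first place.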