Open jpjoines opened 4 years ago
You never actually use the parser on which you set the body width. You call html2text.html2text(html) directly instead,
so you effectively called the function three times with the same default settings.
In [1]: import html2text
In [2]: html2text.__version__
Out[2]: (2020, 1, 16)
In [3]: html = 'This line is longer than seventy-eight characters. It seems to be getting wrapped with backslash n line breaks at seventy-eight characters regardless of the body_width
...: setting in config.py or in the parser.'
In [4]: parser = html2text.HTML2Text()
In [5]: parser.body_width = 0
In [6]: parser.handle(html)
Out[6]: 'This line is longer than seventy-eight characters. It seems to be getting wrapped with backslash n line breaks at seventy-eight characters regardless of the body_width setting in config.py or in the parser.\n'
In [7]: parser = html2text.HTML2Text(bodywidth=0)
In [8]: parser.handle(html)
Out[8]: 'This line is longer than seventy-eight characters. It seems to be getting wrapped with backslash n line breaks at seventy-eight characters regardless of the body_width setting in config.py or in the parser.\n'
# This is getting wrapped as expected
In [9]: html2text.html2text(html)
Out[9]: 'This line is longer than seventy-eight characters. It seems to be getting\nwrapped with backslash n line breaks at seventy-eight characters regardless of\nthe body_width setting in config.py or in the parser.\n\n'
$ cat /tmp/test.html
This line is longer than seventy-eight characters. It seems to be getting\nwrapped with backslash n line breaks at seventy-eight characters regardless\nof the _bodywidth setting in config.py or in the parser.
$
$ wc -l /tmp/test.html; wc -c /tmp/test.html
1 /tmp/test.html
217 /tmp/test.html
$
$ python3.8 -m html2text -b 0 /tmp/test.html
This line is longer than seventy-eight characters. It seems to be getting\nwrapped with backslash n line breaks at seventy-eight characters regardless\nof the _bodywidth setting in config.py or in the parser.
$
$ echo 'This line is longer than seventy-eight characters. It seems to be getting' | wc -c
78
$
$ python3.8 -m html2text -b 22 /tmp/test.html
This line is longer than seventy-eight characters. It seems to be getting\nwrapped with backslash n line breaks at seventy- eight characters regardless\nof the _bodywidth setting in config.py or in the parser.
$ python3.8 -m html2text -b 99 /tmp/test.html
This line is longer than seventy-eight characters. It seems to be getting\nwrapped with backslash n line breaks at seventy-eight characters regardless\nof the _bodywidth setting in config.py or in the parser.
$