Alir3z4 / html2text

Convert HTML to Markdown-formatted text.
alir3z4.github.io/html2text/
GNU General Public License v3.0

Feature request: Output without markdown #381

Open sowinski opened 2 years ago

sowinski commented 2 years ago

Hi,

I found this library because I want to convert html => text. Unfortunately, the library does html => markdown.

I haven't seen anything in the docs. Is it possible to disable the markdown output and just get plain text?

Regards Philipp

Cabu commented 1 year ago

Same here. Would be great :)

PanderMusubi commented 10 months ago

Several solutions exist and can be found by searching for markdown2text etc. on https://pypi.org/. Perhaps this issue is therefore out of scope (but that is not up to me to decide).

lesnake commented 6 months ago

I was about to write what @sowinski wrote.

Running html2text and then a markdown2text step would not get the job done, because the markdown regexes remove some "useless" bytes, escape characters, and so on. For example:

 -      foo
 -      bar
 -      baz

Will be converted into

\- foo
\- bar
\- baz

This loses the original number of spaces. No further processing step, such as the one I think @PanderMusubi is suggesting, can regenerate the information that was lost (the spaces).
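
For illustration only, here is a minimal sketch of a direct html => text pass that skips the markdown step entirely. It uses only Python's standard-library `html.parser` and is not part of html2text. The names `PlainTextExtractor` and `html_to_plain_text`, the set of tags treated as block-level, and the `<pre>` wrapper around the sample are all assumptions made for this example.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes verbatim, with no markdown."""

    # Content inside these tags is dropped entirely.
    SKIPPED = {"script", "style"}
    # Closing one of these tags ends a line (an assumed, incomplete list).
    BLOCK = {"p", "div", "li", "tr", "pre", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Text is appended unchanged: no escaping, no whitespace collapsing.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_plain_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed input: the list from the example above, wrapped in <pre>.
sample = "<pre> -      foo\n -      bar\n -      baz</pre>"
result = html_to_plain_text(sample)
print(repr(result))
```

Because the text nodes are copied through unchanged, the leading spaces and the unescaped `-` survive. The trade-off is that this sketch does no layout at all (no link handling, no table formatting, no whitespace normalization for ordinary paragraphs), so it is a starting point and not a replacement for what html2text does.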