In the current PyPI release of html2text, converting a single backslash in HTML produces a single backslash in plain text. That seems right. But converting two backslashes in HTML produces 3 backslashes in plain text. It seems like two backslashes in HTML should produce two in plain text. I am seeing this in HTML that shows some Windows file paths with two backslashes, which indicates that the backslash is escaped. When our ChimeraX application converts that HTML to plain text for bug reporting, the file names then show 3 backslashes (https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10252).
Note that in the Python strings in the test script below, two backslashes mean just one backslash, because "\\" is an escape sequence for a single-character string containing one backslash.
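To double-check that point using only the standard library, the snippet below counts the backslash characters that actually reach html2text. The raw-string (`r'...'`) forms spell the same HTML without the doubling. `reported_output` is just the second result from the test script, copied verbatim.

```python
# In an ordinary Python literal, '\\' is an escape sequence for ONE backslash.
one_backslash = '<p>\\</p>'
two_backslashes = '<p>\\\\</p>'

# Raw strings keep backslashes as typed, so these are the same HTML inputs.
assert one_backslash == r'<p>\</p>'
assert two_backslashes == r'<p>\\</p>'

# Second output from the test script, copied verbatim.
reported_output = '\n\n\\\\\\\n\n'

print(one_backslash.count('\\'))    # 1
print(two_backslashes.count('\\'))  # 2
print(reported_output.count('\\'))  # 3
```

So the HTML input really does contain two backslashes, and the plain-text output really does contain three.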
Version by html2text --version
2020.1.16
Test script
>>> import html2text
>>> h = html2text.HTML2Text()
>>> h.handle('<p>\\</p>')
'\\\n\n'  # Seems right
>>> h.handle('<p>\\\\</p>')
'\n\n\\\\\\\n\n'  # Seems wrong, 3 backslashes in the output.
>>> html2text.__version__
(2020, 1, 16)
Python version by python --version
Python 3.10.9
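A guess at the cause: html2text applies Markdown escaping to the text. As far as I can tell, it escapes a backslash only when the backslash is followed by a Markdown-special character, and the backslash itself is in that set. The sketch below is a standalone reimplementation of that assumed rule, not html2text's own code. `SLASH_CHARS` and `escape_backslashes` are hypothetical names, and the character set is my reading of the library source, which may differ between versions.

```python
import re

# Assumption: the characters after which a backslash gets escaped.
# Note that the set includes the backslash itself.
SLASH_CHARS = r"\`*_{}[]()#+-.!"

# One backslash, followed (lookahead, not consumed) by one of those characters.
BACKSLASH_BEFORE_SPECIAL = re.compile(r"(\\)(?=[%s])" % re.escape(SLASH_CHARS))


def escape_backslashes(text: str) -> str:
    """Hypothetical stand-in for html2text's backslash-escaping step."""
    # Replace each matched backslash with two backslashes.
    return BACKSLASH_BEFORE_SPECIAL.sub(r"\\\1", text)


print(escape_backslashes("\\"))               # \
print(escape_backslashes("\\\\"))             # \\\
print(escape_backslashes(r"C:\\Users\\me"))   # C:\\\Users\\\me
print(escape_backslashes(r"C:\Users\me"))     # C:\Users\me
```

If this rule is the cause, a lone backslash is left alone because nothing follows it. In a pair, though, the first backslash is followed by another backslash, so it gets escaped and the count grows from 2 to 3. The same thing would happen to each doubled separator in a Windows path.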