Hey @masci, thanks for the bug report. Dang, it seems I didn't test this sufficiently and trusted StackOverflow a bit too much 👀
The encode/decode code here was introduced to convert a Python string literal into the actual string that the Python interpreter would hold in memory after parsing it (so when you write "foo\n" into your docstring, it would actually be "foo\n" in the Docstring.content instead of "foo\\n").
Unless there's another better working solution using the encode/decode logic, I suppose we need to manually parse the string and convert special character sequences.
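For illustration, manual parsing could look roughly like the sketch below. It is not the project's code: `ESCAPE_RE` and `decode_escapes` are hypothetical names. The idea is to run the `unicode_escape` decoder only on the escape sequences themselves, which are always plain ASCII, so any non-ASCII characters in the rest of the docstring are never encoded at all.

```python
import re

# Matches only the backslash escapes that Python string literals define.
ESCAPE_RE = re.compile(
    r"""\\(?:
        [0-7]{1,3}            # octal escape
      | x[0-9a-fA-F]{2}       # two-digit hex escape
      | u[0-9a-fA-F]{4}       # 16-bit code point
      | U[0-9a-fA-F]{8}       # 32-bit code point
      | N\{[^}]+\}            # named character
      | [\\'"abfnrtv]         # single-character escapes
    )""",
    re.VERBOSE,
)


def decode_escapes(text):
    """Replace escape sequences in text, leaving all other characters alone."""

    def replace(match):
        # The matched escape is pure ASCII, so this encode step cannot fail.
        return match.group(0).encode("ascii").decode("unicode_escape")

    return ESCAPE_RE.sub(replace, text)


print(repr(decode_escapes("foo\\n ü €")))
```

Unlike a literal parser, this does not handle the surrounding quotes or string prefixes such as r"", so it only fits the case where the quotes have already been stripped.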
Thanks for following up! I'm not sure I get 100% the logic of the answer in SO but at some point I see
...
s.encode('latin1') # To bytes, required by 'unicode-escape'
...
and I wonder, if the goal of that step is just to have bytes out of the original string, can't we just encode using something more flexible than latin1, like utf-8? Am I missing something?
The reason is that latin1 and unicode_escape seem to have a convenient overlap in escape character use, or something like that. But if latin1 can't encode everything, then it's no use either. 🤦
>>> 'ü'.encode('latin1')
b'\xfc'
>>> 'ü'.encode('latin1').decode('unicode_escape')
'ü'
>>> 'ü'.encode('utf-8')
b'\xc3\xbc'
>>> 'ü'.encode('utf-8').decode('unicode_escape')
'Ã¼'
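The session above can be condensed into a small script that also shows where latin1 breaks down. This is only a sketch, and `unescape_via` is a hypothetical helper name. The `unicode_escape` decoder treats every byte that is not part of an escape as a Latin-1 code point, which is why the latin1 round trip works for characters up to U+00FF, why UTF-8 bytes come out garbled, and why anything beyond U+00FF cannot be encoded in the first place.

```python
def unescape_via(text, encoding):
    """Encode text with the given codec, then decode it with unicode_escape."""
    return text.encode(encoding).decode("unicode_escape")


# latin1 round-trips characters up to U+00FF and resolves the escape.
print(repr(unescape_via("ü\\n", "latin1")))

# utf-8 does not raise, but each UTF-8 byte is read back as a Latin-1 character.
print(repr(unescape_via("ü\\n", "utf-8")))

# latin1 rejects any character above U+00FF, such as the euro sign.
try:
    unescape_via("€", "latin1")
except UnicodeEncodeError as exc:
    print("latin1 cannot encode:", repr(exc.object[exc.start:exc.end]))
```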
It seems like you already found the PR and thus the StackOverflow answer I was referring to, but for reference: #83 and https://stackoverflow.com/a/58829514/791713
The best alternative that I can think of without re-implementing the decoding of raw strings is to use ast.literal_eval(). Actually that does appear rather elegant to me, in particular because the string we're dealing with will have the quotes around it.
if s:
    s = ast.literal_eval(s)
return Docstring(location, dedent_docstring(s).strip())
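To see the idea in isolation, here is a minimal standalone sketch that leaves out Docstring and dedent_docstring. `parse_string_literal` is a hypothetical helper, and it assumes the input is the literal exactly as it appears in the source file, quotes and prefix included.

```python
import ast


def parse_string_literal(source_literal):
    """Evaluate a string literal (quotes included) the way the interpreter would."""
    if not source_literal:
        return source_literal
    value = ast.literal_eval(source_literal)
    # literal_eval also accepts numbers, tuples, etc., so guard against those.
    if not isinstance(value, str):
        raise TypeError("expected a string literal")
    return value


# Escapes are resolved and non-ASCII characters pass through untouched.
print(repr(parse_string_literal('"foo\\n ü €"')))

# Raw-string prefixes are honoured, so the backslash is kept here.
print(repr(parse_string_literal('r"foo\\n"')))
```

Because the literal is parsed rather than encoded and decoded, no intermediate byte encoding is involved, so the latin1 limitation does not come up.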
In 2.1.2
Describe the bug
When a docstring contains non-ASCII characters, the conversion fails.
To Reproduce
Steps to reproduce the behavior:
1. Create a Python file foo.py containing the following:
2. From the same folder, run pydoc-markdown -I . -m foo
3. See the error:
Expected behavior
No errors, as was the case with versions < 2.1.0.