Open lukasrad02 opened 6 months ago
Is this due to newlines being added? What is the HTML output?
No newlines are added, just spaces.
The HTML passed to html2text (see https://github.com/fsr-de/myHPI/blob/72588358ea069005922a8b3dd08dffca0ac34db5/myhpi/core/markdown/fields.py#L33) is exactly identical to the HTML entered in the translation editor.
I think (but haven't verified this yet) that html2text parses the whole HTML input into some AST-like structure that does not preserve formatting and uses some generic formatting rules when rewriting it as markdown, thus adding the spaces.
Is it viable to switch from html2text to a library that converts the source directly to Markdown? @jeriox
Some considerations for that:
<a> tags.
@dropforge I think it would be feasible, and given how many problems the HTML representation has already caused, I think it would be a good way forward. Back when we implemented the prototype/MVP, it worked well enough, so we went with it because it was quicker. If you are willing to do a deep dive on that, I'd highly appreciate it!
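As a starting point for that deep dive, here is a minimal sketch of what a direct conversion could look like. It is not html2text and not the current myHPI code: the names `InlineMarkdownConverter`, `inline_html_to_markdown`, and `INLINE_MARKERS` are hypothetical, and it assumes only simple inline tags (bold, italic, code, links) need to be handled. It uses Python's standard `html.parser` and copies every text node verbatim, so no whitespace is added or removed around inline formatting.

```python
from html.parser import HTMLParser

# Hypothetical mapping from inline HTML tags to Markdown delimiters.
INLINE_MARKERS = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}


class InlineMarkdownConverter(HTMLParser):
    """Emit Markdown token by token, copying text nodes unchanged."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []  # output fragments, joined at the end
        self.hrefs = []  # stack of link targets for open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])
        elif tag == "a" and self.hrefs:
            self.parts.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        # Text is appended as-is: no spaces are inserted or stripped.
        self.parts.append(data)


def inline_html_to_markdown(html):
    """Convert a fragment of inline HTML to Markdown (sketch only)."""
    converter = InlineMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts)


print(inline_html_to_markdown("H<strong>e</strong>llo"))  # H**e**llo
```

The sketch ignores block-level elements, nesting edge cases, and escaping of Markdown special characters in text, all of which a real replacement would have to cover. It uses asterisks rather than underscores because CommonMark only allows intraword emphasis with asterisks; whether the Markdown renderer used by the project behaves the same way would need to be checked.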
When using inline formatting that is not surrounded by spaces, e.g. H<strong>e</strong>llo, in a translation, surrounding spaces will be added automatically when the content is converted back to Markdown.
Translation editor:
Rendered Page: