danielfrg / pelican-jupyter

Pelican plugin for blogging with Jupyter/IPython Notebooks
Apache License 2.0

Generate valid HTML for HTML-based strings. #82

Closed leemengtw closed 6 years ago

leemengtw commented 6 years ago

Fix invalid HTML output when using BeautifulSoup to decode content.

Consider this input cell:

html = """<div><table border="1" class="dataframe"><thead><tr style="text-align:right;"><th></th><th>x</th><th>y</th></tr></thead><tbody><tr><th>0</th><td>-2.863752</td><td>-1.066424</td></tr><tr><th>1</th><td>-0.779238</td><td>0.862169</td></tr></tbody></table></div>"""

When the content is decoded with BeautifulSoup using the None formatter:

soup = BeautifulSoup(content, 'html.parser')
content = soup.decode(formatter=None)

The HTML inside the string is rendered as a table in the static file, which is not desired, since we want a code block here.

[Screenshot: invalid_html]

By using soup.decode(formatter="minimal"), which is suggested by BeautifulSoup, we get the following result:

[Screenshot: valid_html]
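For anyone who wants to reproduce the difference outside the plugin, here is a minimal sketch. It assumes bs4 is installed, and the `content` string is a shortened, hypothetical stand-in for the HTML that nbconvert would produce for the cell above, not the plugin's actual input.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for a rendered code cell: the HTML
# inside the Python string is entity-escaped so it shows up as text.
content = '<pre>html = """&lt;div&gt;&lt;table&gt;&lt;/table&gt;&lt;/div&gt;"""</pre>'

soup = BeautifulSoup(content, "html.parser")

# formatter=None writes text nodes back without escaping, so the
# string's markup turns into real tags in the output.
raw = soup.decode(formatter=None)

# formatter="minimal" re-escapes &, < and >, so the markup stays text.
escaped = soup.decode(formatter="minimal")

print(raw)
print(escaped)
```

With formatter=None the output contains a literal table tag inside the pre block, which a browser will try to render. With formatter="minimal" the angle brackets stay as entities and the cell is displayed as code.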

danielfrg commented 6 years ago

Thanks for the PR!