danielfrg / pelican-jupyter

Pelican plugin for blogging with Jupyter/IPython Notebooks
Apache License 2.0

BeautifulSoup invocation erroneously alters the resulting html #56

Open akhmerov opened 8 years ago

akhmerov commented 8 years ago

This bit of code seems to be harmful.

I have observed it producing from this:

line with a line break (so ending with a double space)  
line without a line break

the following erroneous html:

line with a line break (so ending with a double space)<br>
line without a line break</br>

(note the </br> tag). This adds a large amount of extra empty space to any text with many line breaks.

ischurov commented 7 years ago

Agreed. Moreover, it makes it almost impossible to include HTML code inside code blocks, because soup.decode(formatter=None) replaces all entities (including &lt; and &gt;) with their corresponding symbols. So if I have something like print("<b>") in the source, I get an actual <b> tag in the output. This is unavoidable in general: BeautifulSoup converts entities to the corresponding Unicode symbols on parsing and therefore loses some information.
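The information loss can be illustrated without BeautifulSoup itself. The following is only a minimal sketch that uses the standard library's html module as a stand-in for the parse and serialise steps; the variable names are made up for the illustration and it does not reproduce the plugin's actual code.

```python
import html

# What the exporter would emit for a code cell containing print("<b>"):
# the angle brackets are stored as entities so the browser shows them as text.
rendered = '<pre>print("&lt;b&gt;")</pre>'

# Stand-in for parsing: entities are decoded to plain characters,
# so the tree no longer records that they were ever escaped.
inner = html.unescape('print("&lt;b&gt;")')

# Serialising without re-escaping (the effect described for formatter=None)
# turns the text into real markup.
unescaped_output = "<pre>" + inner + "</pre>"

# Re-escaping on output restores the original entities.
escaped_output = "<pre>" + html.escape(inner, quote=False) + "</pre>"

print(unescaped_output)
print(escaped_output)
```

The first printed line contains a real <b> tag that a browser would interpret, while the second matches the original escaped markup.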

This part of the code is used to remove all cells containing the #ignore text. Personally, I'm willing to just comment it out, as this feature is not crucial for me. Nevertheless, I'm not sure how best to solve this problem.

ischurov commented 7 years ago

Most probably it is better not to tweak the HTML tree at all, but to remove the corresponding cells from the ipynb JSON file before invoking HTMLExporter.
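A rough sketch of that idea, using only the standard library on the raw notebook JSON. It assumes the nbformat 4 layout (a top-level "cells" list whose entries have a "source" that is either a string or a list of lines) and assumes that a cell counts as ignored when its source contains #ignore anywhere; the exact matching rule the plugin uses is not stated here. The names drop_ignored_cells and cell_source are hypothetical helpers, not part of the plugin.

```python
import json

IGNORE_MARKER = "#ignore"  # assumed marker, taken from the discussion above


def cell_source(cell):
    """Return the cell source as one string (it may be stored as a list of lines)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def drop_ignored_cells(notebook):
    """Return a copy of the notebook dict without cells containing the marker."""
    kept = [
        cell for cell in notebook.get("cells", [])
        if IGNORE_MARKER not in cell_source(cell)
    ]
    return {**notebook, "cells": kept}


# Tiny hand-written notebook standing in for a real .ipynb file.
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["#ignore\n", "secret = 1\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print('<b>')"},
    ],
})

filtered = drop_ignored_cells(json.loads(raw))
print(len(filtered["cells"]))
```

Because the filtering happens before any HTML exists, the exported markup never has to be parsed and re-serialised, so the entity and line-break problems described above do not arise. The same logic could be wrapped in an nbconvert Preprocessor.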

akhmerov commented 7 years ago

Seems correct; an nbconvert Preprocessor is the way to go. I didn't need the feature, so I just removed it entirely.