Closed rhempel closed 3 years ago
We had a report about this in #3564; it was caused by a broken version of lxml shipped by Conda. Can you try again with an lxml version from PyPI?
I've followed that #3564 thread and have a little more information to add - sorry for not mentioning it up front. I am running on Windows 10, but I am using the Git for Windows SDK, so that's MSYS2 MinGW64. The version of libxml2 is libxml2-2.9.12-2, and MSYS2 prefers NOT to downgrade packages, but it is possible: https://www.msys2.org/docs/package-management/
After downgrading to http://repo.msys2.org/mingw/mingw64/mingw-w64-x86_64-libxml2-2.9.10-8-any.pkg.tar.zst the multi-post index pages look fine again :-)
Thanks for the quick response - closing this issue, but I cannot help but think that a significant issue with libxml2 2.9.11 and 2.9.12 would have shown up in other use cases. Is it possible that there is still an issue in Nikola due to incorrect assumptions in how libxml2 is used?
There is a possibility that the HTML fed to lxml contains unbalanced tags, or some other combination libxml2 has disliked since 2.9.11. Or there’s a regression in libxml2. That could be verified by removing lxml from the Nikola pipeline and checking the produced code.
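One rough way to test the unbalanced-tags theory without involving lxml at all is a strict tag-balance check built on the standard library. This is only a sketch, not anything taken from Nikola: BalanceChecker and check_balance are hypothetical names, and the check is stricter than HTML itself, which allows some end tags (such as </p> or </li>) to be omitted. Treat anything it reports as a hint rather than proof.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Hypothetical helper: track open tags on a stack and note mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing, so ignore it.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # The closing tag does not match the innermost open tag.
            self.problems.append("unexpected </%s>" % tag)


def check_balance(html):
    """Return a list of human-readable balance problems (empty if none)."""
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    problems.extend("unclosed <%s>" % tag for tag in checker.stack)
    return problems


print(check_balance("<div><p>text</p></div>"))        # []
print(check_balance("<div><p>text</p></div></div>"))  # ['unexpected </div>']
```

Running this over a rendered index page saved to disk would show whether the template output is already unbalanced before lxml ever sees it.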
Sorry to keep this thread going after the issue is closed, but could you point me at where to remove lxml from the Nikola pipeline? I've poked around in nikola.py and have scanned the internals document but nothing obvious is jumping out at me :-)
There seems to be a LOT of lxml usage scattered in the Nikola source and I'm pretty sure that from your comment there is one RIGHT place to catch the output.
I’d say that commenting out the lxml-related parts in render_template should do it, and that would be nikola.py, lines 1497-1509.
Thanks for the hint - I did try to comment that section out previously but forgot to set my libxml2 back to the current version. Now that I have that straight, I can confirm the issue sits here: https://github.com/getnikola/nikola/blob/db97e3da73d1ce6fcf4138e938967db8a4efc0c6/nikola/nikola.py#L1501 and https://github.com/getnikola/nikola/blob/db97e3da73d1ce6fcf4138e938967db8a4efc0c6/nikola/nikola.py#L1509
If you comment out those lines (and the preceding else) and add: data = data.encode('utf-8') then the HTML is rendered correctly. There seems to be a problem when going from HTML data to an XML doc and pretty-printing back to HTML data ...
Let me know if there's anything I can do to test/verify - I have a workaround for now so it's not critical
I tried investigating a bit further and found this in the output:
<div class="e-content entry-content">
<body><p>Write your post here.</p></body>
</html>
</div>
There might be something wrong with Post.text() that causes it to insert unexpected tags into its output, which seemingly confuse new versions of libxml2.
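To check that theory on other posts, a small standard-library helper can list document-level tags that show up in what should be a fragment. This is only a sketch under stated assumptions: find_document_tags and DOCUMENT_LEVEL_TAGS are made-up names, not part of Nikola or lxml, and the input is assumed to be the HTML string produced for a post body.

```python
from html.parser import HTMLParser

# Tags that belong to a complete document, not to a post fragment.
DOCUMENT_LEVEL_TAGS = {"html", "head", "body"}


class DocumentTagFinder(HTMLParser):
    """Hypothetical helper: record document-level tags in order of appearance."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in DOCUMENT_LEVEL_TAGS:
            self.found.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DOCUMENT_LEVEL_TAGS:
            self.found.append("</%s>" % tag)


def find_document_tags(fragment):
    """Return the document-level tags found in an HTML fragment."""
    finder = DocumentTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.found


# The output quoted above, reproduced as a string.
sample = (
    '<div class="e-content entry-content">\n'
    "<body><p>Write your post here.</p></body>\n"
    "</html>\n"
    "</div>"
)
print(find_document_tags(sample))  # ['<body>', '</body>', '</html>']
```

An empty list for a post body would suggest the stray tags are introduced somewhere else in the pipeline.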
@rhempel I think I got it fixed. Could you also test the bugfix in pull request #3580?
Confirmed fixed when running with libxml2-2.9.12-2
You are awesome!
Environment
Python Version: 3.9.6
Nikola Version: 8.1.3
Operating System: Windows 10
Description:
When INDEX_DISPLAY_POST_COUNT = 2 (or higher), the first post on the index page is correctly inside the div class="body-content".
All following posts are outside that div, and therefore do not inherit the formatting of "body-content".
The most obvious visual clue is that the width of the posts following the first post is the full page width, not the width of the body-content div.
I have changed to the unmodified base theme and the problem is the same - and there is nothing obvious in the index template to indicate a problem.
One reason this may have slipped through is that the demo sites for all current Nikola themes only have index pages with one item :-)