getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License

Index pages do not respect body-content div after first post #3573

Closed rhempel closed 3 years ago

rhempel commented 3 years ago

Environment

Python Version: 3.9.6

Nikola Version: 8.1.3

Operating System: Windows 10

Description:

When INDEX_DISPLAY_POST_COUNT = 2 (or higher), the first post on the index page is correctly inside the div class="body-content".

All following posts are outside that div and therefore do not inherit the formatting of "body-content".

The most obvious visual clue is that the width of the posts following the first post is the full page width, not the width of the body-content div.

I have changed to the unmodified base theme and the problem is the same - and there is nothing obvious in the index template to indicate a problem.

One reason this may have slipped through is that the demo sites for all current Nikola Themes only have index pages with one item :-)

Kwpolska commented 3 years ago

We had a report about this in #3564 — it was caused by a broken version of lxml shipped by Conda. Can you try again with a lxml version from PyPI?

rhempel commented 3 years ago

I've followed the #3564 thread and have a little more information to add. Sorry for not mentioning it up front. I am running on Windows 10, but I am using the Git for Windows SDK, so that's MSYS2 MinGW64.

The version of libxml2 is libxml2-2.9.12-2, and MSYS2 prefers to NOT downgrade packages, but ... https://www.msys2.org/docs/package-management/

After downgrading to http://repo.msys2.org/mingw/mingw64/mingw-w64-x86_64-libxml2-2.9.10-8-any.pkg.tar.zst the multi-post index pages look fine again :-)

Thanks for the quick response - closing this issue, but I cannot help but think that a significant issue with libxml2 2.9.11 and 2.9.12 would have shown up in other use cases. Is it possible that there is still an issue in Nikola due to incorrect assumptions in how libxml2 is used?
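For anyone wanting to check which libxml2 their lxml actually uses, a minimal sketch follows. The helper names libxml2_versions and in_reported_range are made up for illustration, and the 2.9.11 threshold is taken only from the versions mentioned in this thread; whether later releases behave the same way is not established here.

```python
def libxml2_versions():
    """Return (runtime, compiled) libxml2 version tuples, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    # LIBXML_VERSION is the library loaded at runtime;
    # LIBXML_COMPILED_VERSION is the one lxml was built against.
    return etree.LIBXML_VERSION, etree.LIBXML_COMPILED_VERSION


def in_reported_range(version):
    """True for libxml2 2.9.11 or newer (assumption: the range reported in this thread)."""
    return tuple(version[:3]) >= (2, 9, 11)


if __name__ == "__main__":
    versions = libxml2_versions()
    if versions is None:
        print("lxml is not installed")
    else:
        runtime, compiled = versions
        print("libxml2 runtime:", runtime, "compiled against:", compiled)
        print("in reported range:", in_reported_range(runtime))
```

Run it with the same Python interpreter that runs Nikola, since a Conda, MSYS2, or PyPI install of lxml may each link a different libxml2.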

Kwpolska commented 3 years ago

There is a possibility that the HTML fed to lxml contains unbalanced tags, or some other combination libxml2 dislikes since 2.9.11. Or there’s a regression in libxml2. That could be verified by removing lxml from the Nikola pipeline and checking the produced code.

rhempel commented 3 years ago

Sorry to keep this thread going after the issue is closed, but could you point me at where to remove lxml from the Nikola pipeline? I've poked around in nikola.py and have scanned the internals document but nothing obvious is jumping out at me :-)

There seems to be a LOT of lxml usage scattered in the Nikola source and I'm pretty sure that from your comment there is one RIGHT place to catch the output.

Kwpolska commented 3 years ago

I’d say that commenting out the lxml-related parts in render_template should do it, and that would be nikola.py, lines 1497-1509.

https://github.com/getnikola/nikola/blob/db97e3da73d1ce6fcf4138e938967db8a4efc0c6/nikola/nikola.py#L1497-L1509

rhempel commented 3 years ago

Thanks for the hint - I did try to comment that section out previously but forgot to set my libxml2 back to the current version. Now that I have that straight, I can confirm the issue sits here: https://github.com/getnikola/nikola/blob/db97e3da73d1ce6fcf4138e938967db8a4efc0c6/nikola/nikola.py#L1501 and https://github.com/getnikola/nikola/blob/db97e3da73d1ce6fcf4138e938967db8a4efc0c6/nikola/nikola.py#L1509

If you comment out those lines (and the preceding else) and add: data = data.encode('utf-8') then the HTML is rendered correctly - there seems to be a problem when going from HTML data to XML doc and pretty printing back to HTML data ...

rhempel commented 3 years ago

Let me know if there's anything I can do to test/verify - I have a workaround for now so it's not critical

Kwpolska commented 3 years ago

I tried investigating a bit further and found this in the output:


                <div class="e-content entry-content">
                    <body><p>Write your post here.</p></body>
</html>

                </div>

There might be something wrong with Post.text() that causes it to insert unexpected tags into its output, which seemingly confuse new versions of libxml2.
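One way to spot this kind of stray markup in a post fragment, without involving lxml at all, is a small check built on the standard library's html.parser. This is only a sketch: StrayTagFinder, find_stray_tags, and DOCUMENT_LEVEL_TAGS are hypothetical names, not part of Nikola, and the sample string is the fragment quoted above.

```python
from html.parser import HTMLParser

# Tags that belong to a whole document, not to a post body fragment.
DOCUMENT_LEVEL_TAGS = {"html", "head", "body"}


class StrayTagFinder(HTMLParser):
    """Collect document-level start and end tags found in a fragment."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in DOCUMENT_LEVEL_TAGS:
            self.found.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DOCUMENT_LEVEL_TAGS:
            self.found.append("</%s>" % tag)


def find_stray_tags(fragment):
    """Return the document-level tags in fragment, in order of appearance."""
    finder = StrayTagFinder()
    finder.feed(fragment)
    finder.close()
    return finder.found


sample = "<body><p>Write your post here.</p></body>\n</html>"
print(find_stray_tags(sample))  # ['<body>', '</body>', '</html>']
```

An empty list means the fragment contains no html, head, or body tags; it does not prove the fragment is otherwise well balanced.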

Kwpolska commented 3 years ago

@rhempel I think I got it fixed. Could you also test the bugfix in pull request #3580?

rhempel commented 3 years ago

Confirmed fixed when running with libxml2-2.9.12-2

You are awesome!