getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License

"lxml.etree.ParserError: Document is empty" with demo content #3663

Closed jkseppan closed 1 year ago

jkseppan commented 1 year ago

Environment

I have installed Nikola with pipx install "Nikola[extras]".

Python Version: 3.11.1

Nikola Version: v8.2.3

Operating System: macOS 13.1 (22C65)

Description:

I'm trying to follow the Getting Started documentation. When I create a demo site and run nikola build, I get the error "lxml.etree.ParserError: Document is empty". Here is the full output:

/tmp % nikola init -d -q testsite
[2023-01-14 21:45:47] INFO: init: A new site with example data has been created at testsite.
[2023-01-14 21:45:47] INFO: init: See README.txt in that folder for more information.
/tmp % cd testsite
testsite % nikola build
Scanning posts........done!
.  scale_images:output/images/frontispiece.jpg
.  scale_images:output/images/illus_001.jpg
.  render_taxonomies:output/archive.html
.  render_taxonomies:output/categories/index.html
.  render_galleries:output/galleries
.  render_galleries:output/galleries/demo
.  render_galleries:output/galleries/index.html
TaskError - taskid:render_galleries:output/galleries/index.html
PythonAction Error
Traceback (most recent call last):
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/doit/action.py", line 461, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/nikola/plugins/task/galleries.py", line 717, in render_gallery_index
    self.site.render_template(template_name, output_name, context)
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/nikola/nikola.py", line 1509, in render_template
    doc = lxml.html.document_fromstring(data.strip(), parser)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

########################################
render_galleries:output/galleries/index.html <stdout>:

Possibly related issues: #2851, #3507

Kwpolska commented 1 year ago

That seems unusual.

  1. If you remove the demo gallery, does it work?
  2. If you look into the templates (nikola/data/themes/*/templates), are there any blank files or broken symlinks?

jkseppan commented 1 year ago

I edited conf.py to include

 GALLERY_FOLDERS = {}

and now it gets further but still shows a similar error related to listings:

testsite % nikola build
Scanning posts........done!
.  render_posts:timeline_changes
.  render_posts:cache/pages/dr-nikolas-vendetta.html
.  render_posts:cache/pages/path_handlers.html
.  render_posts:cache/pages/creating-a-theme.html
.  render_posts:cache/pages/charts.html
.  render_posts:cache/pages/social_buttons.html
.  render_posts:cache/pages/listings-demo.html
.  render_posts:cache/pages/1.html
.  render_posts:cache/pages/bootstrap-demo.html
.  render_posts:cache/pages/extending.html
.  render_posts:cache/pages/internals.html
.  render_posts:cache/pages/manual.html
.  render_posts:cache/pages/quickref.html
.  render_posts:cache/pages/quickstart.html
.  render_posts:cache/posts/1.html
.  render_posts:cache/pages/theming.html
.  copy_assets:output/assets/css/bootblog.css
.  copy_assets:output/assets/css/bootstrap.min.css
.  copy_assets:output/assets/css/theme.css
.  copy_assets:output/assets/js/jquery.min.js
.  copy_assets:output/assets/js/bootstrap.min.js
.  copy_assets:output/assets/js/popper.min.js
.  copy_assets:output/assets/css/nikola_rst.css
.  copy_assets:output/assets/css/nikola_ipython.css
.  copy_assets:output/assets/css/html4css1.css
.  copy_assets:output/assets/css/rst.css
.  copy_assets:output/assets/css/ipython.min.css
.  copy_assets:output/assets/css/rst_base.css
.  copy_assets:output/assets/css/baguetteBox.min.css
.  copy_assets:output/assets/js/html5.js
.  copy_assets:output/assets/js/fancydates.js
.  copy_assets:output/assets/js/gallery.min.js
.  copy_assets:output/assets/js/fancydates.min.js
.  copy_assets:output/assets/js/gallery.js
.  copy_assets:output/assets/js/baguetteBox.min.js
.  copy_assets:output/assets/js/html5shiv-printshiv.min.js
.  copy_assets:output/assets/js/justified-layout.min.js
.  copy_assets:output/assets/js/luxon.min.js
.  copy_assets:output/assets/xml/atom.xsl
.  copy_assets:output/assets/xml/rss.xsl
.  copy_assets:output/assets/css/code.css
.  render_listings:output/listings/index.html
TaskError - taskid:render_listings:output/listings/index.html
PythonAction Error
Traceback (most recent call last):
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/doit/action.py", line 461, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/nikola/plugins/task/listings.py", line 178, in render_listing
    self.site.render_template('listing.tmpl', out_name, context)
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/nikola/nikola.py", line 1509, in render_template
    doc = lxml.html.document_fromstring(data.strip(), parser)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

########################################
render_listings:output/listings/index.html <stdout>:

I don't see any blank files or broken links:

testsite % (cd /Users/jks/.local/pipx/venvs/nikola/lib/python3.11/site-packages/nikola/data/themes; for x in */templates/*; do file $x; done)
base-jinja/templates/annotation_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/archive.tmpl: ASCII text
base-jinja/templates/archive_navigation_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/archiveindex.tmpl: ASCII text
base-jinja/templates/author.tmpl: HTML document text, ASCII text
base-jinja/templates/authorindex.tmpl: ASCII text
base-jinja/templates/authors.tmpl: ASCII text
base-jinja/templates/base.tmpl: HTML document text, ASCII text
base-jinja/templates/base_footer.tmpl: ASCII text
base-jinja/templates/base_header.tmpl: HTML document text, ASCII text
base-jinja/templates/base_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper.tmpl: ASCII text
base-jinja/templates/comments_helper_commento.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_disqus.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_dummy.tmpl: ASCII text
base-jinja/templates/comments_helper_facebook.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_intensedebate.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_isso.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_muut.tmpl: HTML document text, ASCII text
base-jinja/templates/comments_helper_utterances.tmpl: HTML document text, ASCII text
base-jinja/templates/feeds_translations_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/gallery.tmpl: HTML document text, Unicode text, UTF-8 text
base-jinja/templates/index.tmpl: HTML document text, ASCII text
base-jinja/templates/index_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/list.tmpl: HTML document text, ASCII text
base-jinja/templates/list_post.tmpl: HTML document text, ASCII text
base-jinja/templates/listing.tmpl: HTML document text, ASCII text
base-jinja/templates/math_helper.tmpl: LaTeX document text, ASCII text
base-jinja/templates/page.tmpl: ASCII text
base-jinja/templates/pagination_helper.tmpl: HTML document text, Unicode text, UTF-8 text
base-jinja/templates/post.tmpl: ASCII text
base-jinja/templates/post_header.tmpl: HTML document text, ASCII text
base-jinja/templates/post_helper.tmpl: HTML document text, ASCII text
base-jinja/templates/post_list_directive.tmpl: HTML document text, ASCII text
base-jinja/templates/story.tmpl: ASCII text
base-jinja/templates/tag.tmpl: HTML document text, ASCII text
base-jinja/templates/tagindex.tmpl: HTML document text, ASCII text
base-jinja/templates/tags.tmpl: ASCII text
base-jinja/templates/ui_helper.tmpl: HTML document text, ASCII text
base/templates/annotation_helper.tmpl: HTML document text, ASCII text
base/templates/archive.tmpl: ASCII text
base/templates/archive_navigation_helper.tmpl: HTML document text, ASCII text
base/templates/archiveindex.tmpl: ASCII text
base/templates/author.tmpl: HTML document text, ASCII text
base/templates/authorindex.tmpl: ASCII text
base/templates/authors.tmpl: ASCII text
base/templates/base.tmpl: HTML document text, ASCII text
base/templates/base_footer.tmpl: ASCII text
base/templates/base_header.tmpl: HTML document text, ASCII text
base/templates/base_helper.tmpl: HTML document text, ASCII text
base/templates/comments_helper.tmpl: ASCII text
base/templates/comments_helper_commento.tmpl: HTML document text, ASCII text
base/templates/comments_helper_disqus.tmpl: HTML document text, ASCII text
base/templates/comments_helper_dummy.tmpl: ASCII text
base/templates/comments_helper_facebook.tmpl: HTML document text, ASCII text
base/templates/comments_helper_intensedebate.tmpl: HTML document text, ASCII text
base/templates/comments_helper_isso.tmpl: HTML document text, ASCII text
base/templates/comments_helper_muut.tmpl: HTML document text, ASCII text
base/templates/comments_helper_utterances.tmpl: HTML document text, ASCII text
base/templates/feeds_translations_helper.tmpl: HTML document text, ASCII text
base/templates/gallery.tmpl: HTML document text, Unicode text, UTF-8 text
base/templates/index.tmpl: HTML document text, ASCII text
base/templates/index_helper.tmpl: HTML document text, ASCII text
base/templates/list.tmpl: HTML document text, ASCII text
base/templates/list_post.tmpl: HTML document text, ASCII text
base/templates/listing.tmpl: HTML document text, ASCII text
base/templates/math_helper.tmpl: LaTeX document text, ASCII text
base/templates/page.tmpl: ASCII text
base/templates/pagination_helper.tmpl: HTML document text, Unicode text, UTF-8 text
base/templates/post.tmpl: ASCII text
base/templates/post_header.tmpl: HTML document text, ASCII text
base/templates/post_helper.tmpl: HTML document text, ASCII text
base/templates/post_list_directive.tmpl: HTML document text, ASCII text
base/templates/story.tmpl: ASCII text
base/templates/tag.tmpl: HTML document text, ASCII text
base/templates/tagindex.tmpl: HTML document text, ASCII text
base/templates/tags.tmpl: ASCII text
base/templates/ui_helper.tmpl: HTML document text, ASCII text
bootblog4-jinja/templates/base.tmpl: HTML document text, ASCII text
bootblog4-jinja/templates/base_helper.tmpl: HTML document text, ASCII text
bootblog4-jinja/templates/index.tmpl: HTML document text, ASCII text
bootblog4/templates/base.tmpl: HTML document text, ASCII text
bootblog4/templates/base_helper.tmpl: HTML document text, ASCII text
bootblog4/templates/index.tmpl: HTML document text, ASCII text
bootstrap4-jinja/templates/authors.tmpl: ASCII text
bootstrap4-jinja/templates/base.tmpl: HTML document text, ASCII text
bootstrap4-jinja/templates/base_helper.tmpl: HTML document text, ASCII text
bootstrap4-jinja/templates/index_helper.tmpl: HTML document text, ASCII text
bootstrap4-jinja/templates/listing.tmpl: HTML document text, Unicode text, UTF-8 text
bootstrap4-jinja/templates/pagination_helper.tmpl: HTML document text, Unicode text, UTF-8 text
bootstrap4-jinja/templates/post.tmpl: ASCII text
bootstrap4-jinja/templates/tags.tmpl: ASCII text
bootstrap4-jinja/templates/ui_helper.tmpl: HTML document text, ASCII text
bootstrap4/templates/authors.tmpl: ASCII text
bootstrap4/templates/base.tmpl: HTML document text, ASCII text
bootstrap4/templates/base_helper.tmpl: HTML document text, ASCII text
bootstrap4/templates/index_helper.tmpl: HTML document text, ASCII text
bootstrap4/templates/listing.tmpl: HTML document text, Unicode text, UTF-8 text
bootstrap4/templates/pagination_helper.tmpl: HTML document text, Unicode text, UTF-8 text
bootstrap4/templates/post.tmpl: ASCII text
bootstrap4/templates/tags.tmpl: ASCII text
bootstrap4/templates/ui_helper.tmpl: HTML document text, ASCII text

(The file command outputs foo: empty or foo: broken symbolic link to /foobar for blank files or broken symlinks.)
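For anyone without the file command at hand, the same check can be sketched in Python. This is only an illustration, not part of Nikola: the function name find_suspect_templates is made up here, and the argument is assumed to be the themes directory of whatever installation is being inspected.

```python
from pathlib import Path


def find_suspect_templates(themes_dir):
    """Return (path, reason) pairs for empty files and broken symlinks.

    Hypothetical helper: it looks at every entry matching
    <themes_dir>/*/templates/*, like the shell loop above.
    """
    suspects = []
    for path in sorted(Path(themes_dir).glob("*/templates/*")):
        if path.is_symlink() and not path.exists():
            # exists() follows the link, so False means the target is missing.
            suspects.append((path, "broken symlink"))
        elif path.is_file() and path.stat().st_size == 0:
            suspects.append((path, "empty"))
    return suspects
```

An empty result corresponds to the file output above, where no entry is reported as empty or as a broken symbolic link.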

Kwpolska commented 1 year ago

The task that fails now also involves templates. The file output looks correct at first glance.

Here are a few things you can try to debug this:

(I tested with a clean install of Nikola from pip on Linux, without pipx. Skipping pipx shouldn't matter, but the OS might, if this is a bug in the macOS builds of a package.)

jkseppan commented 1 year ago

Running this in pdb, the problem seems to be in lxml.etree. The etree.fromstring function returns None, but the input looks just fine to me and is definitely not an empty string. This is inside lxml:

(Pdb) list
757
758     def document_fromstring(html, parser=None, ensure_head_body=False, **kw):
759         if parser is None:
760             parser = html_parser
761         value = etree.fromstring(html, parser, **kw)
762  ->     if value is None:
763             raise etree.ParserError(
764                 "Document is empty")
765         if ensure_head_body and value.find('head') is None:
766             value.insert(0, Element('head'))
767         if ensure_head_body and value.find('body') is None:
(Pdb) p value
None
(Pdb) p html
'<!DOCTYPE html>\n<html     prefix=\'        og: http://ogp.me/ns# article: http://ogp.me/ns/article#     \'     vocab="http://ogp.me/ns" lang="en">\n<head>\n    <meta charset="utf-8">\n    <meta name="viewport" content="width=device-width">\n        <title>galleries | Demo Site</title>\n\n    \n            <link href="/assets/css/all-nocdn.css" rel="stylesheet" type="text/css">\n\n    <meta name="theme-color" content="#5670d4">\n        <meta name="generator" content="Nikola (getnikola.com)">\n    \n        \n                    \n        <link rel="alternate" type="application/rss+xml" title="RSS" hreflang="en" href="/rss.xml">\n\n\n        \n\n\n    <link rel="canonical" href="https://example.com/galleries/">\n\n\n\n\n        <!--[if lt IE 9]><script src="../assets/js/html5shiv-printshiv.min.js"></script><![endif]-->\n\n    \n\n\n\n\n<link rel="alternate" type="application/rss+xml" title="RSS" href="rss.xml">\n<style type="text/css">\n    #gallery_container {\n        position: relative;\n    }\n    .image-block {\n        position: absolute;\n    }\n</style>\n<link rel="alternate" type="application/rss+xml" title="RSS" href="rss.xml">\n\n\n</head>\n<body>\n    <a href="#content" class="sr-only sr-only-focusable">Skip to main content</a>\n    <div id="container">\n        \n    <header id="header">\n        \n    <h1 id="brand"><a href="/" title="Demo Site" rel="home">\n\n        <span id="blog-title">Demo Site</span>\n    </a></h1>\n\n        \n\n        \n    <nav id="menu">\n    <ul>\n    \n                <li><a href="/archive.html">Archives</a></li>\n                <li><a href="/categories/index.html">Tags</a></li>\n                <li><a href="/rss.xml">RSS feed</a></li>\n\n    \n\n    \n    \n    </ul>\n    </nav>\n\n    </header>\n    \n\n        <main id="content">\n            \n    \n<nav class="breadcrumbs">\n<ul class="breadcrumb">\n                <li>galleries</li>\n</ul>\n</nav>\n\n    <h1>galleries</h1>\n            <ul>\n                <li><a 
href="demo/">📂&nbsp;Nikola Tesla</a></li>\n            </ul>\n\n<div id="gallery_container"></div>\n\n        </main>\n        \n        <footer id="footer">\n            <p>Contents &copy; 2023         <a href="mailto:joe@demo.site">Your Name</a> - Powered by         <a href="https://getnikola.com" rel="nofollow">Nikola</a>         </p>\n            \n        </footer>\n\n    </div>\n    \n            <script src="/assets/js/all-nocdn.js"></script>\n    \n\n    \n<script src="/assets/js/justified-layout.min.js"></script>\n<script src="/assets/js/gallery.min.js"></script>\n<script>\nvar jsonContent = [];\nvar thumbnailSize = 180;\nrenderGallery(jsonContent, thumbnailSize);\nwindow.addEventListener(\'resize\', function(){renderGallery(jsonContent, thumbnailSize)});\n</script>\n\n    <script>\n    baguetteBox.run(\'div#content\', {\n        ignoreClass: \'islink\',\n        captions: function(element){var i=element.getElementsByTagName(\'img\')[0];return i===undefined?\'\':i.alt;}});\n    </script>\n    \n    \n</body>\n</html>'
(Pdb) p value
None
(Pdb) p etree.fromstring(html[:1922], parser)
<Element html at 0x1087e55e0>
(Pdb) p etree.fromstring(html[:1923], parser)
None
(Pdb) p html[1922]
'📂'

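So it seems that etree.fromstring has trouble with the folder icon (📂, U+1F4C2): parsing succeeds for every prefix that stops before it and fails as soon as it is included.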
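The prefix test done by hand in pdb can be automated with a binary search. The sketch below is an assumption-laden illustration: first_failing_length and parses_bmp_only are hypothetical names, the stand-in predicate only imitates the observed behaviour (it is not lxml), and the search assumes that once a prefix fails every longer prefix fails too and that the empty prefix counts as passing.

```python
def first_failing_length(text, parses):
    """Smallest n for which parses(text[:n]) is False, or None if text parses.

    Assumes failures are monotonic: once a prefix fails, all longer ones fail.
    """
    if parses(text):
        return None
    lo, hi = 0, len(text)  # invariant: text[:lo] passes, text[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parses(text[:mid]):
            lo = mid
        else:
            hi = mid
    return hi


def parses_bmp_only(fragment):
    # Hypothetical stand-in for a real parser call: it rejects any
    # character outside the Basic Multilingual Plane.
    return all(ord(ch) <= 0xFFFF for ch in fragment)


sample = "<li>\U0001F4C2 Nikola Tesla</li>"
n = first_failing_length(sample, parses_bmp_only)
offending = sample[n - 1]  # the character that first breaks the "parser"
```

With a real parser, the predicate would wrap the parse call and return whether it produced a document; in the session above that boundary was n = 1923, with html[1922] as the offending character.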

jkseppan commented 1 year ago

If I edit base/templates/gallery.tmpl (I switched to the base theme, one of the debugging steps you suggested) and replace 📂 with &#x1f4c2;, there is no error and I still see the folder icon in the output.

Thanks for the assistance! I'll go file an lxml bug.

Kwpolska commented 1 year ago

Thanks for debugging this! I replaced the emoji characters with hex escapes in our templates to mitigate this.
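For sites with custom templates that hit the same error, the workaround can be sketched as a small Python helper. This is an illustration only, not the change made in Nikola's templates: escape_astral is a hypothetical name, and it assumes the problem is limited to characters above U+FFFF, which is all this thread has shown.

```python
import re

# Matches any single character outside the Basic Multilingual Plane.
_ASTRAL = re.compile("[\U00010000-\U0010FFFF]")


def escape_astral(text):
    """Replace characters above U+FFFF with hex character references."""
    return _ASTRAL.sub(lambda m: "&#x{:x};".format(ord(m.group())), text)


fixed = escape_astral('<li><a href="demo/">\U0001F4C2&nbsp;Nikola Tesla</a></li>')
```

Browsers render the reference as the same emoji, which matches the observation above that the folder icon still shows up in the output.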