getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License

Sitemap only includes HTML files with doctype. #1456

Closed kayhayen closed 9 years ago

kayhayen commented 9 years ago

Hello,

rm -rf output/*
/opt/nikola/bin/nikola build output/assets/js/tag_cloud_data.json && /opt/nikola/bin/nikola build

leaves me with this:

<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

Sometimes a single URL shows up instead, in that case only the 2010 archive page. This is clearly broken, since the site has many more pages.

/opt/nikola/bin/nikola --version
Nikola v7.1.0

Also, what is sitemapindex for? Should I use it in addition to the sitemap, or instead of it?

Yours, Kay

da2x commented 9 years ago

Not sure what is going on with your sitemap. When you build, do you see any sitemap tasks listed?

The sitemap index is the file you submit to the search engines. It contains all your sitemaps (Sitemaps and RSS). It should be on the first line of your /robots.txt file.
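For illustration, a sitemap index is a small XML file, in the sitemaps.org namespace, that lists the individual sitemap files. The sketch below builds one with Python's standard library. The function name `build_sitemap_index` and the `example.com` URLs are placeholders for this example, not Nikola's actual code.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (bytes) listing the given URLs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # One <sitemap><loc>...</loc></sitemap> entry per listed file.
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")


# Hypothetical URLs, for illustration only.
index = build_sitemap_index([
    "https://example.com/sitemap.xml",
    "https://example.com/rss.xml",
])
```

In robots.txt, such a file is referenced with a `Sitemap:` line that gives its full URL.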

kayhayen commented 9 years ago

Yes, I do:

. sitemap:output/sitemap.xml

With an empty, newly created site, it apparently works. It also fails only sporadically. Could hash randomization be affecting this?

Yours, Kay

kayhayen commented 9 years ago

This is the output from a failed run. Ignore the first rm; that is what I use to have it rebuild only one page. I did this:

[/data/home/hayen/repos/nikola-site]> rm -rf output/*
[/data/home/hayen/repos/nikola-site]> rm output/posts/starting-to-blog-who-and-why.html; /opt/nikola/bin/nikola build output/assets/js/tag_cloud_data.json && /opt/nikola/bin/nikola build && /opt/nikola/bin/nikola serve
rm: cannot remove 'output/posts/starting-to-blog-who-and-why.html': No such file or directory
Scanning posts.....done!
. render_tags:output/assets/js/tag_cloud_data.json
Scanning posts.....done!
. render_archive:output/2014/index.html . render_archive:output/2011/index.html . render_archive:output/2010/index.html . render_archive:output/2013/index.html . render_archive:output/2012/index.html . render_archive:output/archive.html . copy_assets:output/assets/img/glyphicons-halflings.png . copy_assets:output/assets/img/glyphicons-halflings-white.png . copy_assets:output/assets/js/jquery.min.js . copy_assets:output/assets/js/bootstrap.js . copy_assets:output/assets/js/flowr.plugin.js . copy_assets:output/assets/js/bootstrap.min.js . copy_assets:output/assets/js/jquery.colorbox-min.js . copy_assets:output/assets/js/jquery.min.map . copy_assets:output/assets/js/jquery.colorbox.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-it.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-ca.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-kr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-fa.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-fr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-gl.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-id.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-pl.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-et.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-sr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-hu.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-es.js . 
copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-si.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-hr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-cs.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-ro.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-ja.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-ru.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-no.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-he.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-bg.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-gr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-da.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-sk.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-lt.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-pt-br.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-ar.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-de.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-tr.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-my.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-nl.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-zh-CN.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-zh-TW.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-uk.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-sv.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-lv.js . copy_assets:output/assets/js/colorbox-i18n/jquery.colorbox-fi.js . copy_assets:output/assets/css/bootstrap.min.css . copy_assets:output/assets/css/theme.css . copy_assets:output/assets/css/bootstrap-responsive.min.css . copy_assets:output/assets/css/colorbox.css . copy_assets:output/assets/css/bootstrap-responsive.css . copy_assets:output/assets/css/bootstrap.css . 
copy_assets:output/assets/css/images/loading.gif . copy_assets:output/assets/css/images/controls.png . copy_assets:output/assets/js/mathjax.js . copy_assets:output/assets/js/html5.js . copy_assets:output/assets/css/rst.css . copy_assets:output/assets/css/code.css . render_tags:output/categories/index.html . render_tags:output/categories/Nikola.html . render_tags:output/categories/git.html . render_tags:output/categories/family.html . render_tags:output/categories/Python.html . render_tags:output/categories/benchmark.html . render_tags:output/categories/quiz.html . render_tags:output/categories/Windows.html . render_tags:output/categories/Android.html . render_tags:output/categories/physics.html . render_tags:output/categories/Debian.html . render_tags:output/categories/Nuitka.html . render_tags:output/categories/compiler.html . render_sources:output/posts/nuitka-release-055.rst . render_sources:output/posts/nuitka-shaping-up.rst . render_sources:output/posts/nuitka-release-054.rst . render_sources:output/posts/nuitka-release-053.rst . render_sources:output/posts/nuitka-release-052.rst . render_sources:output/posts/yup-another-python-riddle.rst . render_sources:output/posts/state-of-nuitka.rst . render_sources:output/posts/try-finally-python-quiz.rst . render_sources:output/posts/nuitka-release-051.rst . render_sources:output/posts/not-going-to-fosdem-2014.rst . render_sources:output/posts/nuitka-release-050.rst . render_sources:output/posts/re-about-python-3.rst . render_sources:output/posts/nuitka-standalone-mode-is-work-in-progress.rst . render_sources:output/posts/nuitka-release-047.rst . render_sources:output/posts/nuitka-in-arch-linux.rst . render_sources:output/posts/nuitka-and-guis.rst . render_sources:output/posts/nuitka-release-046.rst . render_sources:output/posts/nuitka-python-research-and-physics.rst . render_sources:output/posts/nuitka-release-045.rst . render_sources:output/posts/changing-python-faq.rst . 
render_sources:output/posts/my-europython-2013-report.rst . render_sources:output/posts/nuitka-release-044.rst . render_sources:output/posts/nuitka-release-043.rst . render_sources:output/posts/nuitka-on-github-bitbucket-and-gitorious.rst . render_sources:output/posts/going-to-europython-2013.rst . render_sources:output/posts/support-for-msvc-upcoming.rst . render_sources:output/posts/nuitka-needs-you-a-call-for-help.rst . render_sources:output/posts/support-for-portable-standalone-programs.rst . render_sources:output/posts/netbsd-support-upcoming.rst . render_sources:output/posts/nuitka-release-042.rst . render_sources:output/posts/nuitka-rpms-rhel-centos-f17-f18-opensuse.rst . render_sources:output/posts/nuitka-release-041.rst . render_sources:output/posts/pystone-comparison-nuitka-cython-and-cpython.rst . render_sources:output/posts/nuitka-not-on-pypi-currently.rst . render_sources:output/posts/nuitka-release-040.rst . render_sources:output/pages/donations.rst . render_sources:output/posts/python-3-nuitka-support-is-upcoming.rst . render_sources:output/pages/performance.rst . render_sources:output/posts/static-compilation-that-is-the-point.rst . render_sources:output/posts/nuitka-release-0325.rst . render_sources:output/posts/letting-go-of-c11.rst . render_sources:output/posts/nuitka-release-0324.rst . render_sources:output/posts/python-3-wonders-breaking-str.rst . render_sources:output/posts/python-3-wonders-barry-bdfl.rst . render_sources:output/posts/python-assert-quiz.rst . render_sources:output/posts/nuitka-and-debian-changes.rst . render_sources:output/posts/nuitka-release-0323.rst . render_sources:output/posts/speedcenter-is-back.rst . render_sources:output/pages/documentation.rst . render_sources:output/doc/developer-manual.rst . render_sources:output/doc/user-manual.rst . render_sources:output/posts/nikola-speed-improvements.rst . render_sources:output/posts/nikola-for-nuitka.rst . render_sources:output/posts/nuitka-release-0322.rst . 
render_sources:output/posts/nuitka-release-0321.rst . render_sources:output/posts/nuitka-release-0320.rst . render_sources:output/posts/award-winning-cat.rst . render_sources:output/posts/ubuntu-packages-for-nuitka.rst . render_sources:output/posts/nuitka-release-0319.rst . render_sources:output/posts/static-site-generator.rst . render_sources:output/posts/nuitka-release-0318.rst . render_sources:output/posts/nuitka-release-0317.rst . render_sources:output/pages/support.rst . render_sources:output/posts/nuitka-release-0316.rst . render_sources:output/posts/nuitka-release-0315.rst . render_sources:output/posts/nuitka-debian-package-and-windows-installer.rst . render_sources:output/posts/nuitka-release-0314.rst . render_sources:output/posts/nuitka-release-0313.rst . render_sources:output/posts/puting-a-conference-t-shirt-to-good-use.rst . render_sources:output/posts/nuitka-release-0312.rst . render_sources:output/posts/nuitka---pycon-de-video.rst . render_sources:output/posts/cat-update.rst . render_sources:output/posts/my-7yr-old-fotographer.rst . render_sources:output/posts/pycon-de-2011---my-report.rst . render_sources:output/posts/nuitka-git-flow.rst . render_sources:output/posts/nuitka-release-0311.rst . render_sources:output/posts/going-to-pycon-de.rst . render_sources:output/posts/nuitka-has-a-new-home---nuitkanet.rst . render_sources:output/pages/mailinglist.rst . render_sources:output/pages/impressum.rst . render_sources:output/posts/nuitka-release-0310.rst . render_sources:output/posts/the-new-cat.rst . render_sources:output/posts/nuitka-release-039.rst . render_sources:output/posts/nuitka-on-pybench---good-and-bad.rst . render_sources:output/posts/shes-a-doctor-now.rst . render_sources:output/posts/nuitka-release-038---windows-support.rst . render_sources:output/posts/nuitka-release-037.rst . render_sources:output/posts/nuitka-release-036.rst . render_sources:output/posts/nuitka-release-035.rst . render_sources:output/posts/python-float-quiz.rst . 
render_sources:output/posts/nuitka-release-034.rst . render_sources:output/posts/nuitka-pre-release-034pre1.rst . render_sources:output/posts/nuitka-needs-you---a-call-for-help.rst . render_sources:output/posts/nuitka-release-033.rst . render_sources:output/posts/loss-of-service.rst . render_sources:output/posts/nuitka-release-032.rst . render_sources:output/posts/tough-week.rst . render_sources:output/posts/nuitka-release-031.rst . render_sources:output/pages/overview.rst . render_sources:output/posts/nuitka-release-030.rst . render_sources:output/posts/release-nuitka-024.rst . render_sources:output/posts/python-exec-in-nested-functions-quiz.rst . render_sources:output/posts/release-nuitka-023.rst . render_sources:output/posts/python-scope-quiz.rst . render_sources:output/posts/release-nuitka-022.rst . render_sources:output/posts/family-photo.rst . render_sources:output/posts/release-nuitka-021.rst . render_sources:output/posts/release-nuitka-02.rst . render_sources:output/posts/new-git-repository-to-sync-with-nuitka-releases.rst . render_sources:output/posts/minor-release-nuitka-011.rst . render_sources:output/posts/releasing-nuitka-to-the-world.rst . render_sources:output/pages/download.rst . render_sources:output/pages/differences.rst . render_sources:output/posts/starting-to-blog-who-and-why.rst . copy_files:output/apple-touch-icon-iphone4.png . copy_files:output/favicon.ico . copy_files:output/apple-touch-icon-iphone.png . copy_files:output/apple-touch-icon-ipad.png . copy_files:output/humans.txt . copy_files:output/robots.txt . copy_files:output/favicon.png . copy_files:output/apple-touch-icon-ipad3.png . copy_files:output/apple-touch-icon.png . copy_files:output/posts/images/.keep_it . copy_files:output/assets/js/jquery.tagcanvas.min.js . copy_files:output/images/.keep_it . copy_files:output/pr/Nuitka-Presentation-PyCON-DE-2011.pdf . copy_files:output/pr/Nuitka-Presentation-PyCON-EU-2013.pdf . copy_files:output/pr/Nuitka-Presentation-PyCON-EU-2012.pdf . 
copy_files:output/wp-uploads/nuitka.net/IMG_0072-765x1024.jpg . copy_files:output/wp-uploads/nuitka.net/IMG_3837.jpg . copy_files:output/wp-uploads/nuitka.net/IMG_3767.jpg . copy_files:output/wp-uploads/nuitka.net/Michael-shot-of-his-mom.jpg . copy_files:output/wp-uploads/nuitka.net/Nuitka-git-flow.png . copy_files:output/wp-uploads/nuitka.net/IMG_3767-768x1024.jpg . copy_files:output/wp-uploads/nuitka.net/IMG_3837-768x1024.jpg . copy_files:output/wp-uploads/nuitka.net/Katze_Medaille.jpg . copy_files:output/wp-uploads/nuitka.net/Michael-shot-of-his-mom-1024x768.jpg . copy_files:output/wp-uploads/nuitka.net/IMG_0072.jpg . copy_files:output/wp-uploads/2010/09/Anna_Sonne_Andre_Michael.png . copy_files:output/wp-uploads/2011/04/Anna_Dithmarsia.jpg . copy_files:output/wp-uploads/2011/07/IMG_3530-1.jpg . copy_files:output/doc/images/Nuitka-Logo-Horizontal.png . copy_files:output/doc/images/Nuitka-Logo-Symbol.png . copy_files:output/doc/images/Nuitka-Logo-Vertical.png . copy_files:output/pages/images/debian.png . copy_files:output/pages/images/ubuntu.png . copy_files:output/pages/images/centos.png . copy_files:output/pages/images/pystone-binary-nuitka.svg . copy_files:output/pages/images/windows.jpg . copy_files:output/pages/images/opensuse.png . copy_files:output/pages/images/rhel.png . copy_files:output/pages/images/fedora.png . copy_files:output/pages/images/arch.jpg . copy_files:output/pages/images/pystone-nuitka.svg . copy_files:output/pages/images/git.jpg . copy_files:output/pages/images/pystone-memory-nuitka.svg . copy_files:output/posts/images/IMG_0072-765x1024.jpg . copy_files:output/posts/images/IMG_3837.jpg . copy_files:output/posts/images/IMG_3767.jpg . copy_files:output/posts/images/Michael-shot-of-his-mom.jpg . copy_files:output/posts/images/IMG_3530-1.jpg . copy_files:output/posts/images/Nuitka-git-flow.png . copy_files:output/posts/images/Nuitka-git-flow.odg . copy_files:output/posts/images/nuitka-website-logo.png . 
copy_files:output/posts/images/IMG_3767-768x1024.jpg . copy_files:output/posts/images/Anna_Sonne_Andre_Michael.png . copy_files:output/posts/images/Anna_Dithmarsia.jpg . copy_files:output/posts/images/europython-2012-07-img6319.jpg . copy_files:output/posts/images/IMG_3837-768x1024.jpg . copy_files:output/posts/images/Katze_Medaille.jpg . copy_files:output/posts/images/Michael-shot-of-his-mom-1024x768.jpg . copy_files:output/posts/images/IMG_0072.jpg . copy_files:output/posts/images/nikola-speed-improvements.png . render_indexes:output/index.html . render_indexes:output/index-1.html . render_indexes:output/index-2.html . render_indexes:output/index-3.html . render_indexes:output/index-4.html . render_indexes:output/index-5.html . render_indexes:output/index-6.html . render_indexes:output/index-7.html . render_indexes:output/index-8.html . render_indexes:output/index-9.html . render_indexes:output/index-10.html . render_galleries:output/galleries . render_galleries:output/galleries/Lanzarote-Timanfaya . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park . render_galleries:output/galleries/index.html . render_galleries:output/galleries/rss.xml . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6076.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6076.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6084.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6084.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6087.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6087.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6101.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6101.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6110.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6110.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6133.thumbnail.JPG . 
render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6133.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6139.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6139.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6140.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6140.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6150.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6150.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6159.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/IMG_6159.JPG . render_galleries:output/galleries/Lanzarote-Timanfaya/index.html . render_galleries:output/galleries/Lanzarote-Timanfaya/rss.xml . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5986.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5986.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5987.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5987.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5997.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_5997.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_6000.thumbnail.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/IMG_6000.JPG . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/index.html . render_galleries:output/galleries/Lanzarote-Texas-Ranch-Park/rss.xml . render_pages:output/posts/nuitka-release-053.html . render_tags:output/categories/Debian.xml . render_pages:output/posts/yup-another-python-riddle.html . render_pages:output/posts/new-git-repository-to-sync-with-nuitka-releases.html . render_pages:output/posts/state-of-nuitka.html . render_pages:output/posts/try-finally-python-quiz.html . render_pages:output/posts/nuitka-release-036.html . 
render_pages:output/posts/cat-update.html . render_pages:output/posts/not-going-to-fosdem-2014.html . render_pages:output/posts/nuitka-release-0314.html . render_pages:output/posts/nuitka-release-050.html . generate_rss:output/rss.xml . render_pages:output/posts/minor-release-nuitka-011.html . render_pages:output/posts/nuitka-standalone-mode-is-work-in-progress.html . render_pages:output/posts/nuitka-release-047.html . render_pages:output/posts/nuitka-release-035.html . render_pages:output/posts/nuitka-in-arch-linux.html . render_pages:output/posts/nuitka-and-guis.html . render_pages:output/posts/nuitka-release-0313.html . render_pages:output/posts/nuitka-release-046.html . render_pages:output/posts/nuitka-python-research-and-physics.html . render_pages:output/posts/releasing-nuitka-to-the-world.html . render_pages:output/posts/nuitka-release-045.html . render_pages:output/posts/changing-python-faq.html . render_pages:output/posts/python-float-quiz.html . render_pages:output/posts/my-europython-2013-report.html . render_pages:output/posts/nuitka-release-044.html . render_pages:output/posts/puting-a-conference-t-shirt-to-good-use.html . render_pages:output/posts/nuitka-release-043.html . render_pages:output/posts/nuitka-on-github-bitbucket-and-gitorious.html . render_pages:output/pages/download.html . render_pages:output/posts/going-to-europython-2013.html . render_pages:output/posts/support-for-msvc-upcoming.html . render_pages:output/posts/nuitka-release-034.html . render_pages:output/posts/nuitka-needs-you-a-call-for-help.html . render_pages:output/posts/support-for-portable-standalone-programs.html . render_pages:output/posts/nuitka-release-0312.html . render_pages:output/posts/netbsd-support-upcoming.html . render_pages:output/posts/nuitka-release-042.html . render_pages:output/pages/differences.html . render_pages:output/posts/nuitka-rpms-rhel-centos-f17-f18-opensuse.html . render_pages:output/posts/nuitka-release-052.html . 
render_pages:output/posts/nuitka-release-041.html . render_pages:output/posts/nuitka-pre-release-034pre1.html . render_pages:output/posts/pystone-comparison-nuitka-cython-and-cpython.html . render_pages:output/posts/nuitka-not-on-pypi-currently.html . render_pages:output/posts/nuitka---pycon-de-video.html . render_pages:output/posts/nuitka-release-040.html . render_pages:output/pages/donations.html . render_pages:output/posts/starting-to-blog-who-and-why.html . render_pages:output/posts/python-3-nuitka-support-is-upcoming.html . render_pages:output/pages/performance.html . render_pages:output/posts/nuitka-needs-you---a-call-for-help.html . render_pages:output/posts/static-compilation-that-is-the-point.html . render_pages:output/posts/nuitka-release-0325.html . render_pages:output/posts/nuitka-release-051.html . render_pages:output/posts/letting-go-of-c11.html . render_pages:output/posts/nuitka-release-0324.html . render_pages:output/posts/python-3-wonders-breaking-str.html . render_pages:output/posts/python-3-wonders-barry-bdfl.html . render_pages:output/posts/nuitka-release-033.html . render_pages:output/posts/python-assert-quiz.html . render_pages:output/posts/nuitka-and-debian-changes.html . render_pages:output/posts/my-7yr-old-fotographer.html . render_pages:output/posts/nuitka-release-0323.html . render_pages:output/posts/speedcenter-is-back.html . render_pages:output/pages/documentation.html . render_pages:output/doc/developer-manual.html . render_pages:output/posts/loss-of-service.html . render_pages:output/doc/user-manual.html . render_pages:output/posts/nikola-speed-improvements.html . render_pages:output/posts/pycon-de-2011---my-report.html . render_pages:output/posts/nikola-for-nuitka.html . render_pages:output/posts/nuitka-release-0322.html . render_pages:output/posts/nuitka-release-0321.html . render_pages:output/posts/nuitka-release-0320.html . render_pages:output/posts/nuitka-release-032.html . render_pages:output/posts/award-winning-cat.html . 
render_pages:output/posts/ubuntu-packages-for-nuitka.html . render_pages:output/posts/re-about-python-3.html . render_pages:output/posts/nuitka-release-0319.html . render_tags:output/categories/Nuitka.xml . render_pages:output/posts/static-site-generator.html . render_pages:output/posts/nuitka-release-0318.html . render_pages:output/posts/nuitka-release-0317.html . render_pages:output/posts/tough-week.html . render_pages:output/pages/support.html . render_pages:output/posts/nuitka-release-0316.html . render_pages:output/posts/nuitka-release-0311.html . render_pages:output/posts/nuitka-release-0315.html . render_pages:output/posts/nuitka-debian-package-and-windows-installer.html . render_pages:output/posts/nuitka-release-031.html . render_pages:output/posts/going-to-pycon-de.html . render_pages:output/pages/overview.html . render_pages:output/posts/nuitka-has-a-new-home---nuitkanet.html . render_pages:output/posts/nuitka-release-030.html . render_pages:output/pages/mailinglist.html . render_pages:output/posts/release-nuitka-024.html . render_pages:output/pages/impressum.html . render_pages:output/posts/python-exec-in-nested-functions-quiz.html . render_pages:output/posts/nuitka-release-0310.html . render_pages:output/posts/release-nuitka-023.html . render_pages:output/posts/the-new-cat.html . render_pages:output/posts/family-photo.html . render_pages:output/posts/python-scope-quiz.html . render_pages:output/posts/nuitka-release-039.html . render_pages:output/posts/release-nuitka-022.html . render_pages:output/posts/nuitka-on-pybench---good-and-bad.html . render_tags:output/categories/Nikola.xml . render_pages:output/posts/nuitka-release-055.html . render_tags:output/categories/git.xml . render_pages:output/posts/nuitka-git-flow.html . render_tags:output/categories/family.xml . render_pages:output/posts/shes-a-doctor-now.html . render_tags:output/categories/compiler.xml . render_tags:output/categories/Python.xml . render_tags:output/categories/benchmark.xml . 
render_pages:output/posts/release-nuitka-021.html . render_tags:output/categories/quiz.xml . render_pages:output/posts/nuitka-release-038---windows-support.html . render_tags:output/categories/Windows.xml . render_tags:output/categories/physics.xml . render_tags:output/categories/Android.xml . render_pages:output/posts/release-nuitka-02.html . render_pages:output/posts/nuitka-shaping-up.html . render_pages:output/posts/nuitka-release-054.html . render_pages:output/posts/nuitka-release-037.html . sitemap:output/sitemap.xml . sitemap:output/sitemapindex.xml

kayhayen commented 9 years ago

The same command sequence worked moments before and produced a sitemap in which only the update date had changed, which is correct.

kayhayen commented 9 years ago

OK, I found something that makes the difference. I have a filter for HTML files, which is basically this code:

import os, codecs, lxml.etree

tree = lxml.etree.parse(infile, lxml.etree.HTMLParser())

contents = lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html")

with codecs.open(infile, "w", "utf8") as f:
    f.write(contents)

When I remove the writing of the file, the site builds a correct sitemap.xml; when I keep it, the sitemap is broken. I am going to make a reproducer with a new site.

kayhayen commented 9 years ago

Ah, this helps, but there is no reason why it should:

contents = b'<!DOCTYPE html>\n' + lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html")

Somehow the sitemap creation depends on this, right? And it's wrong to do so in my not so humble opinion, should that be the case.
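The workaround above can be wrapped in a small helper that adds the doctype only when the serialized page lacks one. This is a sketch on plain bytes; `ensure_doctype` is a hypothetical name, and its input stands in for the output of `lxml.etree.tostring(...)`.

```python
DOCTYPE = b"<!DOCTYPE html>\n"


def ensure_doctype(html_bytes):
    """Prepend an HTML5 doctype unless the document already starts with one."""
    # Doctype declarations are case-insensitive, so compare in lower case.
    if html_bytes.lstrip().lower().startswith(b"<!doctype"):
        return html_bytes
    return DOCTYPE + html_bytes


# Stand-in for the serialized tree from the filter.
contents = ensure_doctype(b"<html><body>Hello</body></html>")
```

lxml's `tostring` also accepts a `doctype` keyword argument, which may be a tidier way to get the same result.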

da2x commented 9 years ago

You must have an HTML doctype due to legacy and the pains and tears of past generations. Browsers go wonky without it.

As for Nikola, maybe HTML without doctype should be logged as a notice from the sitemap task.

kayhayen commented 9 years ago

I disagree with that code. Leaving old files out of the sitemap is not its purpose. Please remove the "continue", make it a warning, or both.

I can well imagine having, and likely do have, files placed in "files" that are written in older HTML. Browsers may well fall back to legacy rendering modes for them.

Anyway, decide on that, and close if you disagree.

Kwpolska commented 9 years ago

The files we ignore with that notice are not “old”. The files we intend to ignore come from things like Google Webmaster Tools site verification — and they should not appear in sitemaps.

Since most normal pages do have a doctype, this was chosen as a check to get rid of those files. Note that we test for any HTML doctype. And if you have files without a doctype, you should add one ASAP. Quirks mode is not something you want to endure.
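To make the discussion concrete, here is a rough sketch of the kind of check being described: look at the start of each HTML file and skip it, with a logged notice, if no doctype is found. It is not Nikola's actual sitemap code; `has_html_doctype`, `sitemap_candidates`, the logger name, and the 1024-byte window are all assumptions for illustration.

```python
import logging
import os

logger = logging.getLogger("sitemap_sketch")


def has_html_doctype(path, window=1024):
    """Return True if the first `window` bytes contain any HTML doctype."""
    with open(path, "rb") as f:
        head = f.read(window)
    return b"<!doctype html" in head.lower()


def sitemap_candidates(output_dir):
    """Yield .html files that have a doctype; log a notice for skipped ones."""
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            path = os.path.join(root, name)
            if has_html_doctype(path):
                yield path
            else:
                # Skipped files are reported instead of vanishing silently.
                logger.warning("Skipping %s: no HTML doctype found", path)
```

Logging the skipped files keeps the filtering behaviour while making an empty sitemap much easier to diagnose.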

Also, what does your filter change? Doesn’t Nikola pretty-print by default already?

kayhayen commented 9 years ago

Hello there,

> Also, what does your filter change? Doesn't Nikola pretty-print by default already?

My filter basically redoes the changes from my fork, which never got merged:

First, I hack the table of contents to display more nicely:

# Make table of contents formatted nicely, as a box to the right of the
# other text. Used in manuals.
for node in tree.xpath("//div[@class='contents topic']"):
    node.attrib["class"] += " pull-right navbar alert"

Then, the footer is displayed at the bottom, but it looks poor:

# More visually appealing footer, make it more clear where it applies.
footer, = tree.xpath("//div[@class='footerbox']")
footer.attrib["class"] += " well navbar navbar-bottom"

# Also re-own footer, to content part. lxml's append() moves the element
# out of its previous parent, so no separate removal is needed.
node, = tree.xpath("//div[@class='span8']")
node.append(footer)

The footer is owned by the page, but I would rather have it apply to the middle row with the content only. And make it visually attractive.

This adds the tag cloud, extracting the data and inlining it:

# Add a third column with span2 for the tag cloud and potentially other
# stuff.
sidebar = """\

(Drag to control)

"""

liste = []

import json
for name, data in sorted(json.load(open("output/assets/js/tag_cloud_data.json")).iteritems()):
    liste.append('
  • %s
  • ' % ( data[1], data[0], name ) )
sidebar = sidebar % "\n".join(liste)

    node.getparent().append(
        lxml.etree.fromstring(sidebar)
    )
    
    for node in tree.xpath("//body"):
        for child in lxml.etree.fromstring("""\
    <script src="/assets/js/jquery.tagcanvas.min.js" type="text/javascript">
    """, lxml.etree.HTMLParser()):
            node.append(child)

The above saves me no less than 50k of loaded data. Even if cached, the initial load of the tag cloud is instant that way.
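A Python 3 sketch of the inlining step might look like the following. It assumes each entry in the tag cloud JSON maps a tag name to a list whose first item is a weight and whose second item is a URL, as the indexing in the filter suggests. The function name `tag_cloud_items` and the `<li>` markup with a `data-weight` attribute are hypothetical, since the original markup did not survive in this thread.

```python
import html
import json


def tag_cloud_items(json_path):
    """Build one <li> per tag from a tag-cloud JSON file.

    Assumes each value is a list of the form [weight, url, ...]; both the
    data layout and the generated markup are assumptions for illustration.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    items = []
    for name, entry in sorted(data.items()):
        weight, url = entry[0], entry[1]
        items.append(
            '<li><a href="%s" data-weight="%s">%s</a></li>'
            % (html.escape(url, quote=True), weight, html.escape(name))
        )
    return "\n".join(items)
```

Escaping the tag name and URL keeps unusual tag names from breaking the generated HTML.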

Then, I really like having a visual logo:

    # Place the logo, where the title normally is.
    node, = tree.xpath("//span[@id='blog-title']")
    node.text = None
    node.append(
        lxml.etree.fromstring("""<img class="center-block branch"
    src="/posts/images/nuitka-website-logo.png" width="120" height="24"
    href="http://nuitka.net" alt="Back to Nuitka Home">""")
    )

This is apparently unsupported by Nikola themes.

Then, really, I don't like my name on each posting:

    # Remove stupid author mentions across the place.
    for node in tree.xpath("//p[@class='byline author vcard']"):
        node.getparent().remove(node)

Then comes the loading of the tagcanvas script, though done in a clumsy way; ignore the abuse of the "for" loop to do it only once.

    for node in tree.xpath("//body"):
        child = lxml.etree.fromstring("""\
    <script src="/assets/js/jquery.tagcanvas.min.js" type="text/javascript">
    """)
        node.append(child)

        break
    else:
        # Expect to match.
        assert False

Then I have this:

    def modifyLine(line):
        if 'itemprop="headline name"><a href="#" class="u-url">' in line:
            line = line.replace(
                'itemprop="headline name"><a href="#" class="u-url">',
                'itemprop="headline name">'
            )
            line = line.replace("</a>", "")
    
        if '<p class="dateline"><a href="#" rel="bookmark">' in line:
            line = line.replace(
                '<p class="dateline"><a href="#" rel="bookmark">',
                '<p class="dateline">'
            )
            line = line.replace("</a>", "")
    
        return line
    
    contents = "\n".join(
        modifyLine(line)
        for line in
        contents.split("\n")
    )

Just some quick hacks to remove links I see no point in. Each post title is a link, and thus blue, and really a link to nothing. So is the date, and I think the poster name too (it would be interesting to create a list of posts by an author).

And since I don't like source links in the navigation bar, where they are also named "Source", which confused people who wanted to download software, I did this:

    contents = contents.replace(
        "SOURCE_NAME",
        os.path.basename(infile.decode()).replace(".html", ".rst")
    )

    This is in conjunction with:

    CONTENT_FOOTER = '''\ REST Source - Contents © 2014 Kay Hayen '''

    And also, I am doing this in filters:

    ".html" : [ htmlcompressor, hackContentClasses,  ],

    with:

    def htmlcompressor(infile):
        import os

        return filters.runinplace(
            r'java -jar %s --preserve-line-breaks --nomunge --compress-js %%1 -o %%2' % os.path.join(
                os.path.dirname(__file__),
                "tools/htmlcompressor-1.5.3.jar"
            ),
            infile
        )

    Formerly, I was a Nikola contributor, but I couldn't keep up with the pace of changes, and invest the time to merge my changes.

    Notice, I haven't deployed the result yet. One thing, I also don't like, is that tabular approach at post start. It used to say "Posted: xxx More posts about". Now that is no longer a sentence (just an accumulation of facts), which I don't like.

    I didn't mean to make a theme, because it would again incur effort to maintain. Therefore I prefer to attach classes with existing meanings, and to move around items in the lxml node tree.

    Please point out, where I am missing out on existing Nikola features.

    Yours, Kay

    Kwpolska commented 9 years ago

    Logos are supported:

    # Nikola supports logo display.  If you have one, you can put the URL here.
    # Final output is <img src="LOGO_URL" id="logo" alt="BLOG_TITLE">.
    # The URL may be relative to the site root.
    # LOGO_URL = ''
    
    # If you want to hide the title of your website (for example, if your logo
    # already contains the text), set this to False.
    # SHOW_BLOG_TITLE = True
    

    Now, some other fixes are good and could potentially make their way upstream (eg. table of contents fixes).

    And other fixes would be better off if you created a theme. Things won’t break badly if you don’t always merge changes from upstream.

    But basically, it seems that the only thing you need to add is <!DOCTYPE html> to the beginning of your lxml output — and that should make everything work…

    kayhayen commented 9 years ago

    Hello there,

    2014-10-27 17:05 GMT+01:00 Chris "Kwpolska" Warrick < notifications@github.com>:

    Logos are supported:

    Nikola supports logo display. If you have one, you can put the URL here. Final output is <img src="LOGO_URL" id="logo" alt="BLOG_TITLE">.

    Not good enough. The image sizes are missing. The "Back to" in the alt title is missing. And doesn't it have to have classes? Do you have an example of this actually used, so I could check it out?

    The image sizes are another thing, that I am forever disappointed by Nikola, and never got around to fixing it. Obviously the image may not yet be there. I am going to work around that like I did with the tag cloud json, by explicitly building the copying of the images, first.

    Then, in an lxml query of img, checking the files in output, I will add the image sizes where they are missing (everywhere I think, I am not doing it manually in the REST). Not providing image sizes can delay rendering in the browser and/or make it look bad.
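
    A rough sketch of that plan, not code from this thread: read the pixel size from the PNG header and fill in missing width/height attributes. The names `png_size` and `add_missing_sizes` are made up for illustration, only PNG files are handled, and the sketch uses the standard library's ElementTree on a well-formed fragment to stay self-contained; the same loop should carry over to an lxml tree.

```python
import os
import struct
import xml.etree.ElementTree as ET

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from the IHDR chunk of PNG bytes, or None."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def add_missing_sizes(root, output_dir):
    """Set width/height on <img> elements that lack them; return how many changed."""
    changed = 0
    for img in root.iter("img"):
        src = img.get("src", "")
        # Only site-absolute sources without any size attribute are touched.
        if "width" in img.attrib or "height" in img.attrib or not src.startswith("/"):
            continue
        path = os.path.join(output_dir, src.lstrip("/"))
        if not os.path.isfile(path):
            continue  # image not built yet, leave the tag alone
        with open(path, "rb") as f:
            size = png_size(f.read(24))
        if size is not None:
            img.set("width", str(size[0]))
            img.set("height", str(size[1]))
            changed += 1
    return changed
```

    Images that are not in the output folder yet are skipped silently, which is why the images would have to be copied before such a filter runs.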

    Now, some other fixes are good and could potentially make their way upstream (eg. table of contents fixes).

    Not sure if it's fixes. I am abusing existing classes there, e.g. alert, which changes the colors to be more pretty.

    And other fixes would be better off if you created a theme. Things won't break badly if you don't always merge changes from upstream.

    It's also with the idea, that changing the theme will still be mostly easy. I am not sure if I am right there, but I found that adding the third column in the theme was relatively hard, with lxml xpath query, it's way easier to properly position stuff.

    The things that are not xpath queries are because I started out with extending an existing "sed" command, that I always had in place.

    But basically, it seems that the only thing you need to add is <!DOCTYPE html> to the beginning of your lxml output -- and that should make everything work...

    I of course already did that. It's easy to find in the sources of Nikola. It's a shame that lxml does not output this by default, if you are so convinced that one has to have it. I saw a parameter somewhere on this page:

    http://lxml.de/api/lxml.html-module.html#tostring

    So there is an argument to include the meta content type, which defaults to false. But when I attempted that, it didn't work for my installation (no such parameter), so it's either new, or I am getting something wrong. The default looks wrong of course. Probably something about being backwards compatible.

    Anyway, I deployed the new Nikola to my internal automatic updates of nuitka.net, and it's mostly fine.

    One annoying thing, is that all the RSS feeds change the "buildDate" every day. I had something of a "shelve" file before, where I stored that date. I am likely to just put the last created "buildDate" now into a "json" file, which then is added to the repo and do this in a filter with xml.

    That is annoying, because the file changes in my output.git, making it look like something changed, when it didn't, making it harder for me to detect, what actually changed in automatic deployments.
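
    A minimal sketch of such a filter, under assumptions: the state lives in a JSON file committed to the repo, and a feed counts as unchanged when its text, with the date blanked out, hashes to the stored digest. The function names and the state layout are hypothetical, and the date is swapped with a regular expression rather than an XML parser to keep the sketch short.

```python
import hashlib
import json
import os
import re

BUILD_DATE_RE = re.compile(r"<lastBuildDate>(.*?)</lastBuildDate>")


def stabilize_build_date(feed_xml, feed_name, state):
    """Reuse the stored lastBuildDate when nothing else in the feed changed.

    `state` maps feed names to {"digest": ..., "date": ...} and is updated in place.
    """
    match = BUILD_DATE_RE.search(feed_xml)
    if match is None:
        return feed_xml
    # Hash the feed with the date blanked out, so the date alone never counts as a change.
    stripped = BUILD_DATE_RE.sub("", feed_xml)
    digest = hashlib.sha256(stripped.encode("utf-8")).hexdigest()
    known = state.get(feed_name)
    if known is not None and known["digest"] == digest:
        return BUILD_DATE_RE.sub(
            lambda m: "<lastBuildDate>%s</lastBuildDate>" % known["date"], feed_xml
        )
    # Real change (or first run): remember the new digest and the new date.
    state[feed_name] = {"digest": digest, "date": match.group(1)}
    return feed_xml


def load_state(path):
    """Load the JSON state file, or start empty when it does not exist yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_state(path, state):
    """Write the state with sorted keys so the file itself is reproducible."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)
```

    With this, a rebuild from an empty output folder would produce a byte-identical feed as long as no item changed, so output.git would only show real changes.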

    Yours, Kay

    Kwpolska commented 9 years ago

    Logos are supported:

    Not good enough. The image sizes are missing. The "Back to" in the alt title is missing. And doesn't it have to have classes? Do you have an example of this actually used, so I could check it out?

    https://getnikola.com/ or https://chriswarrick.com/ (my blog uses SHOW_TITLE = True)

    http://lxml.de/api/lxml.html-module.html#tostring

    So there is an argument to include the meta content type, which defaults to false. But when I attempted that, it didn't work for my installation (no such parameter), so it's either new, or I am getting something wrong. The default looks wrong of course. Probably something about being backwards compatible.

    DOCTYPE ≠ Content-Type.

    Best solution:

    contents = lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html", doctype="<!DOCTYPE html>")
    
    with codecs.open(infile, "w", "utf8") as f:
        f.write(contents)

    If this fails:

    contents = "<!DOCTYPE html>\n" + lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html")
    
    with codecs.open(infile, "w", "utf8") as f:
        f.write(contents)

    One annoying thing, is that all the RSS feeds change the "buildDate" every day. I had something of a "shelve" file before, where I stored that date. I am likely to just put the last created "buildDate" now into a "json" file, which then is added to the repo and do this in a filter with xml.

    This doesn’t happen on my site, unless I modify a post — at which point changing the date makes perfect sense.

    If you still don’t like that, you can change lastBuildDate in nikola.nikola.Nikola.generic_rss_renderer.

    If rebuilds actually happen every day, that’s a different bug.

    kayhayen commented 9 years ago

    Hello there,

    thanks for the replies:

    Best solution:

    contents = lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html", doctype="<!DOCTYPE html>")

    with codecs.open(infile, "w", "utf8") as f:
        f.write(contents)

    If this fails:

    Which it does, or so I believe. Nikola source also has this:

    contents = "<!DOCTYPE html>\n" + lxml.etree.tostring(tree.getroot(), pretty_print=True, method="html")

    with codecs.open(infile, "w", "utf8") as f:
        f.write(contents)

    One annoying thing, is that all the RSS feeds change the "buildDate" every day. I had something of a "shelve" file before, where I stored that date. I am likely to just put the last created "buildDate" now into a "json" file, which then is added to the repo and do this in a filter with xml.

    This doesn't happen on my site, unless I modify a post -- at which point changing the date makes perfect sense.

    My Buildbot effectively does a git clean -dfx which removes cache and .doit.db files. I think in that case, it definitely happens.

    If you still don't like that, you can change lastBuildDate in nikola.nikola.Nikola.generic_rss_renderer.

    If rebuilds actually happen every day, that's a different bug.

    Actually, I expect to be able to do "rm -rf output/*" and reproduce the same site any time. My Buildbots occasionally will do that too.

    Hacking the Nikola source is no longer acceptable. In fact, my previous private fork did that, and saved the datetime of last publication in a shelve.

    You as a project may not share that requirement. I regard reproducible output as a cornerstone of being able to do effective regression testing. The xml filter that I posted in another (now closed) issue will do the trick for me. It's also an outline of how Nikola could do it too.

    Also, I would love it if sitemaps didn't update time stamps of posts for no good reason either. I am setting the time stamps of both site.git and output.git files to their last commit dates prior to builds. But as soon as e.g. my tag cloud data changes, each post gets touched, although the content didn't change, so my sitemap says everything was updated each time I add a tag to a post, the previous/next links change, etc.

    Not sure how to go about that. Maybe the sitemap date should be the latest of the original files leading to it, and not have anything to do with output.git timestamps at all. From what I recall though, there is little link of that code to the chain creating the files, is there?
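
    To illustrate the idea only: given the list of source files an output page was built from (how to obtain that list from Nikola is not shown here and is an assumption), the sitemap date would be the newest modification time among them. `lastmod_from_sources` is a hypothetical helper, not a Nikola function.

```python
import datetime
import os


def lastmod_from_sources(source_paths):
    """Return the newest mtime among the source files as a UTC date string, or None."""
    mtimes = [os.path.getmtime(p) for p in source_paths if os.path.exists(p)]
    if not mtimes:
        return None
    newest = datetime.datetime.fromtimestamp(max(mtimes), tz=datetime.timezone.utc)
    # The sitemap protocol accepts a plain YYYY-MM-DD date for <lastmod>.
    return newest.strftime("%Y-%m-%d")
```

    Because only the inputs are consulted, touching or regenerating the output file would not move the date.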

    Yours, Kay

    Kwpolska commented 9 years ago

    There actually is a way to create output that doesn’t change: nikola build --invariant. This is what we use for our regression testing, and it works — at least for us, because we can’t guarantee it will be good for you. (requires pip install freezegun)

    The output of that is reproducible, provided that the exact same machine is used for building each time. (why it has problems otherwise is a thing we do not understand.)

    (Also: note that rm -rf output/; nikola build is against our philosophy of “rebuild only what’s needed”. You don’t have to agree and certainly can do this, but just remember that we don’t always try to cater to people doing that.)

    kayhayen commented 9 years ago

    Hello Chris,

    to me, building only what's needed is a nice idea. However, I would consider that most bugs that I reported in my Nikola life were about missing or not working dependencies. It is very difficult to get right. But, I do appreciate it, and for interactive use, it totally is a must, so don't get me as opposed to it.

    But, on my buildbot in the night, or triggered by a succeeded build that should update the downloads page, or a new release automatically creating posts from the changelog, I really don't care if the build takes a few minutes longer, if that means avoiding such dependency bugs.

    So I would humbly ask you to consider it still as a valid use case. I am keeping output.git merely to ease my analysis of what e.g. a Nikola update changed, or what a new post added, but normally I wouldn't have to do that, or only do it after the build when I deploy from my buildbot.

    In the alternative, given proper Nikola APIs, I can probably, like I did with the RSS feeds time stamps, come up with code that will handle this outside of Nikola. I could post them and you might want to link to it when the issue comes up.

    I didn't know about freezegun. Interesting. I would prefer to just not use file system times at all. I don't see their point. I would rather want to track content changes in checksums and use those dates when these were first observed.

    Yours, Kay

    Kwpolska commented 9 years ago

    We do support that use case, but there are some cases where we don’t just think about making things nicer.

    As I said: nikola build --invariant should help you. And if it doesn’t, report a bug and we will try to fix it.

    kayhayen commented 9 years ago

    Hello Chris,

    I thought it's only for testing. I used it. And it appears to be buggy. Can it be, that it sets the datetime to first 1.1.2014? My posts from 2014 apparently didn't count into the tag cloud:

    e.g. 2014/index.html got deleted

    Many postings have changed with this:

    - • Nuitka
    - • Python
    + • Nuitka
    + • Python

    And the sitemap has:

    http://nuitka.net/posts/support-for-portable-standalone-programs.html 2014-01-01

    I am not sure, how freezegun could be used properly for production stuff. Do you really think it's fixable? I would rather propose to follow the input file timestamps dependencies, and just use them. That in combination with setting those to their git commit date, ought to work perfect.

    Oh, found it:

    __main__.py: freeze = freezegun.freeze_time("2014-01-01")

    You probably want to use "next actual year" there, otherwise it will consider my newer posts as to be published in the future.

    But really, no datetime fakery should be necessary, right?

    Yours, Kay

    Kwpolska commented 9 years ago

    1. Yes, it’s set to 2014. Made it 2038 now.
    2. Don’t name it freezegun.
    3. AFAIK, we try to follow file timestamps, but the RSS date is the last update timestamp and some people may depend on that.

    kayhayen commented 9 years ago

    Oh,

    and I forgot to make it clear maybe. With setting git commit dates as file system dates, with previous Nikola already, I was able to get reproducible builds. There were only a few tweaks necessary, like e.g. one I now turned into an xml filter.

    All I am talking about would just lead to not having to do it for "output.git", but it's not a pressing issue to me. I expect to be able to go on like that.
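
    For readers who want to try the same trick, a small sketch of stamping files with their last commit date before a build. It assumes `git` is on the PATH and that the paths are relative to the repository root; the helper names are made up for illustration and this is not the script actually used for nuitka.net.

```python
import os
import subprocess


def git_commit_time(path, repo_dir="."):
    """Return the Unix time of the last commit touching `path`, or None if untracked."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--format=%ct", "--", path], cwd=repo_dir
    )
    text = out.decode("ascii").strip()
    return int(text) if text else None


def set_mtime(path, timestamp):
    """Set both access and modification time of `path` to `timestamp`."""
    os.utime(path, (timestamp, timestamp))


def apply_commit_times(paths, repo_dir="."):
    """Stamp every tracked file with its last commit time; skip untracked ones."""
    for path in paths:
        timestamp = git_commit_time(path, repo_dir)
        if timestamp is not None:
            set_mtime(os.path.join(repo_dir, path), timestamp)
```

    Running one `git log` per file is slow on large repositories, which is acceptable for a nightly build but probably not for interactive use.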

    One thing, that hurts me, is now that I inlined tag cloud data into each HTML file, that it's apparently not the right approach anymore. But that's my problem to suffer through.

    As Nikola, deep in its heart, knows the files, an output depends upon, I can very probably come up with a timestamp for the resulting file in the filters.

    Yours, Kay

    kayhayen commented 9 years ago

    You wrote:

    1. AFAIK, we try to follow file timestamps, but the RSS date is the last update timestamp and some people may depend on that.

    The thing with file system timestamps is that they are lossy. Even if a file gets rebuilt, but does not change one bit, it appears new in the sitemap. That rebuild might be triggered by e.g. a Nikola update, changing the machine used to build, or removing the output.

    But I think we agree to disagree here. I shall pursue the path of fixing up file stamps in the filter with another algorithm. Luckily Nikola is extensible.

    Thanks, Kay