Recursively crawling https://blog.cyone.ch/ does not work as expected: wpull --recursive --sitemaps https://blog.cyone.ch/ only retrieves the homepage (which is a 404), robots.txt, and the sitemap. In particular, it doesn't follow the URLs mentioned in the sitemap. (It also doesn't extract links on the 404 page, which may be related to #202.)
The 404 alone does not appear to explain this: wpull --recursive --sitemaps https://du-willst-mehr.ch/ does recurse by following the URLs in the sitemap, even though the homepage is also a 404. Its sitemap lists sub-sitemaps rather than page URLs directly, which might play a role.
Both of these were discovered through ArchiveBot, i.e. wpull 2.0.3.
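To check whether the sitemap structure is the relevant difference, it may help to classify each site's sitemap as either an index (pointing at sub-sitemaps) or a plain URL set. The following is a minimal sketch using only the Python standard library; it is not wpull's implementation, and the function name `classify_sitemap` and the example.com documents are hypothetical stand-ins for the real sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap that lists page URLs directly.
URLSET_EXAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/post-1</loc></url>
  <url><loc>https://example.com/post-2</loc></url>
</urlset>
"""

# Hypothetical sitemap index that points at sub-sitemaps instead.
INDEX_EXAMPLE = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>
"""


def classify_sitemap(xml_text):
    """Return (kind, locs): kind is 'index', 'urlset' or 'unknown'."""
    root = ET.fromstring(xml_text.strip())
    # Collect every <loc> value, whatever its parent element is.
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind = "index"
    elif root.tag == SITEMAP_NS + "urlset":
        kind = "urlset"
    else:
        kind = "unknown"
    return kind, locs


if __name__ == "__main__":
    for name, doc in (("urlset", URLSET_EXAMPLE), ("index", INDEX_EXAMPLE)):
        kind, locs = classify_sitemap(doc)
        print(name, "->", kind, len(locs), "loc entries")
```

Running this against the sitemaps fetched from both sites would show whether the failing one is a plain URL set while the working one is an index, which would narrow down where the sitemap URLs get dropped.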