autogram-is / spidergram

Structural analysis tools for complex web sites
GNU General Public License v3.0

Robots/Sitemap URLs don't automatically get the correct handler #41

Closed: eaton closed this issue 1 year ago

eaton commented 1 year ago

Sitemaps found during the crawl work fine because they're tagged with the correct handler. However, robots.txt and sitemap files found with the pre-crawl CLI command aren't tagged with the correct handler, so they aren't parsed correctly.
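
To illustrate the kind of tagging at issue, here is a minimal sketch of assigning a handler label to pre-crawl URLs based on their path alone. This is not Spidergram's actual API: `guessHandler`, `tagSeedUrls`, and the label names are hypothetical, and the sitemap filename pattern is an assumption.

```typescript
// Hypothetical handler labels; Spidergram's real handler names may differ.
type HandlerLabel = 'robots' | 'sitemap' | 'page';

// Pick a handler label from the URL path alone (assumed heuristic).
function guessHandler(rawUrl: string): HandlerLabel {
  const path = new URL(rawUrl).pathname.toLowerCase();
  if (path === '/robots.txt') return 'robots';
  // Matches sitemap.xml, sitemap_index.xml, sitemap-posts.xml.gz, and similar.
  if (/\/[^/]*sitemap[^/]*\.xml(\.gz)?$/.test(path)) return 'sitemap';
  return 'page';
}

// Tag each pre-crawl URL so it can be routed like one discovered mid-crawl.
function tagSeedUrls(urls: string[]): { url: string; handler: HandlerLabel }[] {
  return urls.map((url) => ({ url, handler: guessHandler(url) }));
}
```

With labels attached up front, URLs queued by a pre-crawl step would carry the same routing information as those discovered during the crawl.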

eaton commented 1 year ago

Sitemap and robots.txt handling are being reworked; as of 0.9.0, sitemap and robots.txt files are not auto-discovered during crawls, making this issue moot.