shutupflanders opened this issue 1 month ago
++ to this.
Sitemaps that exist as a single file are usually the small ones that are easy to manually look over. The ones that use many tiered layers of indirect references are exactly the ones where a tool is most valuable. One example of a complex multi-file sitemap: https://www.apple.com/sitemap.xml
The sitemap implementation I found here appears to make some other overly simple assumptions. Roughly:
- it assumes the sitemap lives at a single well-known location rather than also checking `robots.txt`, whereas I believe both can be valid
- it assumes that only one sitemap URL is specified, but the spec allows for multiple sitemap files to be specified. Example: https://www.apple.com/robots.txt

Building this out fully is equivalent to just building a real sitemap parser for indexing, so perhaps one of those can just be repurposed.
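As a rough illustration of the second point, here is a minimal sketch (the function name is hypothetical, not part of this tool) of pulling every `Sitemap:` directive out of a `robots.txt` body, rather than stopping at the first one. The directive name is matched case-insensitively:

```python
import re

def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return all sitemap URLs declared in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Match "Sitemap:", "sitemap:", etc., tolerating leading whitespace.
        m = re.match(r"\s*sitemap\s*:\s*(\S+)", line, re.IGNORECASE)
        if m:
            urls.append(m.group(1))
    return urls

# Example robots.txt with more than one sitemap declared.
robots = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap-main.xml
sitemap: https://example.com/sitemap-news.xml
"""
print(extract_sitemaps(robots))
```

A real parser would also need to resolve relative URLs and deduplicate, but the key point is that the result is a list, not a single URL.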
This is a nice tool, I'll certainly be using it a lot more moving forward.
However, I noticed when testing a website that has a sitemap index file, it doesn't recursively parse the sitemaps within it.
No biggie, but it would be good to see the full result set if possible.
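For reference, the recursion being asked for can be sketched in a few lines (names and the injected `fetch` callable are illustrative, not this tool's actual API): if the root element is `<sitemapindex>`, follow each child `<loc>` and recurse; if it is a plain `<urlset>`, collect the page URLs.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def expand_sitemap(url, fetch, depth=0, max_depth=5):
    """Return all page URLs reachable from `url`, following <sitemapindex> entries."""
    if depth > max_depth:  # guard against pathological nesting
        return []
    root = ET.fromstring(fetch(url))
    if root.tag == f"{NS}sitemapindex":
        pages = []
        for loc in root.iter(f"{NS}loc"):
            pages.extend(expand_sitemap(loc.text.strip(), fetch, depth + 1, max_depth))
        return pages
    # Plain <urlset>: collect the page URLs directly.
    return [loc.text.strip() for loc in root.iter(f"{NS}loc")]

# Canned two-level example standing in for real HTTP fetches.
DOCS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<sitemap><loc>https://example.com/sitemap-a.xml</loc></sitemap>'
        '</sitemapindex>'
    ),
    "https://example.com/sitemap-a.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<url><loc>https://example.com/page1</loc></url>'
        '<url><loc>https://example.com/page2</loc></url>'
        '</urlset>'
    ),
}
print(expand_sitemap("https://example.com/sitemap.xml", DOCS.__getitem__))
```

The depth cap is just a safety valve; in practice sitemap indexes are rarely more than one level deep, but the spec doesn't forbid nesting.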