shutupflanders opened this issue 1 month ago
++ to this.
Sitemaps that exist as a single file are usually the small ones that are easy to manually look over. The ones that use many tiered layers of indirect references are exactly the ones where a tool is most valuable. One example of a complex multi-file sitemap: https://www.apple.com/sitemap.xml
The sitemap implementation I found here appears to make some other overly simple assumptions. Roughly:
- it assumes the sitemap lives at a single well-known location rather than also checking `robots.txt`, whereas I believe both can be valid
- it assumes that only one sitemap URL is specified, but the spec allows for multiple sitemap files to be specified. Example: https://www.apple.com/robots.txt

Building this out fully is equivalent to just building a real sitemap parser for indexing, so perhaps one of those can just be repurposed.
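As a rough illustration of the second point, here is a minimal sketch (the function name is hypothetical, not part of this tool) of pulling every `Sitemap:` directive out of a `robots.txt` body, rather than stopping at the first one. The directive name is matched case-insensitively:

```python
import re

def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return all sitemap URLs declared in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Match "Sitemap:", "sitemap:", etc., tolerating leading whitespace.
        m = re.match(r"\s*sitemap\s*:\s*(\S+)", line, re.IGNORECASE)
        if m:
            urls.append(m.group(1))
    return urls

# Example robots.txt with more than one sitemap declared.
robots = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap-main.xml
sitemap: https://example.com/sitemap-news.xml
"""
print(extract_sitemaps(robots))
```

A real parser would also need to resolve relative URLs and deduplicate, but the key point is that the result is a list, not a single URL.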
This is a nice tool, I'll certainly be using it a lot more moving forward.
However, I noticed when testing a website that has a sitemap index file, it doesn't recursively parse the sitemaps within it.
No biggie, but it would be good to see the full result set if possible.
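For reference, the recursion being asked for can be sketched in a few lines (names and the injected `fetch` callable are illustrative, not this tool's actual API): if the root element is `<sitemapindex>`, follow each child `<loc>` and recurse; if it is a plain `<urlset>`, collect the page URLs.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def expand_sitemap(url, fetch, depth=0, max_depth=5):
    """Return all page URLs reachable from `url`, following <sitemapindex> entries."""
    if depth > max_depth:  # guard against pathological nesting
        return []
    root = ET.fromstring(fetch(url))
    if root.tag == f"{NS}sitemapindex":
        pages = []
        for loc in root.iter(f"{NS}loc"):
            pages.extend(expand_sitemap(loc.text.strip(), fetch, depth + 1, max_depth))
        return pages
    # Plain <urlset>: collect the page URLs directly.
    return [loc.text.strip() for loc in root.iter(f"{NS}loc")]

# Canned two-level example standing in for real HTTP fetches.
DOCS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<sitemap><loc>https://example.com/sitemap-a.xml</loc></sitemap>'
        '</sitemapindex>'
    ),
    "https://example.com/sitemap-a.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<url><loc>https://example.com/page1</loc></url>'
        '<url><loc>https://example.com/page2</loc></url>'
        '</urlset>'
    ),
}
print(expand_sitemap("https://example.com/sitemap.xml", DOCS.__getitem__))
```

The depth cap is just a safety valve; in practice sitemap indexes are rarely more than one level deep, but the spec doesn't forbid nesting.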