Lissy93 / web-check

🕵️‍♂️ All-in-one OSINT tool for analysing any website
https://web-check.xyz
MIT License
21.81k stars, 1.65k forks

Recursive Sitemap Index Parsing #165

Open shutupflanders opened 1 month ago

shutupflanders commented 1 month ago

This is a nice tool, I'll certainly be using it a lot more moving forward.

However, I noticed when testing a website that has a sitemap index file, it doesn't recursively parse the sitemaps within:

[image]

No biggie, but it would be good to see the full result set if possible.

varenc commented 1 month ago

++ to this.

Sitemaps that exist as a single file are usually the small ones that are easy to look over manually. The ones that use many tiers of indirect references are exactly the ones where a tool is most valuable. One example of a complex multi-file sitemap: https://www.apple.com/sitemap.xml

The sitemap implementation I found here appears to make some other overly simple assumptions. Roughly:

Building this out fully is equivalent to just building a real sitemap parser for indexing, so perhaps one of those can just be repurposed.
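
For the recursive part specifically, here is a rough sketch of what following a sitemap index could look like. It is not web-check's code and not a full parser. The function name `iter_sitemap_urls`, the injected `fetch` callable, and the `max_depth` limit are all hypothetical names made up for illustration. It assumes plain (non-gzipped) XML that uses the standard sitemaps.org namespace.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set

# Namespace used by the sitemaps.org protocol for both <sitemapindex> and <urlset>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def iter_sitemap_urls(
    url: str,
    fetch: Callable[[str], bytes],
    max_depth: int = 5,
    _seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield page URLs from a sitemap, following nested sitemap indexes.

    `fetch` is a hypothetical hook that returns the raw XML bytes for a URL.
    `max_depth` and the `_seen` set guard against deep or circular references.
    """
    seen = _seen if _seen is not None else set()
    if url in seen or max_depth < 0:
        return
    seen.add(url)

    root = ET.fromstring(fetch(url))

    # A <sitemapindex> lists child sitemaps: recurse into each one.
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        if loc.text:
            yield from iter_sitemap_urls(loc.text.strip(), fetch, max_depth - 1, seen)

    # A <urlset> lists the actual page URLs.
    for loc in root.findall("sm:url/sm:loc", NS):
        if loc.text:
            yield loc.text.strip()


def http_fetch(url: str) -> bytes:
    """Simple fetcher for the sketch; does not handle gzip or retries."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```

Usage would be something like `list(iter_sitemap_urls("https://www.apple.com/sitemap.xml", http_fetch))`. Passing `fetch` in as a parameter keeps the recursion testable without network access. Things like `.xml.gz` sitemaps, malformed XML, and sitemaps discovered via `robots.txt` are left out here.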