c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0
362 stars, 110 forks

Include iframe contents #92

Open marshvee opened 4 months ago

Iframes are sometimes used to embed parts of a site that are managed by a CMS.

This PR adds an option to inspect each iframe's content and include any links that point to the site being indexed. It takes the <base> tag into account: if the base tag matches the site being indexed, all relative URLs in the iframe should be crawled as well.
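
To illustrate the base-tag behaviour described above, here is a rough standard-library sketch. It is not the code in this PR: the names LinkCollector and same_site_links, and the example URLs, are made up for illustration, and it assumes that "matches the site" means an exact hostname comparison.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the first <base href> and every <a href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and self.base_href is None:
            self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def same_site_links(iframe_html, iframe_url, site_domain):
    """Return absolute links from the iframe document that point at site_domain."""
    parser = LinkCollector()
    parser.feed(iframe_html)
    # Relative URLs resolve against <base href> when present,
    # otherwise against the iframe document's own URL.
    if parser.base_href:
        base = urljoin(iframe_url, parser.base_href)
    else:
        base = iframe_url
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base, href)
        # Assumption: "same site" is an exact hostname match.
        if urlparse(absolute).netloc == site_domain and absolute not in links:
            links.append(absolute)
    return links
```

With a base tag pointing at the indexed site, a relative link such as page.html inside an iframe served from another host resolves onto the indexed site and is kept. Without the base tag, the same link resolves against the iframe's host and is dropped.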

If you want to enable this option, you can just add the flag:

python main.py --fetch-iframes

Resolves #90