c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0

Endless loop part 2: report error and document workaround #37

Open ghost opened 7 years ago

ghost commented 7 years ago

Command: python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.forum.2globalnomads.info --image --output sitemap.xml --verbose

Your fix: --drop "sid=[a-z0-9]{32}"
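For anyone wondering why dropping the sid parameter stops the loop, here is a minimal sketch. It assumes the crawler deduplicates pages by their URL string. The helper name `drop_session_id` and the example.org URLs are made up for illustration and are not taken from the project's code.

```python
import re

# Same pattern as the --drop argument above: a phpBB3 session id.
SID_PATTERN = re.compile(r"sid=[a-z0-9]{32}")


def drop_session_id(url: str) -> str:
    """Remove the session id so one page always maps to one URL string."""
    return SID_PATTERN.sub("", url)


# Two links to the same topic that differ only in the session id.
a = "https://example.org/viewtopic.php?t=1&sid=" + "a" * 32
b = "https://example.org/viewtopic.php?t=1&sid=" + "b" * 32

# Without the drop these would count as two different pages;
# with it they collapse into a single entry.
seen = {drop_session_id(a), drop_session_id(b)}
```

Without the drop, each newly issued session id makes every already-visited page look new, so the set of URLs left to crawl never empties.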

Although people are unlikely to run this script on phpBB3 forums, since a mod for generating sitemaps already exists, please consider adding your workaround to the documentation.

The same endless loop will occur with every phpBB3 installation, and there are tens of thousands of them. It might also be a good idea to add some kind of guard or timeout that detects loops, so that the script can exit gracefully with a proper error message. A similar issue can happen with any website that uses session management.
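To make the guard suggestion concrete, here is a rough sketch of one possible approach: cap the number of unique URLs and stop with a clear error once the cap is reached. This is not how python-sitemap is implemented. The names `crawl`, `get_links`, `max_urls` and `CrawlLimitExceeded` are all hypothetical, and the link fetcher is passed in as a plain function so the sketch runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, Set


class CrawlLimitExceeded(RuntimeError):
    """Raised when the crawl finds more unique URLs than the allowed cap."""


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_urls: int = 1000) -> Set[str]:
    """Breadth-first crawl that gives up once max_urls unique URLs are seen."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link in seen:
                continue
            if len(seen) >= max_urls:
                # Most likely cause on a forum: session ids in the links
                # make every page look new on every visit.
                raise CrawlLimitExceeded(
                    f"stopped after {max_urls} unique URLs; possible loop "
                    "caused by session ids in links"
                )
            seen.add(link)
            queue.append(link)
    return seen
```

A fixed cap is crude, because a genuinely large site would trip it too, so it would need to be configurable. A wall-clock timeout checked in the same place would work similarly.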