c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0
366 stars 110 forks

Stop and continue #58

Open ishandutta2007 opened 4 years ago

ishandutta2007 commented 4 years ago

The issue with this tool is that once it halts, you have to start all over again from scratch, and with large sites this is a very common scenario. Since we already have the partially generated XML, it would be nice to continue from where it was interrupted. Let me know your thoughts on this and how to achieve it. I am willing to send a pull request once I have a better understanding of the code.

c4software commented 4 years ago

Hi,

It's a really nice idea. The major drawback I can see is that we could miss new links that were added to pages crawled before the interruption, since those pages would not be fetched again.

But with some work (like preloading all previously crawled links to avoid refetching them) we can implement that kind of feature.
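A minimal sketch of the preloading idea, assuming the partial output is a standard sitemap whose entries use `<loc>` tags. The helper names `load_crawled_urls` and `filter_uncrawled` are hypothetical and are not part of python-sitemap's actual code. A regex is used rather than an XML parser because an interrupted run may leave the file truncated and therefore not well-formed.

```python
import re
from html import unescape

# Hypothetical helpers for illustration; not python-sitemap's real API.
# Matches only complete <loc>...</loc> entries, so a trailing entry that
# was cut off mid-write is ignored.
LOC_PATTERN = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.DOTALL)


def load_crawled_urls(partial_sitemap_text):
    """Return the set of URLs listed in a possibly truncated sitemap."""
    # Sitemap URLs are XML-escaped (e.g. &amp;), so unescape them before
    # comparing against URLs discovered during the resumed crawl.
    return {unescape(url) for url in LOC_PATTERN.findall(partial_sitemap_text)}


def filter_uncrawled(candidates, crawled):
    """Keep only the candidate URLs that were not crawled before."""
    return [url for url in candidates if url not in crawled]
```

Note that this only avoids refetching known pages. The partial sitemap does not record which links were discovered but not yet fetched, so a resumed crawl would still need some way to rebuild or persist that queue.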