c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0

Handling more than 50,000 URLs #59

Open jswilson opened 4 years ago

jswilson commented 4 years ago

Hi, just wanted to say thanks for such a great library.

One need we have is to generate a sitemap for a site with more than 50,000 URLs. Search engines typically accept at most 50,000 URLs per sitemap file, so today we manually create a sitemap index and split the URLs across individual sitemap files, each containing no more than 50,000 URLs.

One option I was considering was adding a feature to python-sitemap that would optionally output a sitemap index and multiple sitemap files if there are more than 50,000 URLs; would that be of interest? Just wanted to make sure that kind of feature would be desired prior to implementing; thanks!
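To make the proposal concrete, here is a rough sketch of the chunking logic. It is not python-sitemap's code: the function names (`chunk_urls`, `render_sitemap`, `render_index`, `build_sitemaps`) and the output file names (`sitemap-N.xml`, `sitemap-index.xml`) are hypothetical, and it assumes the crawler has already collected the full URL list in memory.

```python
from xml.sax.saxutils import escape

# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50000

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls):
    """Render one <urlset> document for a list of page URLs."""
    entries = "".join(
        "<url><loc>{}</loc></url>\n".format(escape(u)) for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="{}">\n'.format(SITEMAP_NS)
        + entries
        + "</urlset>\n"
    )


def render_index(sitemap_urls):
    """Render a <sitemapindex> document pointing at each sitemap file."""
    entries = "".join(
        "<sitemap><loc>{}</loc></sitemap>\n".format(escape(u))
        for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="{}">\n'.format(SITEMAP_NS)
        + entries
        + "</sitemapindex>\n"
    )


def build_sitemaps(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return a {filename: xml} mapping.

    A single sitemap.xml is produced when the URLs fit in one file;
    otherwise numbered sitemap files plus an index are produced.
    The file naming scheme is an assumption made for this sketch.
    """
    urls = list(urls)
    if len(urls) <= size:
        return {"sitemap.xml": render_sitemap(urls)}

    files = {}
    for i, chunk in enumerate(chunk_urls(urls, size), start=1):
        files["sitemap-{}.xml".format(i)] = render_sitemap(chunk)

    # The index must reference the public URL of each sitemap file.
    locations = [base_url.rstrip("/") + "/" + name for name in files]
    files["sitemap-index.xml"] = render_index(locations)
    return files
```

The caller would write each entry of the returned mapping to disk. Keeping the single-file path unchanged for small sites means existing users would see no difference in output.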

c4software commented 4 years ago

This is the first time I've seen this point raised. It seems I will have to chunk the produced sitemap 🤔

jswilson commented 4 years ago

> First time i seen this point. Seems I will have to chunk the produced sitemap 🤔

If you think it would be a good addition, I may actually be able to implement this. I can't do it immediately, but I could check in with you in a couple of weeks about potentially doing it.

c4software commented 4 years ago

I confirm. It's a good addition (and it's in the spec, so 👌). I looked at how to integrate it, but it doesn't seem that simple to do. Let me know.

Garrett-R commented 3 years ago

This issue is resolved now that #65 is merged, right?