Automattic / msm-sitemap

Comprehensive sitemaps for your WordPress VIP site. Joint collaboration between Metro.co.uk, WordPress VIP, Alley Interactive, Maker Media, 10up, and others.

Consider not storing full sitemap XML #110

Open mjangda opened 7 years ago

mjangda commented 7 years ago

Right now, sitemap XML is generated asynchronously and stored in the database so that sitemaps can be served very quickly. The downside is that any code change that modifies the XML output means all sitemaps need to be regenerated, which can be a very slow process on really large sites with thousands of sitemaps.

We should explore alternate ways to handle this (while maintaining backwards compat with existing actions/filters) and evaluate whether those approaches make sense.

systemseven commented 7 years ago

Sitemaps have to be regenerated when the template changes; there's no way around that. The trick here is to figure out how to do that efficiently.

Right now the flow is as follows (simplistic version)

Here's what I'm proposing

mjangda commented 7 years ago

we create a list of posts (basically just their urls) that need to be included in that sitemap

What happens if the URL structure changes? How do we get other related data like the post modified date from just the URL?

A lot of the tags never change, so we could hard-code them in the template vs. using SimpleXML (e.g. loc, lastmod, etc.)

Do we get any major benefits from switching to a hard-coded template? How will we maintain backwards compatibility (e.g. some filters pass in the simplexml object that sites use to add things like images)?
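To make the backwards-compatibility question concrete, here is a minimal sketch of one possible approach: use a hard-coded string template when nothing needs the SimpleXML object, and fall back to SimpleXML when a callback is registered. All function names and the `$filter` parameter are hypothetical and are not msm-sitemap's actual API; in WordPress, the "is anything hooked?" check could plausibly be done with `has_filter()`.

```php
<?php
// Hypothetical sketch only: these names are illustrative, not the plugin's API.

const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

/** Build the sitemap with SimpleXML so a callback can extend each <url> node. */
function render_with_simplexml(array $posts, ?callable $filter = null): string {
    $urlset = new SimpleXMLElement('<urlset xmlns="' . SITEMAP_NS . '"/>');
    foreach ($posts as $post) {
        $url = $urlset->addChild('url');
        // addChild() does not escape ampersands, so escape the value first.
        $url->addChild('loc', htmlspecialchars($post['loc'], ENT_XML1));
        $url->addChild('lastmod', $post['lastmod']);
        if ($filter !== null) {
            // Stand-in for a filter that receives the SimpleXML object.
            $filter($url, $post);
        }
    }
    return $urlset->asXML();
}

/** Build the same structure with a hard-coded string template. */
function render_with_template(array $posts): string {
    $out = '<urlset xmlns="' . SITEMAP_NS . '">';
    foreach ($posts as $post) {
        $out .= '<url><loc>' . htmlspecialchars($post['loc'], ENT_XML1) . '</loc>'
              . '<lastmod>' . htmlspecialchars($post['lastmod'], ENT_XML1) . '</lastmod></url>';
    }
    return $out . '</urlset>';
}

/** Use the template when no callback is registered, SimpleXML otherwise. */
function render_template_with_compat(array $posts, ?callable $filter = null): string {
    return $filter === null
        ? render_with_template($posts)
        : render_with_simplexml($posts, $filter);
}
```

Under this sketch, sites that hook into the SimpleXML object (for example, to add images) would keep working, but they would not see any benefit from the template path.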

Performance-wise I think we're gonna net out close to the same, but this method I'm proposing may be a bit more expensive

This is probably the biggest thing we'll need to watch. Some of the sites using this plugin have millions of posts dating back 5/10/20 years. If the newer method is significantly slower, it may not be worth it, so it would be good to gather and compare some data as we work on this.
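As a starting point for gathering that data, a rough micro-benchmark along these lines could compare the two rendering styles. This is a sketch under stated assumptions: the post data is synthetic, every function name is hypothetical, and it measures only XML construction, not database queries or anything else the plugin does.

```php
<?php
// Hypothetical micro-benchmark: synthetic data, illustrative names only.

/** Generate $count fake post records with a URL and a modified date. */
function make_posts(int $count): array {
    $posts = [];
    for ($i = 0; $i < $count; $i++) {
        $posts[] = [
            'loc'     => "https://example.com/post-$i/",
            'lastmod' => '2017-01-01T00:00:00+00:00',
        ];
    }
    return $posts;
}

/** Build the sitemap through SimpleXML. */
function build_simplexml(array $posts): string {
    $urlset = new SimpleXMLElement('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>');
    foreach ($posts as $post) {
        $url = $urlset->addChild('url');
        $url->addChild('loc', htmlspecialchars($post['loc'], ENT_XML1));
        $url->addChild('lastmod', $post['lastmod']);
    }
    return $urlset->asXML();
}

/** Build the sitemap through plain string concatenation. */
function build_template(array $posts): string {
    $out = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
    foreach ($posts as $post) {
        $out .= '<url><loc>' . htmlspecialchars($post['loc'], ENT_XML1) . '</loc>'
              . '<lastmod>' . $post['lastmod'] . '</lastmod></url>';
    }
    return $out . '</urlset>';
}

/** Return the best wall-clock time (in seconds) over several runs. */
function time_builder(callable $builder, array $posts, int $runs = 5): float {
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $builder($posts);
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$posts = make_posts(500);
printf(
    "SimpleXML: %.4fs, template: %.4fs\n",
    time_builder('build_simplexml', $posts),
    time_builder('build_template', $posts)
);
```

Numbers from a script like this only indicate relative cost; a real comparison would need to run against representative sites and include the cost of fetching the post data.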

systemseven commented 7 years ago

Hey Mo,

A few others asked some of the same questions you did on the internal site.

For the URLs, that's just an example; we'll just need to make sure we store the right data needed, and in the post_meta, the post_content

Good catch about the backwards compatibility on the SimpleXML objects, I'm going to take a look at that.

Thanks