Currently the sitemap pagination can be expensive for forums with >100k threads, because it runs one query per page to achieve exact pagination (1000 threads per page).
A less accurate approach is possible: put 1000 thread ids (not 1000 threads) on each page, so deleted/missing threads simply leave gaps in a page. This requires only a single query to build the main index, using GROUP BY floor(tid/pagination). Even for large tables this seems less expensive than extensive sorting / limiting queries.
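A rough sketch of the single-query index, using Python's sqlite3 only so it runs standalone. The table name `threads`, the columns `tid` and `lastpost`, and the function names are assumptions for illustration, not the plugin's actual schema or code. SQLite's integer division stands in for floor(), which gives the same result for positive ids.

```python
import sqlite3

PER_PAGE = 1000  # thread ids per sitemap page (the "pagination" value)

def build_sitemap_index(conn, per_page=PER_PAGE):
    """Return one (page, thread_count, newest_lastpost) row per non-empty id bucket."""
    sql = """
        SELECT tid / ? AS page,          -- integer division == floor() for positive ids
               COUNT(*) AS threads,
               MAX(lastpost) AS lastmod
        FROM threads
        GROUP BY page
        ORDER BY page
    """
    return conn.execute(sql, (per_page,)).fetchall()

def demo():
    # Hypothetical data: ids with gaps, as left behind by deleted threads.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE threads (tid INTEGER PRIMARY KEY, lastpost INTEGER)")
    rows = [(1, 100), (2, 150), (999, 120), (1000, 300), (1500, 250), (3001, 400)]
    conn.executemany("INSERT INTO threads VALUES (?, ?)", rows)
    return build_sitemap_index(conn)

if __name__ == "__main__":
    for page, threads, lastmod in demo():
        print(page, threads, lastmod)
```

In this sample no thread has an id in 2000..2999, so that bucket produces no row at all: the GROUP BY only yields pages that contain at least one thread, which is why completely empty pages never reach the index.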
The downside is that it would lead to lots of nearly empty sitemap pages for forums that had lots of threads deleted or have them in inaccessible sections, and also more timestamp noise since pages are ordered by id rather than date. E.g. the MyBB community forums (if they were to use this sitemap) would see 113 pages for threads with the new method, when there are actually just 95 pages with the old method. However, building the sitemap index would use just one query instead of 95.
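To make the page-count difference concrete, here is a small sketch comparing both methods on made-up data. The function names and the sample ids are hypothetical; only the two counting rules come from the description above.

```python
import math

def exact_pages(tids, per_page=1000):
    """Old method: exactly per_page threads on every page (except the last)."""
    return math.ceil(len(tids) / per_page)

def bucket_pages(tids, per_page=1000):
    """New method: one page per id range that still contains at least one thread."""
    return len({tid // per_page for tid in tids})

# Hypothetical forum: ids 1..5000 were used, but every even-numbered thread was deleted.
remaining = list(range(1, 5001, 2))

if __name__ == "__main__":
    print(exact_pages(remaining), bucket_pages(remaining))
```

With half of the threads gone, the exact method needs 3 pages for the 2500 remaining threads, while the id-bucket method still emits 5 half-empty pages.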
So the cheaper sitemap wins in terms of performance, while the old method is more accurate. But since Google doesn't care about the number of sitemap pages (as long as there aren't completely empty ones), we should go with the cheaper approach.