crimethinc / website

Ruby on Rails app that powers crimethinc.com
https://crimethinc.com
Creative Commons Zero v1.0 Universal

Sitemap ping failing #3780

Open veganstraightedge opened 4 months ago

veganstraightedge commented 4 months ago

Seen in output from:

heroku releases:output --remote heroku
Pinging with URL 'https://crimethinc.com/sitemap.xml.gz':
Ping failed for Google: #<OpenURI::HTTPError: 404 Sitemaps ping is deprecated. See https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping.> (URL http://www.google.com/webmasters/tools/ping?sitemap=https%3A%2F%2Fcrimethinc.com%2Fsitemap.xml.gz)

just1602 commented 4 months ago

Is it possible with the heroku CLI to list the files in the dyno public/ directory?

just1602 commented 1 month ago

The sitemap_generator gem doesn't seem to be maintained anymore. The right solution would probably be to generate our own sitemap with a template, like we do for the Atom feed.
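As a rough illustration of the template approach, here is a minimal sketch using plain ERB from the Ruby standard library so it can run on its own; it is not how the Atom feed is actually built. `Entry`, `SITEMAP_TEMPLATE` and `render_sitemap` are hypothetical names, and in the app the entries would come from the published content models and URL helpers.

```ruby
require "erb"
require "time"

# Hypothetical value object standing in for a published article/page;
# the real app's models and URL helpers will differ.
Entry = Struct.new(:loc, :lastmod)

# Template for one sitemap file in the sitemaps.org 0.9 format.
# trim_mode "-" lets <%- ... -%> tags swallow their own indentation/newline.
SITEMAP_TEMPLATE = ERB.new(<<~XML, trim_mode: "-")
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <%- entries.each do |entry| -%>
    <url>
      <loc><%= ERB::Util.h(entry.loc) %></loc>
      <lastmod><%= entry.lastmod.getutc.iso8601 %></lastmod>
    </url>
  <%- end -%>
  </urlset>
XML

# Render the template with the given entries as a local variable.
def render_sitemap(entries)
  SITEMAP_TEMPLATE.result_with_hash(entries: entries)
end
```

`ERB::Util.h` escapes characters such as `&` in query strings, which would otherwise produce invalid XML. `getutc` is used instead of `utc` so the caller's `Time` objects are not mutated.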

I just don't know if we should do it in a rake task and save it on disk like the gem does, or expose an endpoint that we would cache.
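For the rake-task option, writing the file itself is the easy part. A minimal sketch, assuming an output path like `public/sitemap.xml.gz` and a hypothetical `write_sitemap_gz` helper, using `Zlib::GzipWriter` from the standard library:

```ruby
require "zlib"
require "fileutils"

# Rake-task style: gzip the rendered XML to disk (path is an assumption).
# Writes to a temp file first, then renames it into place so a reader
# never sees a half-written sitemap.
def write_sitemap_gz(xml, path)
  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  Zlib::GzipWriter.open(tmp) { |gz| gz.write(xml) }
  File.rename(tmp, path)
  path
end
```

One caveat for the on-disk approach: Heroku dyno filesystems are ephemeral and not shared between dynos, so a file written by a one-off or release-phase dyno won't show up on the web dynos. That is one point in favour of the cached-endpoint option.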

just1602 commented 1 month ago

I was thinking about that today, and we should probably move the sitemap generation into a background job that is triggered at deployment, but also every time we publish or update any type of content. The developers.google.com page linked in the warning says that if the sitemap's lastmod values aren't up to date and accurate, Google will stop trusting them.
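The key point is where lastmod comes from. A minimal sketch, assuming each piece of content exposes an `updated_at` timestamp (`Page`, `url_lastmod` and `file_lastmod` are hypothetical names): lastmod is taken from the content, never from the time the sitemap happens to be generated. In Rails, the publish/update trigger could be an `after_commit` callback on the content models that calls something like a hypothetical `SitemapJob.perform_later`.

```ruby
require "time"

# Hypothetical stand-in for a published record (article, episode, ...).
# In the Rails app these would be ActiveRecord models with updated_at.
Page = Struct.new(:url, :updated_at)

# lastmod must reflect when the content last changed. Using Time.now here
# would mark every URL as "new" on each regeneration, making lastmod
# untrustworthy.
def url_lastmod(page)
  page.updated_at.getutc.iso8601
end

# A sitemap file (or its index entry) is only as fresh as its newest URL.
# Returns nil for an empty list.
def file_lastmod(pages)
  pages.map(&:updated_at).max&.getutc&.iso8601
end
```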

bensheldon commented 1 month ago

Just an idea (and not entirely trivial), but I've been wanting to convert my personal sites from that gem to something like this: https://www.johnnunemaker.com/rails-easy-sitemaps/

just1602 commented 1 month ago

My fear was that it would be slow, but I guess I can add some caching in the template based on the lastmod value.

Otherwise, I'm not sure I understand why it needs a sitemap of sitemaps (the index and pages actions), but I really like the general idea.
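For caching the template on the lastmod value, one option is to put the newest lastmod into the cache key: the cached render is reused until something changes, and any edit naturally produces a fresh key. A minimal sketch with a hypothetical `LastmodCache` standing in for `Rails.cache`; in the app the key might look something like `["sitemap", month, Article.maximum(:updated_at)]`, where the model name is an assumption.

```ruby
# Toy stand-in for Rails.cache: memoizes a rendered sitemap under a key
# that includes the newest updated_at. An edit changes the key, so the old
# entry is simply never read again (this toy never evicts it).
class LastmodCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  # Returns the cached value for [name, lastmod], rendering via the block
  # only when that exact key has not been seen before.
  def fetch(name, lastmod)
    @store.fetch([name, lastmod]) do
      @renders += 1
      @store[[name, lastmod]] = yield
    end
  end
end
```

A real cache store would evict the stale keys. The `maximum(:updated_at)` lookup still costs a query per request, but a single aggregate is typically far cheaper than rendering every URL.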

bensheldon commented 1 month ago

"I'm not sure I understand why it needs a sitemap of sitemaps (the index and pages actions)"

A single sitemap file may hold at most 50,000 URLs, so for an arbitrarily large or growing site it's necessary to split the URLs across multiple sitemap files, plus an index file that references them. Splitting the sitemap files by month means the index can be generated without table-scanning every record, e.g. to do a numeric group-by, or anything else that would require checking for the presence of records just to generate the index.
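To make the month-based split concrete, here is a minimal sketch assuming a hypothetical `/sitemaps/YYYY-MM.xml` route and a known first month of content. The index entries are computed from dates alone, with no database query, and a separate guard enforces the 50,000-URL limit for each monthly file. `months_between`, `sitemap_index_locs` and `check_sitemap_size!` are illustrative names, not an existing API.

```ruby
require "date"

# Upper bound on URLs per sitemap file, from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000

# Every calendar month from first_month through last_month (inclusive),
# as Date objects on the 1st. Only dates are involved: no records are read.
def months_between(first_month, last_month)
  month = Date.new(first_month.year, first_month.month, 1)
  stop = Date.new(last_month.year, last_month.month, 1)
  months = []
  while month <= stop
    months << month
    month = month.next_month
  end
  months
end

# One sitemap-index <loc> per month, pointing at a hypothetical
# /sitemaps/YYYY-MM.xml route that renders that month's URLs.
def sitemap_index_locs(base_url, first_month, last_month)
  months_between(first_month, last_month).map do |month|
    "#{base_url}/sitemaps/#{month.strftime('%Y-%m')}.xml"
  end
end

# Guard for a single monthly file: a month over the limit would need to be
# split further.
def check_sitemap_size!(label, url_count)
  return if url_count <= MAX_URLS_PER_SITEMAP

  raise ArgumentError,
        "sitemap #{label} has #{url_count} URLs (max #{MAX_URLS_PER_SITEMAP})"
end
```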

"My fear was that it would be slow"

Same, same 🤗 They could even be cached with a 1-day TTL and really be no worse than sitemap_generator's static sitemap.
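A 1-day TTL is simple to express. In Rails it would typically be `Rails.cache.fetch(key, expires_in: 1.day) { ... }`; the self-contained sketch below shows the same behaviour with a hypothetical `TtlCache` class and an injectable clock, so it can be tested without waiting a day.

```ruby
# Minimal time-based cache: a rendered sitemap is reused for up to `ttl`
# seconds, then regenerated on the next fetch. The clock is injectable
# (defaults to a monotonic clock) so expiry can be tested without sleeping.
class TtlCache
  ONE_DAY = 24 * 60 * 60

  def initialize(ttl: ONE_DAY, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  def fetch(key)
    now = @clock.call
    value, expires_at = @entries[key]
    return value if expires_at && now < expires_at

    value = yield
    @entries[key] = [value, now + @ttl]
    value
  end
end
```

Usage: `cache.fetch("sitemap-index") { render_index }` inside the controller action, where `render_index` is whatever builds the XML.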

just1602 commented 1 month ago

Thanks for the clarification @bensheldon! That totally makes sense. I'll really try to give this a go, but unless you have some spare time for it, I won't be able to tackle it super soon. :smiley: