ONEARMY / community-platform

A platform to build useful communities that aim to tackle global problems
https://platform.onearmy.earth
MIT License

Create a dynamic `sitemap.xml` #886

Closed: drydenwilliams closed this issue 2 years ago.

drydenwilliams commented 4 years ago

Is your feature request related to a problem? Please describe.
We need a sitemap.xml that lists all of our pages, so that search-engine crawlers such as Googlebot can discover and index them.

Describe the solution you'd like
A solution I've used in the past is to handle this in the API: add a route for /sitemap.xml that generates the .xml file dynamically.

Here is an example sitemap.xml; note how uniform its structure is:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://monzo.com/blog/2018/06/19/gambling-block-self-exclusion/</loc>
    <lastmod>2019-12-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://monzo.com/blog/2019/02/06/zero-sum-budgeting/</loc>
    <lastmod>2019-12-11</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```
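Because every entry has the same shape, the XML can be produced from a plain array of entries. The sketch below builds it by hand rather than with a library; the `buildSitemapXml` and `escapeXml` names and the entry fields (`loc`, `lastmod`, `changefreq`, `priority`) are illustrative assumptions that mirror the tags above. It entity-escapes the five characters the sitemap protocol requires to be escaped (so a URL containing `&` stays valid XML) and prepends a UTF-8 XML declaration, since the protocol requires UTF-8.

```javascript
// Escape the characters the sitemap protocol requires to be entity-encoded.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;')
    .replace(/>/g, '&gt;')
    .replace(/</g, '&lt;');
}

// Build a <urlset> document from entries shaped like
// { loc, lastmod?, changefreq?, priority? } (hypothetical shape).
function buildSitemapXml(entries) {
  const urls = entries.map((entry) => {
    const tags = [`    <loc>${escapeXml(entry.loc)}</loc>`];
    if (entry.lastmod) tags.push(`    <lastmod>${escapeXml(entry.lastmod)}</lastmod>`);
    if (entry.changefreq) tags.push(`    <changefreq>${escapeXml(entry.changefreq)}</changefreq>`);
    if (entry.priority !== undefined) {
      // Format as in the example above, e.g. 1 -> "1.0"
      tags.push(`    <priority>${Number(entry.priority).toFixed(1)}</priority>`);
    }
    return ['  <url>', ...tags, '  </url>'].join('\n');
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Optional tags are only emitted when present, so the same function works for pages that have no meaningful `lastmod`.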

Previously I did this with Express, by registering a route:

```javascript
app.get('/sitemap.xml', createSitemap);
```

In my case, the handler reused my existing route definitions and looped through all the ski resorts returned from the database, mapping each one onto the same URL structure that my web app's router uses. Applied to this platform, for example:

a how-to with the slug run-a-workshop-on-an-event would become https://community.preciousplastic.com/how-to/run-a-workshop-on-an-event
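The mapping described above (each database record dropped into the web app's route pattern) can be sketched as a small helper. The `/how-to/:slug` pattern, the `fillRoute` and `toSitemapUrls` names and the `{ slug }` record shape are hypothetical; only the base URL and the example slug come from this issue.

```javascript
// Replace each ":param" placeholder in a route pattern with the record's value.
function fillRoute(pattern, params) {
  return pattern.replace(/:([A-Za-z_]+)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      throw new Error(`Missing route param: ${name}`);
    }
    return encodeURIComponent(params[name]);
  });
}

// Turn database records into absolute URLs using one route pattern.
function toSitemapUrls(baseUrl, pattern, records) {
  const base = baseUrl.replace(/\/+$/, ''); // avoid a double slash when joining
  return records.map((record) => base + fillRoute(pattern, record));
}

// Hypothetical how-to records, e.g. as returned from the database.
const urls = toSitemapUrls('https://community.preciousplastic.com', '/how-to/:slug', [
  { slug: 'run-a-workshop-on-an-event' },
]);
// urls[0] === 'https://community.preciousplastic.com/how-to/run-a-workshop-on-an-event'
```

Encoding each value with `encodeURIComponent` keeps a slug containing spaces or `/` from producing a broken or ambiguous path, and a missing field throws instead of leaving a literal `:slug` in the URL.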

Here is my code, for inspiration:

```javascript
// createSitemap() is the API of older releases of the `sitemap` package;
// newer releases replace it with SitemapStream.
const sitemap = require('sitemap');
const localizer = require('../app/localizer');
const SkiResort = require('./models/ski-resort');
// List of all my routes
const xroutes = require('../../web-app/src/xroutes.js');
const { PROD_HOSTNAME } = require('../config');
const { slugify } = require('../utils');

// Express handler for GET /sitemap.xml
async function createSitemap(req, res) {
  try {
    const lastmodISO = new Date().toISOString();
    const skiResorts = await SkiResort.find({ hasProfile: true });

    // Static pages
    let urls = [
      { url: xroutes.HomeRoute.path, lastmodISO },
      { url: xroutes.JobsRoute.path, lastmodISO },
    ];

    // Unique country codes that have at least one resort (Set keeps insertion order)
    const countryCodes = [...new Set(skiResorts.map((resort) => resort.country))];

    // One page per country
    urls = urls.concat(countryCodes.map((code) => {
      const countrySlug = slugify(localizer.countryCodeMap.EN[code]);
      const url = xroutes.ResortsRoute.path.replace(':country', countrySlug);
      return { url, lastmodISO };
    }));

    // One page per resort; use a local variable instead of mutating the DB document
    urls = urls.concat(skiResorts.map((resort) => {
      const countrySlug = slugify(localizer.countryCodeMap.EN[resort.country]);
      const url = xroutes.ResortRoute.path
        .replace(':country', countrySlug)
        .replace(':resort', resort.slug);
      return { url, lastmodISO };
    }));

    const updatedSitemap = sitemap.createSitemap({
      hostname: PROD_HOSTNAME,
      cacheTime: 600000, // 600 sec cache period
      urls,
    });

    res.header('Content-Type', 'application/xml');
    res.send(updatedSitemap.toString());
  } catch (err) {
    // Express 4 does not catch rejected promises from async handlers itself
    res.status(500).send('Could not generate sitemap');
  }
}

module.exports = createSitemap;
```

drydenwilliams commented 4 years ago

@chrismclarke could you please check my proposed implementation? It was also mentioned that this might be a nice story for @alromh87.

chrismclarke commented 4 years ago

We will definitely want to generate the XML file dynamically. My two main concerns are:

  1. How long do most search bots wait to receive the sitemap? If we query the data on the fly, it might take a few seconds to run all the required operations.

  2. How often do search bots crawl? If it is frequent, or if a malicious third party hits the endpoint repeatedly, it could be quite resource-intensive to go through the entire database on every request.
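One way to address both concerns while still generating on request is to cache the generated XML in memory for a fixed period, so the database is scanned at most once per period no matter how often bots (or a malicious client) request the file, and most requests are answered immediately. This is a sketch of that idea, not something decided in this thread; `createCachedGenerator`, the TTL and the injectable `now` clock are assumptions. Concurrent requests that arrive during a rebuild share one in-flight promise instead of each triggering a database scan.

```javascript
// Wrap an async generator so its result is reused for ttlMs milliseconds.
// `now` is injectable so the cache can be tested without waiting.
function createCachedGenerator(generate, ttlMs, now = Date.now) {
  let cached = null;   // last successful result
  let expiresAt = 0;   // timestamp after which the cached result is stale
  let inFlight = null; // shared promise while a rebuild is running

  return async function getSitemap() {
    if (cached !== null && now() < expiresAt) return cached;
    if (!inFlight) {
      // Promise.resolve().then(...) makes generate() run asynchronously, so the
      // .finally() below always runs after inFlight has been assigned.
      inFlight = Promise.resolve()
        .then(() => generate())
        .then((result) => {
          cached = result;
          expiresAt = now() + ttlMs;
          return result;
        })
        .finally(() => {
          inFlight = null; // allow a new rebuild after success or failure
        });
    }
    return inFlight;
  };
}
```

In the Express route, the handler would then `await getSitemap()` and send the result, instead of rebuilding the sitemap on every request. A failed rebuild rejects for the waiting requests and leaves the previous result untouched.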

It might be better to generate the sitemap on a schedule, as a cron job (perhaps at the same time as the backup), rather than on each request, and host it as a static XML file in a public folder. The tricky part, however, would be uploading just the updated sitemap to Firebase Hosting; I'm not sure how to do that.
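If the sitemap is generated on a schedule instead, the job only needs to write the XML into the folder that gets deployed. A minimal sketch using Node's built-in `fs` and `path` modules: it writes to a temporary file and then renames it over the old one, so nothing ever reads a half-written sitemap. The `writeSitemapFile` name and the target folder are assumptions; how the job is scheduled and how the result reaches Firebase Hosting are left open, as in the comment above.

```javascript
const fs = require('fs');
const path = require('path');

// Write sitemap XML into `publicDir`, replacing any previous file in one step.
// On POSIX filesystems a rename within the same filesystem is atomic, which is
// why the temporary file is created inside publicDir itself.
function writeSitemapFile(xml, publicDir) {
  fs.mkdirSync(publicDir, { recursive: true });
  const target = path.join(publicDir, 'sitemap.xml');
  const tmp = path.join(publicDir, `.sitemap.xml.${process.pid}.tmp`);
  fs.writeFileSync(tmp, xml, 'utf8');
  fs.renameSync(tmp, target);
  return target;
}
```

A scheduled script would call this with the XML produced by whatever generator the route would otherwise use, then hand the folder to the deploy step.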