datascijedi / website

https://datascijedi.org
Creative Commons Attribution Share Alike 4.0 International

Canonical links for search engine indexing #57

Closed: ravicodelabs closed this issue 11 months ago

ravicodelabs commented 1 year ago

The motivation for this issue is to implement code that encourages search engines to index the website (once it's live at datascijedi.org) the way we want: for instance, without the www subdomain, and without the datascijedi.netlify.app links getting indexed. As a specific example, we want the about page to be indexed as https://datascijedi.org/about.html, not as https://datascijedi.netlify.app/about.html.

We already have the sitemap.xml, which helps with indexing. However, the Google developer docs recommend link tags as the better way to specify the canonical URL. For instance, the about page should have the following in its HTML head:

<link rel="canonical" href="https://datascijedi.org/about.html" />

The problem with the link tag approach, however, is that it requires editing each page manually. Ideally Quarto would handle this for us. In the meantime, below are two approaches that might be worth a shot for automated insertion.

  1. Use a post-render script to insert the link tag into each HTML file. The sitemap.xml file could be used to find and loop through all the HTML files.
  2. Use a pre-render script to insert the link tag into each qmd file via the Quarto header-includes option. As far as I can tell, header-includes takes whatever value is provided and includes it in the head of the rendered HTML file; it is documented here. This approach would require a file search to find all the qmd files, so the first approach is probably better.
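
As a rough illustration of the first approach, here is a minimal Python sketch of what such a post-render script might look like. It is not something the project uses. It assumes the rendered site sits in a single output directory (e.g. `_site`), that sitemap.xml lives at the root of that directory, and that its `<loc>` entries already use the datascijedi.org URLs we want as canonical. The function names (`sitemap_urls`, `insert_canonical`, `add_canonical_links`) are made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path
from urllib.parse import urlparse

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in the sitemap text."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def insert_canonical(html: str, url: str) -> str:
    """Insert a canonical link tag just before </head>.

    The page is returned unchanged if it already has a canonical
    link, or if no </head> tag is found.
    """
    if re.search(r'<link[^>]*rel=["\']canonical["\']', html, flags=re.IGNORECASE):
        return html
    match = re.search(r"</head>", html, flags=re.IGNORECASE)
    if match is None:
        return html
    tag = f'<link rel="canonical" href="{escape(url, quote=True)}" />\n'
    return html[: match.start()] + tag + html[match.start():]


def add_canonical_links(site_dir: Path) -> int:
    """Loop over the sitemap and patch each matching HTML file in site_dir.

    Assumption: the URL path maps directly onto a file path under
    site_dir (e.g. /about.html -> site_dir/about.html).
    Returns the number of files that were modified.
    """
    sitemap_text = (site_dir / "sitemap.xml").read_text(encoding="utf-8")
    changed = 0
    for url in sitemap_urls(sitemap_text):
        rel_path = urlparse(url).path.lstrip("/") or "index.html"
        if rel_path.endswith("/"):
            rel_path += "index.html"
        page = site_dir / rel_path
        if page.suffix != ".html" or not page.is_file():
            continue
        original = page.read_text(encoding="utf-8")
        updated = insert_canonical(original, url)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

A post-render script would then call something like `add_canonical_links(Path("_site"))`, where `_site` is only an assumed output directory. Because pages that already carry a canonical link are skipped, rerunning the script on the same output should be harmless.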

There is also a related thread in Quarto's GitHub Discussions.

ravicodelabs commented 11 months ago

Google is now indexing datascijedi.org the way we want, so there isn't a need for this anymore.