facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

SEO issue: do not use useLocation() to compute canonical urls #9170

Open slorber opened 1 year ago

slorber commented 1 year ago


Description

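The way we compute the canonical URL today:

```typescript
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
import {useLocation} from '@docusaurus/router';
import useBaseUrl from '@docusaurus/useBaseUrl';
import {applyTrailingSlash} from '@docusaurus/utils-common';

function useDefaultCanonicalUrl() {
  const {
    siteConfig: {url: siteUrl, baseUrl, trailingSlash},
  } = useDocusaurusContext();
  // Problem: this is the *browser* pathname, which can change after hydration.
  const {pathname} = useLocation();
  const canonicalPathname = applyTrailingSlash(useBaseUrl(pathname), {
    trailingSlash,
    baseUrl,
  });
  return siteUrl + canonicalPathname;
}
```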

Using useLocation().pathname works in most cases, but it is a bad idea because it is a dynamic value that depends on the current browser URL. This means the static canonical URL might be correct in the HTML files, but once React hydrates, the canonical URL is updated to a value derived from the browser URL, which can be different.

Notably, if you use your CDN/reverse proxy to configure aliases (a doc exists at /doc1 and you also make it available at /doc1alias), then when you visit /doc1alias, the canonical URL becomes /doc1alias instead of /doc1 once React hydrates (i.e. 2 canonical URLs for the same doc).
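To make the pathname dependence concrete, here is a minimal sketch that builds a canonical URL from whatever pathname it is given. `normalizeTrailingSlash` and `canonicalUrlFor` are hypothetical, simplified stand-ins (not Docusaurus's real `applyTrailingSlash`), and `https://example.com` is a placeholder site URL. The same page yields two different canonical URLs depending on which URL it was requested at:

```typescript
// Hypothetical, simplified stand-in for trailing-slash normalization;
// NOT Docusaurus's real applyTrailingSlash implementation.
type TrailingSlash = boolean | undefined;

function normalizeTrailingSlash(
  pathname: string,
  trailingSlash: TrailingSlash,
  baseUrl: string,
): string {
  // Leave the path untouched when the option is undefined or for the site root.
  if (trailingSlash === undefined || pathname === baseUrl) {
    return pathname;
  }
  const withoutSlash = pathname.replace(/\/+$/, "");
  return trailingSlash ? `${withoutSlash}/` : withoutSlash;
}

// Canonical URL built from whatever pathname it receives.
function canonicalUrlFor(
  siteUrl: string,
  baseUrl: string,
  pathname: string,
  trailingSlash: TrailingSlash,
): string {
  return siteUrl + normalizeTrailingSlash(pathname, trailingSlash, baseUrl);
}

const siteUrl = "https://example.com";
const baseUrl = "/";

// At build time, the page is rendered for its own route path.
const staticCanonical = canonicalUrlFor(siteUrl, baseUrl, "/doc1", false);

// After hydration, a useLocation()-based value reflects the browser URL,
// which differs when a reverse proxy serves the same HTML at an alias.
const hydratedCanonical = canonicalUrlFor(siteUrl, baseUrl, "/doc1alias", false);

console.log(staticCanonical); // https://example.com/doc1
console.log(hydratedCanonical); // https://example.com/doc1alias
```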

I'm not sure it's a big deal for SEO, considering crawlers probably extract the static canonical URL from the page, which is correct before React hydration, but we should still try to find a solution.
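One possible direction, sketched here under assumptions: compute the canonical URL from the route path the page was generated for, a value fixed at build time, so server rendering and hydration cannot disagree. `PageRoute` and `canonicalFromRoute` are hypothetical names for illustration, not Docusaurus APIs:

```typescript
// Hypothetical sketch: derive the canonical URL from the route path the page
// was built for (a build-time constant), never from the browser location.
// PageRoute and canonicalFromRoute are illustrative names, not Docusaurus APIs.

interface PageRoute {
  // Path the static HTML file was generated for, e.g. "/doc1".
  readonly path: string;
}

function canonicalFromRoute(
  siteUrl: string,
  route: PageRoute,
  trailingSlash: boolean | undefined,
): string {
  let path = route.path;
  // Simplified trailing-slash handling; the site root is left as-is.
  if (trailingSlash !== undefined && path !== "/") {
    const bare = path.replace(/\/+$/, "");
    path = trailingSlash ? `${bare}/` : bare;
  }
  return siteUrl + path;
}

const doc1Route: PageRoute = {path: "/doc1"};

// The browser URL may be "/doc1" or a proxy alias such as "/doc1alias";
// it is never consulted, so the result is identical before and after hydration.
for (const browserPath of ["/doc1", "/doc1alias"]) {
  const canonical = canonicalFromRoute("https://example.com", doc1Route, false);
  console.log(`${browserPath} -> ${canonical}`);
}
```

How the build-time route path would be exposed to the theme is left open here; the point is only that the input must not come from the browser.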

Note that such reverse proxy aliases might be common: we also discuss them as a good solution if you want docs version aliases (see https://github.com/facebook/docusaurus/issues/9049).

Similarly, hreflang values depend on useLocation and can be wrong on aliased documents.
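The hreflang problem has the same shape. The sketch below builds alternate links from a locale-independent page path; if that path came from `useLocation()` and the page were requested at `/doc1alias`, every alternate would point at the alias. It assumes the default Docusaurus i18n layout (default locale at the site root, other locales under `/<locale>/`) and a `baseUrl` of `/`; `alternateLinks` is a hypothetical helper and the URLs are placeholders. It also emits `x-default`, a standard hreflang value:

```typescript
// Hypothetical sketch of hreflang alternates built from a stable page path.
// Assumes the default locale is served at the site root and every other
// locale under "/<locale>/", with baseUrl "/". Not a Docusaurus API.

interface AlternateLink {
  hrefLang: string;
  href: string;
}

function alternateLinks(
  siteUrl: string,
  locales: readonly string[],
  defaultLocale: string,
  pagePath: string, // locale-independent path, e.g. "/doc1"
): AlternateLink[] {
  const links: AlternateLink[] = locales.map((locale) => ({
    hrefLang: locale,
    href:
      locale === defaultLocale
        ? `${siteUrl}${pagePath}`
        : `${siteUrl}/${locale}${pagePath}`,
  }));
  // x-default points at the default-locale version of the page.
  links.push({hrefLang: "x-default", href: `${siteUrl}${pagePath}`});
  return links;
}

const links = alternateLinks("https://example.com", ["en", "fr"], "en", "/doc1");
for (const link of links) {
  console.log(`<link rel="alternate" href="${link.href}" hreflang="${link.hrefLang}">`);
}
```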

Related to https://github.com/facebook/docusaurus/issues/9128

Reproducible demo

No response

Steps to reproduce

We don't have any doc aliases on our production website, but the 404 page is a great example, since the same page is served for any unknown path.

Take a look at https://docusaurus.io/not/found/path
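Why the 404 page reproduces the problem, as a rough model: a static host typically answers any unknown path with the prebuilt 404 page, while the browser keeps the requested URL, which is what `useLocation()` reports after hydration. `servedFile`, the route list, the `/404.html` fallback, and `https://example.com` are assumptions for illustration:

```typescript
// Hypothetical model of a static host that falls back to "/404.html" for
// unknown paths (a common default, assumed here). The browser keeps the
// requested path, which is what useLocation() would report after hydration.

function servedFile(builtPages: ReadonlySet<string>, requestPath: string): string {
  return builtPages.has(requestPath) ? requestPath : "/404.html";
}

const siteUrl = "https://example.com";
const builtPages: ReadonlySet<string> = new Set(["/", "/docs/intro", "/404.html"]);

for (const requestPath of ["/docs/intro", "/not/found/path"]) {
  const file = servedFile(builtPages, requestPath);
  // The HTML was built for `file`, but a location-based canonical follows requestPath.
  console.log(`${requestPath}: served ${file}, location-based canonical ${siteUrl}${requestPath}`);
}
```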

Expected behavior

The canonical URL, hreflang values, and other metadata derived from the pathname should always be the same before and after React hydration.
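The expected behavior can be checked mechanically by comparing the canonical link in the server-rendered HTML with the one in the hydrated document. The sketch below does this on HTML strings with a deliberately simple regex (it assumes `rel` comes before `href`, so it is not a general HTML parser); `extractCanonical`, `canonicalIsStable`, and the sample markup are hypothetical. Obtaining the hydrated HTML, for example from a headless browser, is outside this sketch:

```typescript
// Hypothetical check: the canonical URL in the server-rendered HTML must
// equal the one present after hydration. The regex assumes rel="canonical"
// appears before href, as in the sample markup below.

function extractCanonical(html: string): string | undefined {
  const match = html.match(/<link\s+[^>]*rel="canonical"[^>]*href="([^"]*)"/i);
  return match?.[1];
}

function canonicalIsStable(ssrHtml: string, hydratedHtml: string): boolean {
  const before = extractCanonical(ssrHtml);
  return before !== undefined && before === extractCanonical(hydratedHtml);
}

// Sample markup for a page served at an alias URL (values are made up).
const ssrHtml = '<head><link rel="canonical" href="https://example.com/doc1"></head>';
const hydratedHtml = '<head><link rel="canonical" href="https://example.com/doc1alias"></head>';

console.log(canonicalIsStable(ssrHtml, ssrHtml)); // true
console.log(canonicalIsStable(ssrHtml, hydratedHtml)); // false
```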

Actual behavior

The values differ before and after hydration.

Your environment

No response


prathamVaidya commented 11 months ago

@slorber I want to work on this task, but does the front matter plugin support aliases?

slorber commented 10 months ago

> @slorber I want to work on this task, but does the front matter plugin support aliases?

I have no idea what you mean or why you ask this question, sorry.

prathamVaidya commented 10 months ago

@slorber As I understand it, the issue asks to update the logic that sets the canonical URL using useLocation, because the value changes depending on the page URL after hydration. If we are using a canonical URL, then I expect there can be multiple aliases for a URL: for example, a doc at 'doc1' that also has an alias named 'doc1alias'.

  1. My first question: I looked through the documentation but couldn't find a feature that lets me add a slug or URL alias to a document.
  2. If there is no way to add an alias yet, then how can a page have multiple URLs (not including the /404 page)?

I hope I didn't confuse you this time 😅

slorber commented 10 months ago

Sorry, my confusion came from you mentioning a "front matter plugin", which doesn't exist.

It's explained in the issue:

> Notably, if you use your CDN/reverse proxy to configure aliases (a doc exists at /doc1 and you also make it available at /doc1alias), then when you visit /doc1alias, the canonical URL becomes /doc1alias instead of /doc1 once React hydrates (i.e. 2 canonical URLs for the same doc).