gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

Preventing 301 redirects on URLs with no trailing slashes (Netlify) #9207

Closed: lloydh closed this issue 3 years ago

lloydh commented 5 years ago

Summary

URLs with no trailing slash on sites hosted by Netlify lead to an immediate 301 redirect to the page with a trailing slash.

foo.com/bar --> foo.com/bar/

This has a performance cost (an extra request/response round trip before the page loads) and implications for SEO.

Is there a Netlify configuration that resolves these URLs without redirecting?

Relevant information

While this question is specific to Netlify, I did a quick review of other Gatsby sites featured in the Showcase and saw the same behaviour in many, but not all, cases. For example:

- Hopper /company: 301 redirect (Netlify)
- Impossible Foods /mission: 301 redirect (unknown)
- Cajun Bow Fishing /bows: 301 redirect (Netlify)
- Braun /shavers-for-men: 200, no redirect (unknown)

Environment (if relevant)

Same behaviour in Gatsby v1 and v2. I'm using gatsby-plugin-remove-trailing-slashes and gatsby-plugin-netlify. Within the project all Links point to the non-trailing slash version.

ayZagen commented 4 years ago

This is not the case. Even with a blank nginx config, any request for an existing directory without the trailing slash results in a 301 to the same path with the slash added. It's a common misconception that this is related to try_files. The documentation is also poor here: the location documentation makes it seem like the 301 redirect applies only to proxied routes.

This isn't about try_files; it's about how file resolution works in nginx. try_files is just a helper directive, and that code was only an example to show my usage and solution. I agree with you that the documentation is poor. The redirect to the slash-terminated directory is performed by an undocumented module, ngx_http_static_module. If you want to disable that behaviour, you need to compile nginx yourself. I doubt anyone here would go to that much trouble for it.

lyxious commented 4 years ago

OP's question is about why foo.com/bar redirects to foo.com/bar/. Your initial answer suggests it has to do with the try_files config, but it doesn't. The documentation you linked also has nothing to do with why this request returns a 301: the 301 described in the location documentation applies only to requests handled by one of the *_pass directives (such as proxy_pass or fastcgi_pass), which is independent of try_files.

wardpeet commented 4 years ago

Hey, sorry I don't have an update right now but I'll read over this thread again and see if I can get some action items out of it and create a task list so y'all can help us fix these issues 🙏

mlenser commented 4 years ago

This is a bug in the Netlify UI.

Here's a fix: https://community.netlify.com/t/remove-trailing-slash-redirect-for-gatsby-gatsby-cloud-netlify-website/20976/8

This is indeed the case.

Here is how your Netlify config should look: [screenshot]

Disabling asset optimization at the top level apparently turns Pretty URLs on, even though the UI makes it look like they're off: [screenshot]

So don't check the "Disable asset optimization" checkbox.

flackjap commented 3 years ago

I've just lost 3 hours because of this.

What a pain.

ghost commented 3 years ago

@jlengstorf Do you know if this is intended behaviour?

Shanonbaker commented 3 years ago

slash

jlengstorf commented 3 years ago

@alvinometric I'm not sure. I've sent this over to our UI team for review. It does look like, if this isn't a bug, it could do with some clarification.

leomelzer commented 3 years ago

Thanks for all the helpful comments that led us in the right direction.

If it still doesn't work after configuring it as described in @mlenser's comment (https://github.com/gatsbyjs/gatsby/issues/9207#issuecomment-695913229), make sure to check your netlify.toml for pretty_urls.

Settings in netlify.toml take precedence; see the docs: https://docs.netlify.com/configure-builds/file-based-configuration/#deploy-contexts

UI settings are overridden if a netlify.toml file is present in the root folder of the repo and there exists a setting for the same property/redirect/header in the toml file.

LekoArts commented 3 years ago

Since multiple people have reported this as a bug in Netlify's UI / the behavior being the result of a misconfigured hosting, I'll close this one here as resolved (and not an issue with Gatsby). Please follow the linked issues to see how/when it's resolved. Thanks for providing your context and solutions here for future Google users (hello 👋 ).

If you see this issue on a platform other than Netlify, please create a new issue with a reproduction. This issue is specific to Netlify, and it's resolved.

jon-sully commented 3 years ago

Hey 👋🏻

I know this issue lapsed and got closed, but I really think it's important to recap a couple of things here. The impetus for this issue is that a) there's a disconnect between Gatsby's routing defaults and Netlify's routing configuration, and b) there are serious SEO penalties in play if a Gatsby site doesn't have the trailing-slash / no-trailing-slash issue solved, since a site serving the same content on both URLs (duplicate content) gets knocked on SEO pretty hard. Technically this isn't Gatsby's fault, but since it affects every Gatsby user hosting on Netlify, it does seem like a major issue, or at the very least a major risk.

Solving this problem by disabling "Pretty URLs" in the (yes, awfully borked / painful UX'd) Netlify Asset Optimization panel can open your site up to the duplicate content issue, since content may be available at both the un-slashed and the slashed version of your URL path. It's also important to note that if a Gatsby site is available at both /test and /test/ but 'fixes itself', you may just be seeing the Gatsby runtime adjust your address bar via the browser History API. The super important part has nothing to do with what happens when Gatsby actually runs in the browser; it's that Netlify is serving the same content on multiple URLs, the slash and the non-slash paths.

This is fixable, and there is a way to get everything working smoothly on a unified path / slash structure, but it isn't disabling 'Pretty URLs'. The tl;dr is that Netlify really works best with (and is biased toward) the trailing slash, and unified content paths on Netlify require the trailing slash. I elaborated on this in another Gatsby GH thread here:

https://github.com/gatsbyjs/gatsby/discussions/27889#discussioncomment-254854

But I would definitely urge folks to carefully check (preferably from a CLI HTTP tool) which paths (slash and/or no-slash) resolve to their content. If both the slash and no-slash paths resolve to your content, your SEO will suffer for it.

Hope that helps 😕

yanneves commented 3 years ago

Given that gatsbyjs.com itself returns HTTP 200 with or without a trailing slash (duplicate content), it seems this issue is a bit of an afterthought.

$ curl -I https://www.gatsbyjs.com/plugins/
HTTP/2 200
$ curl -I https://www.gatsbyjs.com/plugins
HTTP/2 200

I noticed this issue, like others here, when I migrated from WordPress to Gatsby; my previous WordPress configuration stripped trailing slashes. The only way I can see to avoid duplicate content is to cave to the 301 redirects and introduce trailing slashes. Is the SEO penalty for redirecting existing pages like this still relevant? It could be that this is a lingering idea in SEO that no longer matters. Otherwise, it's a dangerous assumption in Gatsby.

This appears to be baked into the directory structure of the static generated site:

public/
├── index.html
├── some-other-page
│   └── index.html
└── some-page
    └── index.html

A browser would interpret that as example.com/some-page/ and it would be hacky to force it to remove the trailing slash. Generated content would need to respect a configuration option to instead output the following when we want to remove trailing slash:

public/
├── index.html
├── some-other-page.html
└── some-page.html

The above would then be interpreted by the browser as example.com/some-page.
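To make the two layouts concrete, here is a small sketch of the path-to-file mapping each one implies. The function names are hypothetical and this is not how Gatsby itself writes files; it only shows why `some-page/index.html` pairs naturally with `/some-page/`, while `some-page.html` pairs with `/some-page` on a host that maps extensionless URLs to `.html` files (an assumption about the host).

```javascript
// Sketch: where a page path's HTML would live under each output layout.
// Not Gatsby's actual implementation; all names are hypothetical.
const path = require("path")

// Current layout: one directory per page, each with its own index.html.
function directoryLayoutFile(pagePath) {
  return path.posix.join("public", pagePath, "index.html")
}

// Proposed layout: one flat .html file per page (the root stays index.html).
function flatLayoutFile(pagePath) {
  const trimmed = pagePath.replace(/\/+$/, "")
  return trimmed === "" ? "public/index.html" : `public${trimmed}.html`
}

// Files that would need renaming to go from the first layout to the second.
function renamesFor(pagePaths) {
  return pagePaths
    .filter(p => p !== "/")
    .map(p => ({ from: directoryLayoutFile(p), to: flatLayoutFile(p) }))
}
```

Whether Gatsby's client runtime would still find each page after such a rename is the open question in this comment; the sketch does not answer it.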

Do we know if Gatsby core strictly expects a directory structure for pages? There may be assumptions elsewhere that pages are always output as directories. If we can identify those assumptions (or confirm there are none), configuring a non-trailing-slash output like the above would solve this issue.

krzysieqq commented 3 years ago

Any updates?

MakowskiHubert commented 3 years ago

I had the same problem with an nginx server and solved it with https://github.com/gatsbyjs/gatsby/issues/9207#issuecomment-681870472

hazem3500 commented 3 years ago

If you want your site to have no trailing slashes and still work on Netlify, you can use gatsby-plugin-netlify and add the following to gatsby-node.js:

const replacePath = path => (path === `/` ? path : path.replace(/\/$/, ``))

exports.onCreatePage = ({ page, actions }) => {
  const { createRedirect } = actions
  // Normalize first so a path that already ends in "/" doesn't become "//"
  const path = replacePath(page.path)
  if (!path.includes(`.html`) && path !== `/`) {
    createRedirect({ fromPath: `${path}/`, toPath: path, isPermanent: true })
  }
}

This redirects every trailing-slash path to its non-slash version with a 301 status code. Note that if you also use the createPages Gatsby Node API, you'll need to add the redirect there as well:

exports.createPages = async ({ actions, graphql }) => {
  const { createPage, createRedirect } = actions
  // ... query your data with graphql and build a `pages` array (omitted)
  pages.forEach(page => {
    // ... createPage(...) for this page
    createRedirect({ fromPath: `${page.path}/`, toPath: page.path, isPermanent: true })
  })
}

WhiteHoodHacker commented 3 years ago

If disabling "Pretty URLs" in Netlify leaves duplicate content at trailing-slash and non-trailing-slash URLs, wouldn't a simple fix be to add a <link> tag with the canonical URL in the preferred scheme? Plenty of sites serve duplicate content fine this way.

wojtekidd commented 2 years ago

I'm not having this issue with Netlify specifically, but my UTM query params are being deleted for this same reason. /mypage?utm_source=google becomes /mypage/ and this is causing tracking issues.

Is anyone else still having this problem? There has to be a way for a self-hosted Gatsby site to keep UTM parameters. I tried this in gatsby-node.js:

createRedirect({ fromPath: '/banana', toPath: '/shop?utm_source=banana&utm_medium=podcast&utm_campaign=banana', isPermanent: true });
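The comment doesn't say which server performs the /mypage to /mypage/ redirect, but such a redirect does not have to drop the query string. A minimal sketch, assuming a self-hosted Node server built on the built-in `http` module; `trailingSlashLocation` and `maybeRedirect` are hypothetical names, and serving the files from public/ is left out.

```javascript
// Sketch: add a trailing slash with a 301 while keeping ?utm_* parameters.

// Return the redirect target for a request URL, or null when none is needed.
function trailingSlashLocation(requestUrl) {
  // The base is only used to parse the relative request target.
  const url = new URL(requestUrl, "http://localhost")
  const { pathname, search } = url
  // Leave slash-terminated paths (including "/") and file-like paths alone.
  if (pathname.endsWith("/") || /\.[^/]+$/.test(pathname)) return null
  return `${pathname}/${search}`
}

// Send the redirect if needed; returns true when the response was handled.
function maybeRedirect(req, res) {
  const location = trailingSlashLocation(req.url)
  if (location === null) return false
  res.writeHead(301, { Location: location })
  res.end()
  return true
}

// Usage with Node's http module (static file serving not shown):
// require("http").createServer((req, res) => {
//   if (maybeRedirect(req, res)) return
//   // ...serve the matching file from public/
// }).listen(8000)
```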