gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

Preventing 301 redirects on URLs with no trailing slashes (Netlify) #9207

Closed lloydh closed 3 years ago

lloydh commented 5 years ago

Summary

URLs with no trailing slash on sites hosted by Netlify lead to an immediate 301 redirect to the page with a trailing slash.

foo.com/bar --> foo.com/bar/

This has a performance cost and implications for SEO.

Is there a Netlify configuration that resolves these URLs without redirecting?

Relevant information

While this question is specific to Netlify, I did a quick review of other Gatsby sites featured in the Showcase and saw the same behaviour in many, but not all cases, for example:

  • Hopper /company - 301 redirect (Netlify)
  • Impossible Foods /mission - 301 redirect (unknown)
  • Cajun Bow Fishing /bows - 301 redirect (Netlify)
  • Braun /shavers-for-men - 200 no redirect (unknown)

Environment (if relevant)

Same behaviour in Gatsby v1 and v2. I'm using gatsby-plugin-remove-trailing-slashes and gatsby-plugin-netlify. Within the project all Links point to the non-trailing slash version.

Yurickh commented 5 years ago

I'm not really familiar with how Netlify serves static sites, but I know that this is the default behaviour for express when serving static folders. I'm not sure which performance costs you're referring to here; can you link me to a reference? I'd love to read more about that.

lloydh commented 5 years ago

The performance cost is the synchronous delay for the first byte of useful data caused by the redirect.

Using the Hopper example above, visiting https://www.hopper.com/company takes 150-300ms for the redirect, before any page data is received. On cellular connections with high latency it can add up to 1s.

Yurickh commented 5 years ago

I see.

From my experience with static sites, we always redirect to the /-ending url whenever it represents a folder (with an implicit index.html), not a file (like shavers-for-men.html). If you don't, the static resolution will simply fail.

This is also what the "pretty url" option on netlify does.

I guess the best approach here is to use the /-ending url as the canonical one, unless you have really solid reasons to do otherwise.

EDIT: Note that my position here is nowhere near an official position from the gatsby team. This is solely my personal opinion on the subject.

lloydh commented 5 years ago

@Yurickh I agree specifying the /-ending urls as canonical is a pragmatic option but it does seem like advocating a schizophrenic url scheme for something that was easily solved with nginx / apache but not today's popular static site hosts.

If this really is the best approach then perhaps gatsby-plugin-remove-trailing-slashes should have this as an option (at least mentioned in the docs). Alternatively gatsby-plugin-canonical-urls could have an option for trailing slashes. I haven't found any other plugins that can set canonical meta tags.

I did come across #9025 but to be honest I'm surprised this hasn't been a bigger issue for a lot of folks given the popularity of /-less Gatsby sites.

kakadiadarpan commented 5 years ago

@lloydh did you have a chance to look at Netlify's Redirects documentation?

luukdv commented 5 years ago

Same issue here. Turned on 'Pretty URLs' in Netlify, but when I visit a page and remove the trailing slash in the address bar afterwards, I land on the non-trailing variant.

Maybe there should be an option (or by default?) in gatsby-plugin-canonical-urls to enforce a trailing slash. Right now it's not really a canonical, since the current pathname is used (https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-plugin-canonical-urls/src/gatsby-browser.js#L9). @kakadiadarpan what do you think?

Yurickh commented 5 years ago

Just to clarify, the 'Pretty URLs' option in netlify WILL redirect you to the trailing slash variant:

In addition to forwarding paths like /about to /about/ (a common practice in static sites and single page apps), it will also rewrite paths like /about.html to /about/.

denull0 commented 5 years ago

We are having exactly the same problem, and we are now getting penalties from Google. Is there a solution?

lloydh commented 5 years ago

Just to clarify, the 'Pretty URLs' option in netlify WILL redirect you to the trailing slash variant

This is true but I haven't noticed any difference in behaviour with or without "Pretty URLs"; /foo redirects to /foo/ in either case. The desired behaviour is to rewrite slashless urls to resolve without redirecting.

Making /foo/ canonical is a workaround for the SEO penalties but AFAIK a full solution would only be possible in Netlify's routing layer… or by abandoning Netlify in favour of custom url rewriting …or by adopting slash/ urls. It's an unfortunate situation.

gatsbot[bot] commented 5 years ago

Old issues will be closed after 30 days of inactivity. This issue has been quiet for 20 days and is being marked as stale. Reply here or add the label "not stale" to keep this issue open!

abohannon commented 5 years ago

I'm not having this issue with Netlify specifically, but my UTM query params are being deleted for this same reason. /mypage?utm_source=google becomes /mypage/ and this is causing tracking issues.
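For links you control (campaign URLs, internal links), one way to sidestep the redirect is to point at the trailing-slash path up front so the query string never passes through a 301. A minimal sketch, assuming site-relative links; `withTrailingSlash` is a hypothetical helper name and the base URL is only a placeholder for the parser:

```javascript
// Hypothetical helper: append a trailing slash to the path portion of a
// site-relative link while leaving the query string and hash untouched.
function withTrailingSlash(to) {
  // The base is a placeholder; it only lets the URL parser handle relative links.
  const url = new URL(to, 'https://example.invalid');
  // Leave file-like paths (e.g. /feed.xml) alone.
  const lastSegment = url.pathname.split('/').pop();
  if (!url.pathname.endsWith('/') && !lastSegment.includes('.')) {
    url.pathname += '/';
  }
  return url.pathname + url.search + url.hash;
}

console.log(withTrailingSlash('/mypage?utm_source=google'));
```

This does not help with inbound links you don't control; those still depend on how the host handles the redirect.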

gatsbot[bot] commented 5 years ago

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

Thanks again for being part of the Gatsby community!

dja commented 5 years ago

This is definitely still an issue, and we’ve experienced SEO penalties as well. Additionally, turning off Netlify’s pretty URLs feature seems to result in errors stating “Missing resources for /“ or “Missing resources for /slash/“. We’ve tried solutions recommended here: https://github.com/gatsbyjs/gatsby/issues/11524 but haven’t had any luck.

0505gonzalez commented 5 years ago

Experiencing this issue as well.

0505gonzalez commented 5 years ago

@dja Are you also on Netlify or are you using Github pages?

dja commented 5 years ago

We’re on Netlify.

0505gonzalez commented 5 years ago

@dja I'm on github pages, but the issue you're facing might be the same as mine. I've opened a new ticket and plan to create a PR shortly: https://github.com/gatsbyjs/gatsby/issues/12364

0505gonzalez commented 5 years ago

TLDR: Github pages (and probably Netlify) add trailing forward slashes to folders. If you have something like /public/somepage/index.html and you visit https://yourpage.com/somepage, Github (and probably Netlify) will add a trailing slash because somepage is a directory.
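To make the two layouts concrete, here is a small sketch of the path mapping. It only mirrors the directory structure described above and is not taken from Gatsby's source; `outputFileFor` and `flatFileFor` are hypothetical names, and `public` is assumed as the build directory:

```javascript
const path = require('path');

// Layout described above: every page becomes a directory holding index.html.
function outputFileFor(pagePath, publicDir = 'public') {
  return path.posix.join(publicDir, pagePath, 'index.html');
}

// Alternative "flat" layout: one .html file per page (the site root stays index.html).
function flatFileFor(pagePath, publicDir = 'public') {
  const trimmed = pagePath.replace(/^\/+|\/+$/g, '');
  return path.posix.join(publicDir, trimmed === '' ? 'index.html' : `${trimmed}.html`);
}

console.log(outputFileFor('/somepage')); // directory layout
console.log(flatFileFor('/somepage'));   // flat layout
```

With the directory layout a host sees `somepage` as a folder, which is what triggers the trailing-slash redirect; with the flat layout there is no folder to redirect to.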

dja commented 5 years ago

How do we have Gatsby create pages like /public/somepage.html instead of /public/somepage/index.html so that pages aren't within a namespaced directory?

0505gonzalez commented 5 years ago

@dja That's what I'm currently trying to figure out in the other github issue I opened

himynameistimli commented 5 years ago

@0505gonzalez were you able to figure out a solution for this issue?

Just landed here as we're having the same problems with the url parameters getting lost in the redirect from the version without the trailing slash to the version with the trailing slash.

Only difference is that we're on S3 + Cloudfront.

We might look into using Lambda@Edge to handle the redirect unless we can figure out a way to get it to work with gatsby.

Update:

For our case, we implemented the fix from https://www.ximedes.com/2018-04-23/deploying-gatsby-on-s3-and-cloudfront/ with the following lambda js function:

const querystring = require('querystring');
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    /* Parse request query string to get javascript object */
    const params = querystring.parse(request.querystring.toLowerCase());
    const sortedParams = {};
    const uri = request.uri;

    /* Sort param keys */
    Object.keys(params).sort().forEach(key => {
        sortedParams[key] = params[key];
    });

    /* Map directory-style URIs to their index.html */
    if (uri.endsWith('/')) {
        request.uri += 'index.html';
    } else if (!uri.includes('.')) {
        request.uri += '/index.html';
    }

    /* Update request querystring with the normalized (sorted) params */
    request.querystring = querystring.stringify(sortedParams);

    callback(null, request);
};

I'm still trying to figure out what's the best thing to do here, because I think as-is, there's a negative impact to our SEO just because now we're delivering the same page for the trailing and non-trailing slash version. Likely I will add a permanent redirect for the trailing slash version here too.

If you're not using AWS/Cloudfront, I think you'd be able to accomplish this with Cloudflare Workers.
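The permanent redirect for the trailing-slash version mentioned above could look roughly like the sketch below. It assumes a separate Lambda@Edge function attached to the viewer request, so it runs before any rewrite to index.html; the choice of the slashless URL as the target is also an assumption. The query string is carried over so tracking parameters survive:

```javascript
// Sketch only: 301-redirect /page/ to /page and keep the query string.
const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const { uri, querystring } = request;

  if (uri !== '/' && uri.endsWith('/')) {
    // Strip trailing slashes; fall back to the root if nothing is left.
    const path = uri.replace(/\/+$/, '') || '/';
    const target = path + (querystring ? `?${querystring}` : '');
    // Returning a response object instead of the request short-circuits the origin.
    callback(null, {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        location: [{ key: 'Location', value: target }],
      },
    });
    return;
  }

  // Anything else passes through unchanged.
  callback(null, request);
};

exports.handler = handler;
```

If you would rather make the trailing-slash URL canonical, the same shape works with the condition and target reversed.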

0505gonzalez commented 5 years ago

@himynameistimli I did find a solution, proposed a code change in another thread. But seems like it might not be accepted so have not created a PR.

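The gist of it:

  • Hosting on github pages. Github follows the directory structure when serving files. E.g. if you hit /somepage, it will redirect to /somepage/ because it's a directory (the actual file is /somepage/index.html).
  • My proposed solution was that gatsby generate /somepage.html instead of /somepage/index.html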

garethgd commented 5 years ago

Still experiencing this trailing slash redirect with or without Netlify's pretty URL option enabled.

decimoseptimo commented 5 years ago

@KyleAMathews

@himynameistimli I did find a solution, proposed a code change in another thread. But seems like it might not be accepted so have not created a PR.

The gist of it:

  • Hosting on github pages. Github follows the directory structure when serving files. E.g. if you hit /somepage, it will redirect to /somepage/ because it's a directory (the actual file is /somepage/index.html).
  • My proposed solution was that gatsby generate /somepage.html instead of /somepage/index.html

@KyleAMathews this is an SEO danger. This subject isn't explained at all in https://www.gatsbyjs.org/docs/gatsby-link/. If someone decides to use URLs without trailing slashes, a couple of days later the Google SERP becomes full of 301 redirects for their site.

gatsbot[bot] commented 5 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

anhnk commented 5 years ago

We are having the same problems as described in this issue. I'm just wondering if it is being actively worked on at the moment.

wardpeet commented 5 years ago

Not at this moment but we're open for Pull requests.

statico commented 5 years ago

I've just started evaluating Gatsby. Is there a way that, say for a given path /foo, have Gatsby generate public/foo.html instead of public/foo/index.html?

decimoseptimo commented 5 years ago

Not at the moment. The closest would be to name your pages like page.html.js. My advice to you would be to stick with trailing slashes. For non-trailing slashes to work you'll need to do some server-side url rewriting.

pewStocky commented 5 years ago

Hi there, it would be great to have this looked at, as it can cause SEO issues with current builds.

One of the main things SEOs will look at when their internal dev team is wanting to use a new approach like Gatsby are things like these.

Seems a bit weird that it forces / overwrites server side rules on trailing slashes, not sure how much work this will be but would help a lot! Thanks

esetnik commented 5 years ago

I just got stuck on the same issue. Is there any advantage to the current gatsby approach of using a directory structure for pages? Why can't we just render them all at the root as suggested by @statico?

kimbaudi commented 5 years ago

Netlify has pretty good documentation on handling custom redirects/rewrites. For anyone interested in rewriting a path to prevent 301 redirects, take a look at HTTP Status Codes - Redirect & Rewrite Rules

You should be able to specify your rewrite rules in a _redirects file (netlify.toml supports the same rules through its [[redirects]] tables):

sample _redirects with rewrite rules

# Rewrite a path
/foo    /foo.html    200
/bar/   /bar         200

I haven't tried this because I want "pretty URLs" that end with a trailing slash. The only downside to this approach is that it is very tedious and error prone to specify all paths in your website to be rewritten without trailing slash :disappointed:
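The tedious part could in principle be scripted. The sketch below builds one rewrite line per page path in Netlify's `_redirects` syntax; `buildRewriteRules` is a hypothetical helper, the `/index.html` targets assume Gatsby's default directory layout, and, like the rules above, whether Netlify then serves these paths without a 301 is untested here:

```javascript
// Hypothetical generator for Netlify `_redirects` rewrite rules (status 200).
function buildRewriteRules(pagePaths) {
  return pagePaths
    .map((p) => p.replace(/\/+$/, '')) // normalize: drop trailing slashes
    .filter((p) => p !== '')           // skip the site root
    .map((p) => `${p}  ${p}/index.html  200`)
    .join('\n');
}

console.log(buildRewriteRules(['/', '/foo/', '/bar']));
```

The list of page paths could come from the allSitePage query or from walking the build output; the result would be written to `_redirects` in the publish directory.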

Also, this issue seems to be related to https://github.com/gatsbyjs/gatsby/issues/15317

esetnik commented 5 years ago

Would it be as easy as modifying https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/worker/render-html.js#L10

kimbaudi commented 5 years ago

I don't know about modifying https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/worker/render-html.js#L10.

I just wanted to mention that there is a way to do it in Netlify using netlify.toml

wardpeet commented 5 years ago

Is it possible to add a reproduction repo that we can build on Netlify or wherever to mimic this problem and get it resolved?

bijenkorf-james-wakefield commented 5 years ago

I don't think that this issue is related to Netlify or Github pages. I can reproduce it locally running gatsby serve, which appears to be the expected behaviour of Gatsby's redirect handling.

@lloydh / @luukdv - can you reproduce locally?

dougwithseismic commented 5 years ago

This is an important issue for us, too. You can see this behaviour with https://gatsbyjs.github.io/gatsby-starter-default if you're still looking for a reproduction repo @wardpeet

The importance of this small detail can be found here .. https://www.seroundtable.com/google-trailing-slashes-url-24943.html

Here's an example - You run foo.com and you're looking to Gatsby for a rebuild. Your site is live and serves 30k users a month, who frequent your blog content on links such as foo.com/blog/interesting-post -

When rebuilding, you realise that your blog posts are actually found at foo.com/blog/interesting-post/ (with a trailing slash.) Essentially you're creating a duplicate of every page you have, as non-trailing and trailing are not the same in the eyes of crawlers.

If your site has been using one method, but is now forced to use another then do you take that risk? Sure, the canonical tag exists - but triage and bandaids are not best practice.

For us, it's a no.

Can I do anything to help?

pewStocky commented 5 years ago

Hey doug, i'm currently looking into this too with my dev team (i'm in-house SEO) - my post here: https://github.com/gatsbyjs/gatsby/issues/15317

Will update you / this thread if we come up with any solutions etc - cheers

jaapstronks commented 5 years ago

Just wanted to reply that we are looking forward to this bug being resolved. We are using urls without trailing slashes btw, we have the issue that Twitter/Facebook will not show the thumbnail we set, because the url is redirected to the version with a trailing slash. (Twitter Card Validator will return: WARN: this card is redirected to https://foo.com/bar/ when submitting https://foo.com/bar). Twitter will only pick up the thumbnail after manually using Twitter Card Validator to enter the url, and it will forget it after a week (Twitter re-indexes card content every week or so, and it won't index the urls that are hidden behind a redirect).

dougwithseismic commented 5 years ago

@jaapstronks Are you by any chance running with netlify? Managed to fix this issue here https://github.com/gatsbyjs/gatsby/issues/15317#issuecomment-530048373

PolGuixe commented 4 years ago

Can someone summarise the current situation?

Solution: still WIP.

Hack:

  1. Ensure trailing slashes are used in all routes
  2. Add a trailing slash to the canonical link

Please feel free to correct me and I'll amend/delete the above.
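Step 2 of the hack can be sketched as a small helper that always emits the trailing-slash form and drops the query string and hash, so both variants of a page report the same canonical. `canonicalUrl` is a hypothetical name, and how the result reaches the page head (react-helmet, a plugin, etc.) is left open:

```javascript
// Hypothetical helper: build a canonical URL that always ends in a slash.
function canonicalUrl(siteUrl, pathname) {
  const url = new URL(pathname, siteUrl);
  // Canonicals should not vary with tracking params or fragments.
  url.search = '';
  url.hash = '';
  if (!url.pathname.endsWith('/')) {
    url.pathname += '/';
  }
  return url.href;
}

console.log(canonicalUrl('https://foo.com', '/bar'));
```

The value would go into a `<link rel="canonical" href="...">` tag on every page.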

brettmstevens7 commented 4 years ago

If you're using Gatsby + Netlify, here's how we solved it.

First, any pages in the /pages directory, which are created automatically by Gatsby, will have a trailing slash. You can see this behavior by entering a URL without a trailing slash directly into the search bar. The network tab will show a 301 as the page is redirected to the one with the trailing slash.

Try running this GraphQL query and you will see that the path for these pages has a trailing slash.

query MyQuery {
  allSitePage {
    edges {
      node {
        path
      }
    }
  }
}

To remove the trailing slash from the page path, you can add this code snippet from the Gatsby docs to your gatsby-node.js file:

// Replacing '/' would result in empty string which is invalid
const replacePath = path => (path === `/` ? path : path.replace(/\/$/, ``))
// Implement the Gatsby API “onCreatePage”. This is
// called after every page is created.
exports.onCreatePage = ({ page, actions }) => {
  const { createPage, deletePage } = actions

  const oldPage = Object.assign({}, page)
  // Remove trailing slash unless page is /
  page.path = replacePath(page.path)
  if (page.path !== oldPage.path) {
    // Replace old page with new page
    deletePage(oldPage)
    createPage(page)
  }
}

Next, you need to turn off the pretty URL setting in Netlify. If you have Asset optimization disabled, enable it, turn off pretty URLs, and then disable it again.

[Screenshot: netlify-setting]

Push a new build and then observe the network requests for a page that had a trailing slash. For instance, after typing https://www.software.com/code-time then pressing Enter, the first request is a 200 (there isn't a 301 like before):

[Screenshot: 200-resp]

If anyone has any better solutions, let me know! We haven't solved the trailing slash on localhost.

AAverin commented 4 years ago

@wardpeet @KyleAMathews I'd like to bring your attention to this issue again because it's a major problem for SEO, as many people in the thread have already mentioned. Asking for PRs is great, but this particular one looks to be related to some of the core functionality of the Gatsby engine, so I wouldn't take responsibility for touching it myself without knowing all the ins and outs of Gatsby.

Could you or someone with deep knowledge take a look and offer some kind of solution?

Also, issue is not related to Netlify or Github pages. It can be reproduced locally, I have it on Firebase Hosting for production.

Additionally, there is already a PR from @cosmn. Maybe it already solves the issue or can be used as an idea on what can be done?

lawwantsin commented 4 years ago

This is a problem in AWS as well. Gatsby is a scam. My solution, sadly, is to append a forward slash everywhere. One-liner, where to is the link variable: if (to[to.length - 1] != "/") to += "/"

dougwithseismic commented 4 years ago

Guys, it's a setting within Netlify as described above. Perhaps a new issue needs to be created.

dospolov commented 4 years ago

Two years later - still an issue

stldo commented 4 years ago

I made a plugin from a code that I normally use in websites hosted at Netlify. It creates a .html file for each page, which disables the 301 redirect for paths without trailing slashes. It works very well with simple websites, I hope it helps someone. More info can be found on the plugin page.
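For anyone curious what that approach involves, here is a rough sketch of the idea. It is not the plugin's actual code: `writeFlatHtmlCopies` is a hypothetical name, and it assumes the default build output where each page is `<dir>/index.html`:

```javascript
const fs = require('fs');
const path = require('path');

// Sketch only: for every <dir>/index.html under publicDir, also write
// <dir>.html so a host can serve /foo from foo.html without a redirect.
function writeFlatHtmlCopies(publicDir, dir = publicDir) {
  const written = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const subDir = path.join(dir, entry.name);
    const indexFile = path.join(subDir, 'index.html');
    if (fs.existsSync(indexFile)) {
      const flatFile = `${subDir}.html`;
      fs.copyFileSync(indexFile, flatFile);
      written.push(path.relative(publicDir, flatFile));
    }
    // Recurse so nested pages such as /blog/post are covered too.
    written.push(...writeFlatHtmlCopies(publicDir, subDir));
  }
  return written;
}
```

In a Gatsby site this could be called from the onPostBuild hook in gatsby-node.js, e.g. `exports.onPostBuild = () => { writeFlatHtmlCopies('public') }`. Both the slash and slashless URLs then return 200, so a canonical tag is still needed to avoid duplicate content.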

ayZagen commented 4 years ago

this also happens with nginx server.

EDIT: this issue has nothing to do with Gatsby; it is a web server misconfiguration. I have checked the output files, and it seems Gatsby creates a directory for each page with an index.html in it. So I had to change my nginx URL resolution as follows:

location / {
  try_files $uri $uri/index.html $uri.html =404;
}

The $uri/index.html entry resolves the correct file without a redirect. If it is missing, or if $uri/ is used instead (as most nginx config examples do), nginx issues a redirect to the trailing-slash URL. This is also stated in the nginx documentation:

In response to a request with URI equal to this string, but without the trailing slash, a permanent redirect with the code 301 will be returned to the requested URI with the slash appended.

I have not used Netlify but I believe same thing would apply to it.

lyxious commented 4 years ago

this also happens with nginx server.

EDIT: this issue has nothing to do with Gatsby; it is a web server misconfiguration. I have checked the output files, and it seems Gatsby creates a directory for each page with an index.html in it. So I had to change my nginx URL resolution as follows:

location / {
  try_files $uri $uri/index.html $uri.html =404;
}

The $uri/index.html entry resolves the correct file without a redirect. If it is missing, or if $uri/ is used instead (as most nginx config examples do), nginx issues a redirect to the trailing-slash URL. This is also stated in the nginx documentation:

In response to a request with URI equal to this string, but without the trailing slash, a permanent redirect with the code 301 will be returned to the requested URI with the slash appended.

I have not used Netlify but I believe same thing would apply to it.

This is not the case, even if you have a blank nginx config, any attempt to access a valid directory will result in a 301 with the trailing slash. It's a common misconception that it is related to the try_files. The documentation is also poor surrounding this, the location documentation makes it seem like the 301 redirect is only for routes that are proxied.

atkinson commented 4 years ago

This is a bug in the Netlify UI.

Here's a fix: https://community.netlify.com/t/remove-trailing-slash-redirect-for-gatsby-gatsby-cloud-netlify-website/20976/8