gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License
55.21k stars 10.33k forks source link

Requests for page-data.json return 404 in client side routes #16097

Closed kirbycool closed 4 years ago

kirbycool commented 5 years ago

Description

Hey team! I've noticed an issue with client side routes recently.

When navigating to client side routes, Gatsby requests the associated page-data.json which returns a 404. For example, navigating to /app/profile loads /page-data/app/profile/page-data.json, which returns a 404 if the /app page has a matchPath of /app/*. This is reproducible in the simple-auth example which has a gatsby-node.js like this:

exports.onCreatePage = async ({ page, actions }) => {
  const { createPage } = actions

  // page.matchPath is a special key that's used for matching pages
  // only on the client.
  if (page.path.match(/^\/app/)) {
    page.matchPath = `/app/*`

    // Update the page.
    createPage(page)
  }
}

Note this doesn't seem to affect the rendering of the page. It looks like it's still getting the correct page data for the matchPath page even though Gatsby is making a request for different page data.

Steps to reproduce

You can reproduce this in the the simple-auth example.

  1. Clone the simple-auth example
  2. Install the latest gatsby
  3. gatsby build and gatsby serve
  4. Navigate to localhost:9000/app/login

You get some 404s for page-data image

Expected result

Gatsby shouldn't request page-data.json for client side routes.

Actual result

Gatsby requests page-data.json for client side routes and receives a 404.

Environment

System: OS: macOS 10.14.5 CPU: (8) x64 Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Shell: 3.0.2 - /usr/local/bin/fish Binaries: Node: 12.3.1 - ~/.asdf/shims/node Yarn: 1.16.0 - ~/.asdf/shims/yarn npm: 6.9.0 - ~/.asdf/shims/npm Languages: Python: 2.7.15 - /Users/thanx/.asdf/shims/python Browsers: Chrome: 75.0.3770.142 Safari: 12.1.1 npmPackages: gatsby: ^2.13.41 => 2.13.41 gatsby-plugin-react-helmet: ^3.0.0 => 3.0.12

pieh commented 5 years ago

Trying to fetch page-data.json files is by design. Gatsby no longer have central manifest of available pages - if you are interested in why - please check https://www.gatsbyjs.org/blog/2019-06-12-performance-improvements-for-large-sites

Because Gatsby no longer have central pages manifest, it means that Gatsby will have to "blindly" check for metadata - which means there will be requests for metadata that end up being 404 in case of client side routes.

Does this cause issues other than just showing errors in console? I'm not sure if we can totally silence those errors and this is just what browsers will always do in case some request "fails" (because we do have handling on what to do if page-data.json is 404ing)

gunzip commented 5 years ago

same here. so, is it impossible to "blacklist" only those pages that usespage.matchPath (ake, client side only routes) to avoid requesting a 404 ? probably it has some drawbacks actually; ie. it causes more http requests (so more used bandwidth, client / server side) than necessary.

-- EDIT --

this is expecially true when you have a large list of gatsby <Link>s. (ie. content list with uuid) as there would be a 404 request for each rendered Link.

pieh commented 5 years ago

is it impossible to "blacklist" only those pages that usespage.matchPath (ake, client side only routes) to avoid requesting a 404 ?

@gunzip Yes, this actually would cause bugs in some cases: The way gatsby routing works, is that it prioritize stricter route matches. Consider example - we have /app/about regular static page and /app/* client side route.

If we would not "probe" for /app/about specific metadata - we would never be able to navigate to it, as /app/* also satisfies this pathname.

Now we might try to be smarter and during the build check if any other page would satisfy matchPath pattern - and only then try to get page-data, but this does not scale very well (has O(n^2) complexity).

I'm just not quite sure if those errors in console cause any actual issues? I.e. most adblockers will result in bunch of network errors in console, but this doesn't really affect users

kirbycool commented 5 years ago

@pieh Great thanks for the explanation! I haven't seen this cause any actual issues in the application, but thought it might not be intended. Feel free to close if there's no action items.

pieh commented 5 years ago

Unfortunately there's not much we can do about errors cluttering console. There is closed ticket on chromium bug tracker that ask for a way to not show those errors - https://bugs.chromium.org/p/chromium/issues/detail?id=124534 (this is about application code) - it started in 2012 and last response was 2 days ago, so it's closed but still kind of active heh.

You can locally opt into ignoring those errors (at least in chrome - dunno about other browsers) by using "Hide network" in dev tools console options:

eknowles commented 5 years ago

@kirbydcool I think this issue should be reopened, the 404 is an issue. For applications with large amounts of links per page (https://github.com/gatsbyjs/gatsby/issues/16097#issuecomment-515488361) it can cause 404s to go through the roof, if we know that the route will result in a 404 the application should not be making it in the first place, instead it should route to a 200.

My temp solution to this issue is to add 301 redirects for page-data.json

page-data/vehicle/([0-9]+)/page-data.json
--->
%{scheme}://%{host}/page-data/vehicle/*/page-data.json

This does result in the correct page-data for the dynamic route to be displayed. However a 301 still results in a network request for a new document each time.

e.g. page-data/vehicle/123456789/page-data.json

Gatsby should identify this route as being dynamic and the loader should fetch the */page-data.json.

The result would be:-

  1. Browsers would intelligently cache the page-data.json.
  2. Bursty requests would not fire for pages with n Link items.
  3. CDN logs would not be polluted with 100,000s of 404s

Is there someone on the gatsby core team who can prioritise this issue?

pieh commented 5 years ago

if we know that the route will result in a 404 the application should not be making it in the first place

Well, this is exactly the problem. For Gatsby to know that it would need to have centralized pages manifest - and this doesn't scale - see https://matthewphillips.info/programming/gatsby-pages-manifest.html as example of issues that centralized pages manifest cause

eknowles commented 5 years ago

@pieh, it just needs to know the dynamic routes, not every route right?

this would be the contents of ./cache/match-paths.json

e.g.

[
    {
        "path": "/product/*",
        "matchPath": "/product/*"
    },
    {
        "path": "/admin",
        "matchPath": "/admin/*"
    }
]
thebuilder commented 5 years ago

Would it be possible to extend the <Link> component with a clientSide or noPrefetch boolean prop that, prevents it loading the page-data.json? I guess it would still try to load it once the page is openend, but if we can can at least prevent the prefetch that would help.

gunzip commented 5 years ago

@thebuilder you can obtain the same effect using @reach/router link instead of gatsby link:

import { Link } from "@reach/router";

thebuilder commented 5 years ago

That's true - Just needs to remember to also transform the url using withPrefix.

scottyeck commented 5 years ago

FWIW, you can also see this in the client-only-paths example. Screenshot here:

localhost_8080_page_4

I too think that this should be reopened. FWIW we're experiencing it as well, and the rationale behind closing it was not "This is not an issue", but rather, "We can't fix this." I think I've got a fix, related entirely to the comment brought up by @eknowles . I'll have it up shortly.

wardpeet commented 5 years ago

Hey :wave:,

I do get your concerns but sadly we can't fix this. If we don't do the page-data check for the current page we'll bring back #15101.

Consider the following routes

[
  {
    "path": "/app/dashboard",
    "matchPath": "/app/*"
  },
  {
    "path": "/app/login",
  },
]

With that configuration, we won't be able to load /app/login as it's part of the matchPath logic. We will simply ignore the page and fallback to whatever is inside /app/*. The change to fix this behaviour was introduced in https://github.com/gatsbyjs/gatsby/pull/15762

A solution we could add is to save all paths that are inside a matchPath also into the matchPath.json so we know about all routes. This might bring back the page-manifest performance problem for site that heavily uses this behaviour.

Matchpath will look like

[
  {
    "path": "/app/dashboard",
    "matchPath": "/app/*"
  },
  {
    "path": "/app/login",
    "matchPath": "/app/login"
  },
]
scottyeck commented 5 years ago

@wardpeet Thanks for the follow up, and sorry for my sassy comment. Holy heck language looks different when I write late at night. Appreciate you not giving me a hard time there. 😇

Per the PR I issued, the test-bed pretty clearly prioritizes the feature of static-paths that live alongside client-only-paths, which aligns 100% with your explanation above. I can see how that'd be useful. I'm going to put additional thoughts on the fix on the PR thread so as to not litter the world.

wangyi7099 commented 5 years ago

I have a similar issue. I use createRedirect function in createPages hooks. I create redirects like the following:

{
    romPath: '/guide,
    redirectInBrowser: true,
    toPath: '/guide/introduction'
}

And when I view page and the /guide link exists on the page, the page-data/guide/page-data.json which not exists will still befetched and return an error of 404.

And also, if the server's bandwidth is bad, you cannot view any other pages until page-data.jsons of all links in current page to be fetched.

Just wonder if there is a change to ship a future: We don't have to load all page-data.jsons but load the page-data.json of the page we currently visit and we are ready to visit.

melpers commented 5 years ago

Does this cause issues other than just showing errors in console?

This seems to be breaking direct Links to images using the public URL. Steps to reproduce:

When you click the image to follow the link you get the public URL (i.e. http://localhost:9000/static/gatsby-astronaut-6d91c86c0fde632ba4cd01062fd9ccfa.png) but it results in the Not Found page and the console shows the page-data.json is returning the 404.

Reloading the error page results in the image (though you are no longer in React).

This same behavior happens if I import the Link from @reach/router.

Is this something withPrefix can address?

bsswartz commented 5 years ago

The 404 behavior seems to be required to allow static pages to live within the namespace of a client only route, which certainly might be useful.

For many use cases (ie. those which do not use the shadowing behavior at all) having the 404s is not a worthwhile compromise. Perhaps it would be possible to expose an API or configuration that would allow some match paths to have total control over their entire route/namespace?

colbyfayock commented 5 years ago

There are also a lot of concerns about this on #15398 as well.

While @pieh and @wardpeet i understand the gravity of returning to the page manifest, and i dont think anyone here is recommending you roll back a huge performance boost by doing so, i think discarding this as a non-issue is ill informed, and the idea that if it cant be done there, it can't be done anywhere, feels a little like you're saying it's not worth the time to try.

Having some kind of solution, whether it's purely in the client side routing plugin, or having a situation where Gatsby's Link has a way to prevent the lookup (similar to what someone here said) feels like at least a more thoughtful approach to helping people out rather than nothing at all.

This is clearly impacting people and really any amount of 404's for client side applications is NOT a normal side effect of an application. If Gatsby is coming at this as a "Won't Fix", you're basically confirming client side routing is NOT a 1st class citizen of the Gatsby framework, which is not how it appears your team is trying to position it as, at which point, why wouldn't someone just use Create React App or something of the like?

1 thing that IS a 1st class citizen for Gatsby is the developer and their experience. Hiding the 404s via the terminal filters isn't really a solution, as then you're masking logs that may actually be a problem. People with large applications with large amounts of dynamically generated client side routes (such as myself and a few others that prodded here and the other thread) are left to be swamped reviewing tons of 404s in the console to determine if there's an issue or to find the real ones amongst the swarm of 404s.

We all do appreciate you taking the time to think this through, but I would encourage you to not disregard this as a non-issue. As someone who advocates a lot for Gatsby, it's kind of disappointing seeing how this is being pushed off. Thanks for your time

kirbycool commented 5 years ago

I'll reopen since it seems like there's more discussion here.

senorgeno commented 5 years ago

How about something that creates an empty page-data.json file at build for client-side routes? You would need to define what client-side pages you have somewhere which adds some boilerplate however..

bsswartz commented 5 years ago

How about something that creates an empty page-data.json file at build for client-side routes? You would need to define what client-side pages you have somewhere which adds some boilerplate however..

Client only routes can contain a dynamic element that is not known at build time, like a product page that references an item id (ie. /widgets/12345/details/).

wardpeet commented 5 years ago

Sorry for the late notice, I have been on vacation for the past week and a half. I can bring you the good news that we will be fixing this natively in gatsby. We'll put urls inside match-path.json that match a dynamic route.

I'll be creating a PR to fix this issue. I'll let you know when it's up and running to get some feedback.

colbyfayock commented 5 years ago

thanks @wardpeet 🎉

josephkandi commented 5 years ago

+1 for the fix, having the same issue as well for an an app we are planning to go to production in a few weeks time. Was planning to abandon Gatbsy and try use CRA instead.

wardpeet commented 5 years ago

I've created a draft PR, just need to write a test and I'll publish a canary build so you can test it. To make sure it works for all your use cases.

josephkandi commented 5 years ago

I've created a draft PR, just need to write a test and I'll publish a canary build so you can test it. To make sure it works for all your use cases.

Awesome, i wish i could give you a lot of plus ones. My project had grounded because of this issue.

Made a video of my issue, https://slack-files.com/T03A23LJR-FN12WFD1S-5f3218378f

wardpeet commented 5 years ago

I've published a canary build gatsby@page-data-404. Please install it with npm install --save-dev gatsby@page-data-404 or yarn add --dev gatsby@page-data-404

I'm happy to get any problems

Made a video of my issue, slack-files.com/T03A23LJR-FN12WFD1S-5f3218378f

@josephkandi :scream: that's weird, so it's waiting for the page-data to complete?

hartantothio commented 5 years ago

I've created a draft PR, just need to write a test and I'll publish a canary build so you can test it. To make sure it works for all your use cases.

I did a quick test and it seems to work well. Thanks @wardpeet!

thovden commented 5 years ago

Looking good! Seems to work fine for my case (as described in #16491).

josephkandi commented 5 years ago

I've published a canary build gatsby@page-data-404. Please install it with npm install --save-dev gatsby@page-data-404 or yarn add --dev gatsby@page-data-404

I'm happy to get any problems

Made a video of my issue, slack-files.com/T03A23LJR-FN12WFD1S-5f3218378f

@josephkandi 😱 that's weird, so it's waiting for the page-data to complete?

@wardpeet Am still getting the 404 page shown but the pages load faster now than before.

colbyfayock commented 5 years ago

🎉 it seems to work for me! thanks @wardpeet

just a heads up, i am now however seeing this: image

that message seems to appear on /[page] trying to preload /page-data/[page]/page-data.json. if it helps, my client routes config has '/[page]/*'

it looks like there are many more preload tags: image

the bottom ones all look like /[page]/[page], though that may be a result of how i had to structure the pages to utilize the router?

previously there was 1 for /page:

<link rel="prefetch" href="/page-data/[page]/page-data.json">

and a bunch of

<link rel="prefetch" href="/page-data/[page]/[subpage]/Loading/page-data.json">

which maybe the new ones replaced? 🤷‍♂

im not nearly as worried about this one though, but thought id mention it.

bitttttten commented 5 years ago

@wardpeet I used your canary build on gatsby@page-data-404 and the slow page navigations definitely went down, although they were still slow on the first page load. I do have about 20 links to client-side only routes so maybe it's just about the amount of them?

I do also still get DOMException errors in the console:

chrome_2019-09-06_15-39-24

Nothing seems to have broken so far 🎉 ✨ Thanks!

thovden commented 5 years ago

I still have a problem for a dynamic app as described in #16491. It's not causing 404s anymore, but is still claiming Site has changed on server. Reloading browser for every route under /app/*. This may very well be solved by Cache-Control- ref https://github.com/jariz/gatsby-plugin-s3/commit/f2c93b6ce3e15f4b2d16a608fdb9e63c79044ad9 - will check.

Update 1: still having issues after Cache-Control update (and when running browser with no cache). No 404s, CloudFront delivers the exact same content for /app/foo and /app/bar and Gatsby still claiming Site has changed on server. Reloading browser Update 2: After invalidating the CloudFront cache (to force reload from S3) things are looking good.

tl;dr looking good so far :)

wardpeet commented 5 years ago

Is it possible to share a repo or host a gatsby build --no-uglify somewhere so I can try to debug it.

Thanks for the feedback!

josephkandi commented 5 years ago

Is it possible to share a repo or host a gatsby build --no-uglify somewhere so I can try to debug it.

Thanks for the feedback!

I have added you to the peruzal-learn repo. You will notice the issue when navigation from the sign-in page to the dashboard. Let me know if you need the test credentials for firebase.

derakhshanfar commented 4 years ago

@bitttttten same as me. any workaround?

wardpeet commented 4 years ago

You can currently use npm install --save-dev gatsby@page-data-404 as a workaround. I'll be finishing up this PR today.

wardpeet commented 4 years ago

PR got published in latest gatsby. I couldn't mimic prefetch behaviors as mentioned above. If you still have it, please open a new issue.

JustFly1984 commented 4 years ago

I was looking for solution to this issue, and trying to update gatsby to one of latest versions 2.15.23 and older leading to new bug https://github.com/gatsbyjs/gatsby/issues/17980

I was thinking that my bug could be related to this issue solution

remainstheday commented 4 years ago

@wardpeet I am still seeing this issue on gatsby version 2.13.45 here is my repo that is being deployed to github pages https://github.com/trentontri/trentontri.github.io and here is the route that fails to load for some reason http://trentonkennedy.com/projects

kilburn commented 4 years ago

@trentontri Your site doesn't work because this URL returns a 404 not found: http://trentonkennedy.com/page-data/projects/page-data.json. This URL returns a 404 not found because for some reason on your generated files the page-data/projects folder is page-data/Projects (with a capital P) instead. If you rename it everything should work again. As to how it got into that situation... I don't know.

jitendra-koodo commented 4 years ago

I'm also getting 404 for prefetch request /page-data//page-data.json . Not sure where is this double slash coming from, possibly something is missing between these 2 slashes.. gatsby-bug

mi-na-bot commented 4 years ago

@jitendra-koodo This is an unrelated problem and you should open a new issue for it. The entire path of a page should be there like /page-data/hello-world/page-data.json.

meronek commented 4 years ago

I'm running into this same problem also, and where it's affecting me is SEO and URL linting. As soon as a 404 is returned, SEO and URL linters just fail. A Lighthouse audit will demonstrate it, too.

This page here is pre-rendered: https://theboardr.netlify.com and everything's fine. Lighthouse audit works.

This page is client side only and returns a 404 but the page loads fine in the browser. URL linters and Lighthouse audits don't work. https://theboardr.netlify.com/events/3587/grind_for_life_series_presented_by_marinela_at_new_smyrna

Basically, I have a site I'm trying to build where most of it has data that doesn't change much, but some of it has data that changes all the time, like a list of upcoming events. So, I'm trying to make that part only dynamic, and avoid a sitewide rebuild every time a new event is added.

Am I going about this all wrong? Thanks for any help anyone can give.

connershoop commented 4 years ago

UPDATE:

Gatsby now passes route parameters directly to components. However, if rendering many dynamic pages (>1000) I would recommend using a different framework like Nextjs, as (correct me if I'm wrong) Gatsby is a static site generator that can only do SSR at build, where Nextjs can do SSR on request and build.

component-detail.tsx

import React, { FC, ReactElement } from 'react'

const ComponentDetail: FC = ({slug1, slug2, slug3}): ReactElement => (
      <div>{slug1} {slug2} {slug3} </div>
)
export default ComponentDetail

example of gatsby-node.js

  const { createPage } = actions

 // Only update the `/component-detail` page.
  if (page.path.match(/^\/component-detail/)) {
    // page.matchPath is a special key that's used for matching pages
    // with corresponding routes only on the client.
    page.matchPath = '/component/:slug1/:slug2/:slug3'
    // Update the page.
    createPage(page)
  }

OLD:

Hello, I was also having this problem with my Gatsby-React-Netlify application and I found a simple configuration solution, hopefully this works for some. I am using Reach Router to route clients to dynamic pages. I was creating the pages dynamically because I didn't want to render all of the pages at build (> 16000 pages), but I still wanted to leverage Gatsby's SSR on the site.
Apologies if this solution was already posted, but if anyone is still having trouble hopefully this helps them out!

example of dynamic routing component in /pages
component-detail.tsx

import React, { FC, ReactElement } from 'react'
import { Router } from '@reach/router'
import DetailComponent from '../components/detail-component/detail-component.tsx'

const ComponentDetail: FC = (): ReactElement => (
    <Router basepath='/component'>
      <DetailComponent path='/:slug1/:slug2/:slug3' />
    </Router>
)

export default ComponentDetail

example of gatsby-node.js

  const { createPage } = actions

 // Only update the `/component-detail` page.
  if (page.path.match(/^\/component-detail/)) {
    // page.matchPath is a special key that's used for matching pages
    // with corresponding routes only on the client.
    page.matchPath = '/component/:slug1/:slug2/:slug3'
    // Update the page.
    createPage(page)
  }

Now these dynamic routes do not 404 on production.
My problem was I was passing page.matchPath = '/component/*', but I had to specify exactly what my path in the router was, which in this case was 3 slugs.

cristianCeamatuAssist commented 3 years ago

Hello there! I am building a gatsby project with Cloudflare deployment, I faced the same page-data.json 404 issues, the client wanted a SEO score as close as possible to 100%, the 404s affect the Best Practices score. I managed to find a small hack using the Cloudflare [workers ](https://developers.cloudflare.com/workers/)index.jsfile (this is similar to the.htaccess` file in Apache but is just plain Javascript.

addEventListener('fetch', event => {
  try {
    event.respondWith(handleEvent(event))
  } catch (e) {
    if (DEBUG) {
      return event.respondWith(
        new Response(e.message || e.toString(), {
          status: 500,
        }),
      )
    }
    event.respondWith(new Response('Internal Error', { status: 500 }))
  }
})

async function handleEvent(event) {
  const url = new URL(event.request.url)
  let options = {}

  try {
    if (DEBUG) {
      options.cacheControl = {
        bypassCache: true,
      }
    }
    return await getAssetFromKV(event, options)
  } catch (e) {
    // if an error is thrown try to serve the asset at 404.html
    if (!DEBUG) {
      try {
        let notFoundResponse = await getAssetFromKV(event, {
          mapRequestToAsset: req => new Request(`${new URL(req.url).origin}/404.html`, req),
        })

         /**************************************************************************************************************
        /* HERE IS THE HACK, instead of returning a 404 from below I am returning a 200 plain text response here
        /**************************************************************************************************************/
        if (requestUrl.includes('page-data')) {
          return new Response('Bypassing gatsby page-data.json errors');
        }

        return new Response(notFoundResponse.body, { ...notFoundResponse, status: 404 })
      } catch (e) {}
    }

    return new Response(e.message || e.toString(), { status: 500 })
  }
}

Maybe you can reproduce the same in Apache's .htaccess or where you can manipulate the request/response cycle, the logic is: if a resource is not found and the URL request contains page-data.json return a 200 or other code instead of the default 404.

kara-ryli commented 3 years ago

FWIW, I still get this when using Gatsby's <Link /> to pages in my static folder. The workaround is to... not do that. :)

cristianCeamatu commented 3 years ago

FWIW, I still get this when using Gatsby's <Link /> to pages in my static folder. The workaround is to... not do that. :)

You are amazing! I will test this solution and revert.

cristianCeamatuAssist commented 3 years ago

FWIW, I still get this when using Gatsby's <Link /> to pages in my static folder. The workaround is to... not do that. :)

Replaced Link with regular anchor tags and everything works perfect, this also fixes the too many prefetch requests error that was showing on some SEO tools like dareboost

Floriferous commented 2 years ago

Hello @wardpeet, why is this closed? There are only workarounds here, the issue still happens regularly on our website. We're on gatsby 4.1, the fix that closed this issue was on 2.x.