gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License
55.22k stars 10.32k forks

Gatsby does not resolve / find unicode URLs encoded with encodeURI #16765

Closed eyalroth closed 4 years ago

eyalroth commented 5 years ago

Description

Gatsby does not support pages with a path containing unicode characters and encoded with encodeURI. The development server (gatsby develop) will fail to find these pages, while the production build (gatsby build) will fail to find them if the service worker plugin (gatsby-plugin-offline) is enabled.

This was previously discussed in this issue; however, it was closed by the Gatsby bot, so I am reopening it, as it is a crucial bug for me.

Steps to reproduce

  1. Create a new gatsby project from the default starter.
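  2. Add a new page component in src/components/page3.js:

    import React from "react"
    import { Link } from "gatsby"

    import Layout from "./layout"
    import SEO from "./seo"

    // JSX tags restored from the default starter's page-2.js; the SEO title is assumed.
    const ThirdPage = () => (
      <Layout>
        <SEO title="Page three" />
        <h1>Hi from the third page</h1>
        <p>Welcome to page 3</p>
        <Link to="/">Go back to the homepage</Link>
      </Layout>
    )

    export default ThirdPage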

  3. Add this to `gatsby-node.js`:

```js
const path = require('path')

exports.createPages = ({ actions }) => {
    const { createPage } = actions
    createPage({
        path: encodeURI("/page-שלוש/"), // this is "three" in Hebrew
        component: path.resolve('./src/components/page3.js'),
    })
}
```

  4. Run gatsby develop.
  5. Try navigating to the page via either http://localhost:8000/page-שלוש/ or http://localhost:8000/page-%D7%A9%D7%9C%D7%95%D7%A9/, and you'll see nothing comes up.
  6. Add gatsby-plugin-offline in gatsby-config.js (simply un-comment it).
  7. Run gatsby build && gatsby serve.
  8. Once again try navigating to the aforementioned URLs (port 9000), and again you'll see nothing comes up.
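
For reference, the two URLs in the steps above are the same path written in two forms. The snippet below is a minimal sketch that only uses the standard encodeURI and decodeURI functions; the variable names are illustrative and not part of the reproduction.

```js
// The raw path used in the reproduction ("three" in Hebrew).
const rawPath = "/page-שלוש/"

// encodeURI percent-encodes each non-ASCII character as its UTF-8 bytes.
const encodedPath = encodeURI(rawPath)
console.log(encodedPath) // "/page-%D7%A9%D7%9C%D7%95%D7%A9/"

// Decoding gives back the original path, so both URLs name the same page.
console.log(decodeURI(encodedPath) === rawPath) // true
```

Browsers percent-encode non-ASCII path characters before sending a request, so the server receives the encoded form whichever of the two URLs is typed into the address bar.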

Expected result

Encoded URLs should be resolved and found correctly.

Actual result

URLs are not found.

Environment

gatsby info --clipboard:

  System:
    OS: Linux 4.4 Ubuntu 16.04.6 LTS (Xenial Xerus)
    CPU: (4) x64 Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    Shell: 4.3.48 - /bin/bash
  Binaries:
    Node: 10.16.0 - ~/n/bin/node
    npm: 6.10.3 - ~/n/bin/npm
  Languages:
    Python: 2.7.12 - /usr/bin/python
  npmPackages:
    gatsby: ^2.13.70 => 2.13.70
    gatsby-image: ^2.2.9 => 2.2.9
    gatsby-plugin-manifest: ^2.2.5 => 2.2.5
    gatsby-plugin-offline: ^2.2.6 => 2.2.6
    gatsby-plugin-react-helmet: ^3.1.3 => 3.1.3
    gatsby-plugin-sharp: ^2.2.12 => 2.2.12
    gatsby-source-filesystem: ^2.1.9 => 2.1.9
    gatsby-transformer-sharp: ^2.2.6 => 2.2.6
  npmGlobalPackages:
    gatsby-cli: 2.6.13

Note that this is a WSL installation on Windows 10.0.17134.799.

robinzimmer1989 commented 5 years ago

I'm experiencing a similar issue.

I'm querying data with Japanese and Chinese content from WordPress. The URLs mostly contain non-Latin characters, so WP encodes them automatically and they look like this: http://localhost:8000/%E4%BC%9A%E7%A4%BE%E6%A6%82%…%B7%E3%83%BC%E3%83%9D%E3%83%AA%E3%82%B7%E3%83%BC/

When I run gatsby develop all pages return a 404 error except the ones without special characters (i.e. http://localhost:8000/posts).

So I've tried to decode the pathname before creating the page and it worked - the pages were loading fine.

createPage({ path: decodeURIComponent(pathname), component: template })

But for some reason, the Gatsby Link component stopped working: When I navigate via the menu to a different page I get a white screen (no console errors) and nothing happens. But when I refresh the page it's all working fine again. Also when I visit any page from the generic development 404 page it's working just fine.

wardpeet commented 5 years ago

Is it possible to give us access to a reproduction with a wordpress api?

https://github.com/gatsbyjs/gatsby/issues/15551 says it's working as expected.

eyalroth commented 5 years ago

@wardpeet This has nothing to do with WordPress or any other CMS. Check out the steps to reproduce in the issue description.

gatsbot[bot] commented 5 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

eyalroth commented 5 years ago

@gatsbybot @wardpeet Nope, not stale. Still happening, and the reproduction information is available in this ticket.

dsegovia90 commented 4 years ago

Hello! Not sure if at all helpful, but I'm experiencing the same issue, although weirdly only in production and with double quotes " (%22) in the url. Also, the link is accessible if it's not the first load (SSR).

If you try to go to this link directly (server side rendered), the site throws an error (index):28 Uncaught SyntaxError: Unexpected string. Inspecting the code, gatsby is adding a double quote in the window.pagePath="/instrumentos/Viola-Cremona-SVA-130-708834-15"-1" instead of url encoding it, or escaping it.

But if you go here first, and click on the first instrument, it will take you to the same url above but not throw the error.

Again, not sure if at all helpful, let me know if you need more information.
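
The quoting problem described in this comment can be illustrated in isolation. The sketch below is not Gatsby's rendering code; it assumes, purely for illustration, that a path is interpolated into an inline script as a string literal, and shows that JSON.stringify produces a literal that still parses when the path contains a double quote. The `parses` helper is hypothetical.

```js
// The path from the comment above, containing a literal double quote.
const pagePath = '/instrumentos/Viola-Cremona-SVA-130-708834-15"-1'

// Naive interpolation: the quote inside the path ends the string literal early.
const broken = `window.pagePath="${pagePath}";`

// JSON.stringify escapes the quote, so the result is a valid string literal.
const escaped = `window.pagePath=${JSON.stringify(pagePath)};`

// Hypothetical helper: reports whether a snippet is syntactically valid JavaScript.
function parses(source) {
  try {
    new Function("window", source) // parses the body without running it
    return true
  } catch (err) {
    return false
  }
}

console.log(parses(broken))  // false
console.log(parses(escaped)) // true
```

JSON.stringify does not escape `<`, so a value embedded in an HTML script tag would need additional handling for sequences such as `</script>`.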

davegreig commented 4 years ago

@dsegovia90 I have a similar but distinct issue to what you're reporting. See https://github.com/gatsbyjs/gatsby/issues/17556

TL;DR: In my case, I can navigate directly to a page with non-encoded URLs except on MS Edge. On MS Edge, I have the same symptoms of your bug - the page renders for a second, but then goes white and the <div id="___gatsby"> is contentless

eyalroth commented 4 years ago

@roadwig I believe it's a problem with Edge, not Gatsby. The whole reason I first started encoding my URLs is because Edge was failing to load them.

Edit: Reading through your issue, perhaps this is indeed a problem with Gatsby. Hard to say if Gatsby is to blame or Edge. Regardless, this definitely seems related.

gatsbot[bot] commented 4 years ago

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community!

eyalroth commented 4 years ago

@gatsbybot @wardpeet Still not stale, still has all the required info, and things have definitely happened on this issue in the past 30 days 😕

btk commented 4 years ago

@eyalroth Your reproduction steps break.

But what is your actual reason using encodeURI()? I have been using createPage() with Turkish script characters, and haven't faced any issue.

I just tried it like this:

createPage({
        path: "/page-שלוש/", // this is "three" in Hebrew
        component: path.resolve('./src/components/page3.js'),
    })

And this works perfectly, both in staging and in production.

On this system:

  System:
    OS: Windows 10
    CPU: (8) x64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  Binaries:
    Yarn: 1.16.0 - C:\Users\ileti\AppData\Roaming\npm\yarn.CMD
    npm: 6.9.0 - C:\Program Files\nodejs\npm.CMD
  Browsers:
    Edge: 44.17763.1.0
  npmPackages:
    gatsby: ^2.15.36 => 2.15.36 
    gatsby-image: ^2.2.27 => 2.2.27 
    gatsby-plugin-manifest: ^2.2.21 => 2.2.21 
    gatsby-plugin-offline: ^3.0.14 => 3.0.14 
    gatsby-plugin-react-helmet: ^3.1.11 => 3.1.11 
    gatsby-plugin-sharp: ^2.2.29 => 2.2.29 
    gatsby-source-filesystem: ^2.1.31 => 2.1.31 
    gatsby-transformer-sharp: ^2.2.21 => 2.2.21 

vincentpelage commented 4 years ago

I have the same issue here. Our client (Gatsby with the WordPress API, hosted on Netlify) created a URL with an accent in WordPress, then decided to remove the accent, created a new URL, and asked us to redirect the old one for SEO purposes. We used this in gatsby-node.js:

createRedirect({
    fromPath: "/réseaux-sociaux",
    toPath: "/reseaux-sociaux/",
    isPermanent: true,
  })

When we try to access "/réseaux-sociaux", a 404 is displayed for a few ms before being replaced by a blank page. We also noticed that every page that contains a Unicode character displays a blank page instead of a 404.

We have built several websites with Gatsby and did not face this issue a few months ago. We tried downgrading both the gatsby and react versions, but it did not resolve anything.

We also tried the following before we found out that this issue wasn't related to the redirects. It didn't work:

createRedirect({
    fromPath: encodeURI("/réseaux-sociaux"),
    toPath: "/reseaux-sociaux/",
    isPermanent: true,
  })

travis-r6s commented 4 years ago

I have this issue when using locales from Prismic - I have a slug with Cyrillic characters, i.e. /bg/за-нас (shown as /bg/%D0%B7%D0%B0-%D0%BD%D0%B0%D1%81), and when navigating to that page, Gatsby says it cannot be found, even though it shows that exact page URL below.

EDIT: I didn't read the efforts above properly 🤦‍♂️ - I tried decodeURI(slug) and that seems to have fixed my issue.

eyalroth commented 4 years ago

@btk encodeURI() should not break the page. There shouldn't need to be a special reason to use it, as it is a standard JavaScript method. Moreover, creating a page with Unicode characters (such as Turkish script) but without this method will fail to load the page on MS Edge (see #17556).

siavashh commented 4 years ago

Nothing new on this?

adamgen commented 4 years ago

Nothing to do with Edge, it happens on Chrome.

adamgen commented 4 years ago

I made a test with both plain English characters and Unicode characters. The English-only paths work well; the Unicode paths don't work.

Source site http://www.wpexpert.co.il/%D7%91%D7%9C%D7%95%D7%92/

Reproduction repo: https://github.com/adamgen/gatsby-wp-unicode-error-poc

adamgen commented 4 years ago

Just making sure we're on the same page here - the long weird %d7%9.... is a 100% legit URL, you can see many of these in the example Source site I sent

adamgen commented 4 years ago

I found a local fix, but I really think it should be fixed in Gatsby.

To make a long story short - you should use decodeURIComponent instead of encodeURI since you're naming a filename and not a URL path. Taking the example from above this should work:

const path = require('path')

exports.createPages = ({ actions }) => {
    const { createPage } = actions
    createPage({
        path: decodeURIComponent("/page-שלוש/"), // this is "three" in Hebrew
        component: path.resolve('./src/components/page3.js'),
    })
}

I think that gatsby should apply decodeURIComponent on file paths by itself to avoid similar issues.
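
Note that in the snippet above the literal is already unencoded, so decodeURIComponent returns it unchanged; decoding only has an effect when the incoming path is percent-encoded, for example when it comes from a CMS. A minimal sketch of a helper that accepts either form follows. The name `normalizePagePath` is hypothetical and not a Gatsby API.

```js
// Hypothetical helper: returns the raw (decoded) form of a page path.
function normalizePagePath(pathname) {
  try {
    return decodeURIComponent(pathname)
  } catch (err) {
    // decodeURIComponent throws URIError on a malformed escape such as a lone "%".
    if (err instanceof URIError) return pathname
    throw err
  }
}

// Already-raw paths pass through unchanged.
console.log(normalizePagePath("/page-שלוש/") === "/page-שלוש/") // true

// Percent-encoded paths are decoded to the same raw form.
console.log(normalizePagePath("/page-%D7%A9%D7%9C%D7%95%D7%A9/") === "/page-שלוש/") // true

// Malformed input is returned as-is instead of crashing the build.
console.log(normalizePagePath("/100%/")) // "/100%/"
```

One caveat with this approach: decodeURIComponent also turns %2F into "/", which changes the structure of a path that contained an encoded slash.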

github-actions[bot] commented 4 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

github-actions[bot] commented 4 years ago

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.

Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

jeffstahlnecker commented 4 years ago

Still an issue that should be fixed in Gatsby.

moreguppy commented 4 years ago

Finding a similar issue with meta tags that have links to URLs — Twitter is not parsing any html entities in URLs where & has become &amp; so it can't get the meta image.

djun-kim commented 4 years ago

Having the same issue here. This is a blocker for enabling multi-lingual sites.

creotip commented 4 years ago

Same problem here. The solution with decodeURIComponent is not working for me.

barbareshet commented 3 years ago

@adamgen's solution for Hebrew is working well.

marcus13371337 commented 3 years ago

Having the same issue!

machineghost commented 3 years ago

Still an issue; GitHub REALLY needs a way to better surface falsely-closed issues.

EDIT: At the bare minimum, even if absolutely no code fix is needed (which seems unlikely), a documentation update is needed. https://www.gatsbyjs.com/docs/reference/routing/creating-routes/ doesn't even mention URI encoding/decoding, let alone explain how Gatsby expects you to handle it.

EDIT 2: Similarly https://www.gatsbyjs.com/docs/reference/config-files/actions/#createPage makes no mention of encoding problems, and simply defines the path parameter as:

path string Any valid URL. Must start with a forward slash

Encoded URLs (which is to say encodeURIComponent-ed strings, eg. "a:b" => a%3Ab) certainly are "valid", but they break when you use them with createPage.
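
As a reference for the distinction drawn here, the sketch below compares the standard encodeURI and encodeURIComponent functions and their decoders on the "a:b" example. It only demonstrates the built-in JavaScript behaviour and makes no claim about how Gatsby treats the results.

```js
// encodeURI keeps reserved characters such as ":" and "/" as they are.
console.log(encodeURI("a:b")) // "a:b"

// encodeURIComponent escapes them, since it is meant for a single URL component.
console.log(encodeURIComponent("a:b")) // "a%3Ab"

// The decoders mirror this: decodeURI leaves escapes of reserved characters alone.
console.log(decodeURI("a%3Ab")) // "a%3Ab"
console.log(decodeURIComponent("a%3Ab")) // "a:b"

// Non-ASCII characters are percent-encoded the same way by both encoders.
console.log(encodeURI("/за-нас") === "/" + encodeURIComponent("за-нас")) // true
```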

LekoArts commented 3 years ago

Following the paper trail of https://github.com/gatsbyjs/gatsby/issues/17556 or https://github.com/gatsbyjs/gatsby/issues/15551, some issues around MS Edge and reach/router were solved. However, this issue here is way too vague on what's actually the issue now (whether it's a specific browser issue, or a problem with encodeURI or decodeURIComponent), and "having the same issue" comments don't help in resolving it.

So please open one new bug report (and others can comment on it with a reproduction) where you give a reproduction and outline where the problem lies now.

seankovacs commented 3 years ago

While all of the above comments mention actual pages that have unicode characters, a similar and easily testable issue I'm running into is accessing a purposely invalid URL with a %23 in the path part of a URL. The 404 page tries to render (the title says 404), but the page is blank and the console dumps out the error mentioned above - Cannot read property 'page' of undefined.

Auspicus commented 2 years ago

@LekoArts

I believe that due to this decodeURIComponent call (see: find-path.js), paths with pre-encoded unicode characters (ie. /%E2%80%9Chmmm-%E2%80%9D) cannot be used in the createPages API. However, passing regular unicode characters works fine.
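
The mismatch described in this paragraph can be modelled with a toy lookup. The sketch below is not Gatsby's find-path.js; `createRouter`, `createPage` and `findPage` are hypothetical names, and the only assumption carried over from the comment is that the incoming pathname is decoded before it is compared with the stored page paths.

```js
// Toy model only: stores page paths verbatim and decodes the request before lookup.
function createRouter() {
  const pages = new Map()
  return {
    createPage({ path }) {
      pages.set(path, { path })
    },
    findPage(requestPathname) {
      return pages.get(decodeURIComponent(requestPathname))
    },
  }
}

const router = createRouter()
router.createPage({ path: "/“hmmm”" })            // stored in raw form
router.createPage({ path: encodeURI("/“other”") }) // stored pre-encoded

// A browser requests the percent-encoded form in both cases.
console.log(router.findPage(encodeURI("/“hmmm”")) !== undefined)  // true
console.log(router.findPage(encodeURI("/“other”")) !== undefined) // false
```

In this model the pre-encoded page is never found, because the decoded request is compared against a key that is still encoded.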

I agree with @machineghost that this should be documented in the createPage API and Routing reference docs. Or the issue should be addressed so that both pre-encoded and regular unicode characters work the same. From my perspective, I was expecting both to just work.

Sorry to re-ignite an old issue but I think it should be addressed in some capacity. A minimal reproduction would be (with “ being a unicode character, however any unicode character should have this issue):

// works
createPage({
  path: `/“hmmm”`,
  component: require.resolve("./src/templates/some-template.js") /* not relevant */,
  context: {},
})

// doesn't work
createPage({
  path: encodeURI(`/“hmmm”`),
  component: require.resolve("./src/templates/some-template.js") /* not relevant */,
  context: {},
})

TLDR;

Auspicus commented 2 years ago

@machineghost The docs have been updated in those two sections to reflect this limitation. Cheers for helping me find those limitations. I spent A LOT of time wondering why those URLs were coming up as 404.