page names are valid routes & SEO'd as duplicate content

brandonmp commented 6 years ago

Page names are getting crawled by the Google bot in addition to proper route patterns.

I have a route config'd like this:

{
    name: 'mba-livewire-explorer',
    pattern: '/clear-admit/mba-livewire-explorer/'
},

The pattern is intended to be the only URL (let's call this the 'good route'), and the page has a <link rel='canonical'... tag reflecting that. In other words, /mba-livewire-explorer shouldn't be a valid route (let's call it the 'bad route').

I launched a page last night, and today the Google SERP is showing the bad route as 'duplicate content'. Here's a gif of what I mean: dupe-routes

Since the page was only launched last night, and I've never linked to the bad route, I'm clueless on how Google found the bad route's pathname.

How do I prevent the bad route from being accessed? My first thought is just a redirect from bad -> good in getInitialProps, but doing this for every route seems too tedious. Besides: I think the bad route should 404, right?

Hopefully the canonical tag is preventing SEO penalties here, but having every page accessible via its page name seems like an unintended behavior I'd like to address.

fridays commented 6 years ago

Hi @brandonmp, the package should only use the 'good route' when used with the Link component. Could you check the source code to see if the original route appears somewhere? Maybe this can help.

brandonmp commented 6 years ago

thanks @fridays

The question of whether or not the bad route was in my src is tricky, b/c all of my route name props are the same as my page routes. I think it ended up that way b/c of some confusion when configuring it, but anyway all the Link components are consistent (I use Flow and prop-types to validate my links).

But the issue you linked actually did the trick (though I think I'll set up a bunch of redirects, since some of the bad routes got indexed).

It doesn't look like they've documented this feature yet in the docs, but the way to disable file system routing is to add this key/value to next.config.js:

useFileSystemPublicRoutes: false

Then all paths to file names in /pages return 404, as expected.

For my purposes this is all I need, but I will leave it open in case you decide that the observed behavior is a bug (that is, file names being served as valid paths).
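
For the redirects mentioned above (bad routes that already got indexed), one option is to handle them once in the custom server instead of in every page's getInitialProps. The sketch below is an assumption, not something next-routes provides: buildRedirectMap and redirectPageNames are hypothetical helpers, and the route list is assumed to be the same array of { name, pattern } objects used in the routes config. It only covers static patterns and drops any query string when redirecting.

```javascript
// Hypothetical helper: map each page-name path ("bad route") to its
// configured pattern ("good route").
function buildRedirectMap(routes) {
  const map = new Map();
  for (const { name, pattern } of routes) {
    const badPath = `/${name}`;
    // Skip parameterized patterns (e.g. '/blog/:slug'); they are not
    // concrete URLs and cannot be used as a redirect target.
    if (pattern.includes(':')) continue;
    // Skip routes whose pattern already is the page-name path.
    if (pattern === badPath || pattern === `${badPath}/`) continue;
    map.set(badPath, pattern);
  }
  return map;
}

// Hypothetical Connect/Express-style middleware: 301 from bad to good,
// pass everything else through untouched.
function redirectPageNames(routes) {
  const map = buildRedirectMap(routes);
  return function (req, res, next) {
    // Drop the query string and any trailing slashes before the lookup.
    const path = req.url.split('?')[0].replace(/\/+$/, '');
    const target = map.get(path);
    if (!target) return next();
    res.writeHead(301, { Location: target });
    res.end();
  };
}

module.exports = { buildRedirectMap, redirectPageNames };
```

In an Express-based custom server this would be mounted before the routes handler, e.g. app.use(redirectPageNames(routeList)), where routeList is whatever array the routes were built from. A permanent (301) redirect is used here so that crawlers consolidate the indexed bad URLs onto the good ones.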

fridays commented 6 years ago

Thanks, glad it worked! It would be great to have it documented in both the next and next-routes readmes. Could you create pull requests for that?

fridays / next-routes

page names are valid routes & SEO'd as duplicate content #119