dvdzkwsk / react-redux-starter-kit

Get started with React, Redux, and React-Router.
MIT License
10.29k stars 2.2k forks

Site is Invisible to Google? #819

Closed: sinned closed this 8 years ago

sinned commented 8 years ago

Hey all,

I'm trying to understand how to make my site show up in Google search. Using Webmaster Tools, it appears that my site is blank to the Googlebot. For testing purposes, I deployed the stock version of this starter kit to http://react.dennisyang.com/

In build/webpack.config.js, I found that removing this from the config makes "Fetch as Google" show the site (though, oddly, the "what the public sees" view is still blank):

      production: {
        presets: ['react-optimize']
      }

Has anyone had any luck getting a site made with this starter kit indexed on Google?

Here's a screenshot of "Fetch as Google" using the latest react-redux-starter-kit build: (screenshot)

And here's a screenshot of "Fetch as Google" with 'react-optimize' disabled: (screenshot)

Any thoughts? This fix appeared to make the homepage of my site show up in Google, but any page using Redux to grab content was not rendered by Google.

Thanks!

dennis.

kenzik commented 8 years ago

@sinned - To allow any search engine to completely index your app, you'll need to implement some server-side rendering. This will turn your project into an isomorphic (or universal) app. There are numerous techniques for accomplishing this, but the gist of it is: spin up an Express server; make the server aware of your routes; use React's renderToString to render the matched route to HTML and inject it into your server-side template. Express can then serve up any entry point with its data already in the markup, after which React in the browser takes over. Here is a decent Smashing article on the topic.
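
A minimal sketch of that flow, assuming the react-router 2.x/3.x-era match/RouterContext API and a hypothetical ./routes module (illustrative only, not the starter kit's actual setup):

    // server.js - minimal server-side rendering sketch
    const express = require('express')
    const React = require('react')
    const { renderToString } = require('react-dom/server')
    const { match, RouterContext } = require('react-router')
    const routes = require('./routes') // hypothetical: your app's route config

    const app = express()
    app.use(express.static('dist')) // serve the client bundle

    app.use((req, res) => {
      // Resolve the requested URL against the app's routes on the server
      match({ routes, location: req.url }, (err, redirect, renderProps) => {
        if (err) return res.status(500).send(err.message)
        if (redirect) return res.redirect(302, redirect.pathname + redirect.search)
        if (!renderProps) return res.status(404).send('Not found')

        // Render the matched route tree to an HTML string for the crawler
        const markup = renderToString(React.createElement(RouterContext, renderProps))
        res.send('<!doctype html><html><head><title>App</title></head><body>' +
          '<div id="root">' + markup + '</div>' +
          '<script src="/app.js"></script></body></html>')
      })
    })

    app.listen(3000)

Once the browser loads /app.js, React attaches to the server-rendered markup and the app behaves as a normal SPA from there.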

You may also want to check out react-helmet. It will inject the necessary meta entries into your app's <head> for indexing by the engines. I'm not sure about support across all search engines, but I believe Google will index the results of react-helmet properly, and scrapers will make proper use of any Open Graph tags from react-helmet. This may be all you need.
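
A sketch of what that might look like, using the react-helmet 3.x-era props API (the component and its fields are hypothetical):

    // ProductPage.js - hypothetical component using react-helmet
    import React from 'react'
    import Helmet from 'react-helmet'

    const ProductPage = ({ product }) => (
      <div>
        {/* Helmet lifts these entries into the document <head> */}
        <Helmet
          title={product.name}
          meta={[
            { name: 'description', content: product.summary },
            { property: 'og:title', content: product.name }
          ]}
        />
        <h1>{product.name}</h1>
      </div>
    )

    export default ProductPage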

yantakus commented 8 years ago

Making an app isomorphic is absolutely not necessary for Google to index it. Google executes JavaScript and then indexes the output.

kenzik commented 8 years ago

@web2style - Agree 100%. Google does try to index JS output. I was speaking more broadly about best practices when dealing with this tech and crawlers in general. Thanks for the nudge back to the topic, though, considering the OP was indeed referencing Google. I've added emphasis to my previous response for clarity.

anthonybrown commented 8 years ago

@dkenzik I agree; it's easy to render server-side first with React, so why not do it?

nicolasiensen commented 8 years ago

In my last experience with single-page applications I had problems with the Facebook crawler, which doesn't execute your JavaScript to collect the meta tags 😞

gabeweaver commented 8 years ago

I've been running into this issue the past few days and have gone back and forth.

Desired Outcome: Simple static site powered by the starter kit - something like a basic marketing site or docs similar to GatsbyJS.

Best time to use server-side rendering: when you have to hit third-party services/APIs before the user interface can provide any value to the end user. This is an excellent explanation of why.

Best time to use client-side rendering: when you don't have any API or data that needs to be queried; rendering client-side is extremely performant and immediately returns the necessary HTML and CSS for the first view.

I'm currently exploring Surge.sh to serve static assets, and it's really awesome for several reasons.

The only problem is the project I'm working on needs to be SEO friendly. I've started exploring react-render-webpack-plugin after reading a few articles that seem promising.

Depending on how my experiment goes, I do think it's worthwhile to give the starter kit a configuration option for SEO-friendly client-side rendering, SEO-friendly server-side rendering, or a true SPA that doesn't care about SEO, similar to how CSSModules is an option in the config...

Any other thoughts on how to solve for this?

gabeweaver commented 8 years ago

Oh, and if I didn't mention it... Google won't crawl and/or render the webpack bundles as they are currently configured by default in the starter kit.

ralyodio commented 8 years ago

Forget about SPA and indexing.

sauravskumar commented 8 years ago

@gabeweaver Did you try PhantomJS to prerender pages server-side for bots while serving the SPA to normal clients?
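
For anyone curious, a rough sketch of that approach: detect crawler user agents and proxy them to a prerender service (the service URL and user-agent list here are illustrative):

    // Sketch: serve prerendered HTML to bots, the SPA to everyone else
    const express = require('express')
    const http = require('http')

    const app = express()
    const BOT_UA = /googlebot|bingbot|facebookexternalhit|twitterbot/i

    app.use((req, res, next) => {
      if (BOT_UA.test(req.headers['user-agent'] || '')) {
        // Hypothetical PhantomJS-backed service that returns rendered HTML
        return http.get('http://prerender.example.com/render?url=' +
          encodeURIComponent(req.url), (upstream) => upstream.pipe(res))
      }
      next() // normal clients get the client-rendered SPA
    })

    app.use(express.static('dist'))
    app.listen(3000)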

trungpham commented 8 years ago

We can support server-side rendering using this library: https://github.com/makeomatic/redux-connect
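
Roughly, per that library's README, components declare the data they need up front so the server can wait for it before rendering; treat this as a sketch (loadPosts and the state shape are hypothetical):

    // PostsPage.js - sketch of redux-connect usage
    import React from 'react'
    import { asyncConnect } from 'redux-connect'
    import { loadPosts } from './actions' // hypothetical action creator

    const PostsPage = ({ posts }) => (
      <ul>{posts.map((p) => <li key={p.id}>{p.title}</li>)}</ul>
    )

    // The promise below runs before rendering; on the server, redux-connect's
    // loadOnServer helper waits for it so the markup ships with data included.
    export default asyncConnect(
      [{ promise: ({ store: { dispatch } }) => dispatch(loadPosts()) }],
      (state) => ({ posts: state.posts })
    )(PostsPage)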

amrit92 commented 8 years ago

I am facing the same issue and looking for a solution. Can anyone point to a solution for SSR using this starter kit specifically? @trungpham Can you show an integration?

yantakus commented 8 years ago

I'm using v2.0 of this starter kit. It doesn't use code splitting, so that could be the difference. But I don't have any problems with Google indexing: I generate a sitemap.xml and all the pages are indexed without any problems.
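
If it helps anyone, generating a basic sitemap can be as simple as a small build script (the host and route list here are illustrative):

    // generate-sitemap.js - dependency-free sitemap.xml sketch
    const fs = require('fs')

    const HOST = 'http://example.com' // hypothetical host
    const routes = ['/', '/about', '/posts/1']

    const urls = routes
      .map((path) => '  <url><loc>' + HOST + path + '</loc></url>')
      .join('\n')

    fs.writeFileSync('dist/sitemap.xml',
      '<?xml version="1.0" encoding="UTF-8"?>\n' +
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
      urls + '\n</urlset>\n')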

ghost commented 8 years ago

As others have pointed out, the Goog will crawl an Ajax site. Bing will do so as well. But those aren't the only two search engines out there. Because of the limited time crawlers allocate to indexing a site, Ajax sites will be crawled more slowly. That's probably not a huge deal for most sites unless you have a lot of pages changing very often.

The term "Universal" is not interchangeable with "isomorphic", as Universal apps like Este.js are anything but SEO friendly.

Been meaning to write on this topic for years, but hopefully my talk on Isomorphic React (including example app) will prove useful to some:

http://habd.as/talks/isomorphic-rendering-react/

dvdzkwsk commented 8 years ago

Out of scope at the moment; we do not support universal rendering, so this is not much of a concern. Closing to clean up issues.

nhagen commented 7 years ago

I'm not sure I'm seeing any smoking guns here in terms of why Google is unable to render pages built with this starter kit. React is capable of supporting Googlebot rendering. Rather than discuss server-side rendering, how can we isolate what is preventing Googlebot from rendering the page? A good question to ask: has anyone using this starter kit gotten their site indexed/rendered?

ghost commented 7 years ago

Google and Bing have been Ajax crawling since at least 2012. Just look for Matt Cutts videos from around 2011; Bing actually publicized the feature when it was announced in '12. Pages in a SPA will look like a black hole to all non-Ajax crawlers (e.g. less sophisticated scrapers) and to non-JS browsers such as elinks and lynx. Even though SPAs are crawled, some may still experience issues using Google Search Console tools to, for example, test schema.org markup.

That said, if you're building an app, build an app. If you're building a website, go static or isomorphic for the best SEO and accessibility.

nhagen commented 7 years ago

In our case we have several dependencies which aren't isomorphic, because they're either wrappers around non-React components or the maintainers just never considered running in a server environment (so merely importing them throws "document is not defined"). Moreover, changing from nginx to Node is not ideal. So there are at least a few reasons why server-side rendering might not be worth the trouble, especially when the benefits we seek should be attainable without it.
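
For what it's worth, those "document is not defined" crashes can usually be contained with a simple environment guard (the module name here is hypothetical):

    // widget.js - guard a browser-only dependency so server imports don't crash
    let FancyWidget = null
    if (typeof document !== 'undefined') {
      // 'fancy-dom-widget' is hypothetical and touches `document` on import
      FancyWidget = require('fancy-dom-widget')
    }

    // No-op on the server, real widget in the browser
    module.exports.renderWidget = (el) => (FancyWidget ? new FancyWidget(el) : null)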

I only care about Google indexing for now, and I'm hoping to get the conversation started on what specifically is preventing it, if it's a common problem for users of this repo.

ReLrO commented 7 years ago

@nhagen Please let me know if you find a solution. I'll do the same.

ReLrO commented 7 years ago

@gabeweaver Do you have an example of a webpack config file that will enable Google crawling?

ghost commented 7 years ago

@ReLrO https://github.com/jhabdas/lumpenradio-com/blob/master/tools/webpack.config.js.

It's not so much the webpack config as it is the architectural approach, which in that case is isomorphic. As I mentioned earlier, Google and Bing will crawl non-isomorphic apps (those where the content is injected by JavaScript).

ReLrO commented 7 years ago

Thanks @jhabdas. I am trying to see if I can avoid the isomorphic approach. I deployed my webapp to AWS S3 and AWS CloudFront as a static website and I am trying to get Google to see it. I read that Google can see React-generated sites (ones that use async calls), but when using the starter kit, Google only sees a blank page. As people here mentioned, it seems like it is something to do with the configuration of the starter kit. It might be something to do with the Hot Module Replacement approach; I am not sure...

ghost commented 7 years ago

@ReLrO the black hole has nothing to do with this starter kit, and everything to do with JavaScript.

ReLrO commented 7 years ago

@jhabdas what do you mean?

ghost commented 7 years ago

@ReLrO I've been building SPAs since Backbone was introduced, and here's what a Backbone website looked like: https://speakerdeck.com/jhabdas/isomorphic-rendering-with-react?slide=6

ReLrO commented 7 years ago

I understand, but it was also my understanding that Google can now crawl such websites (for example, http://chrisarasin.com/react-seo/), yet that doesn't work for me...

ghost commented 7 years ago

@ReLrO have you registered your site with the Google Search Console and added a page to your sitemap.xml file? If so, what happens once Google indexes a page and you search for it?

ReLrO commented 7 years ago

I have, and the page is completely blank. Google also has a tool called Fetch as Google (https://support.google.com/webmasters/answer/6066468?hl=en) which lets you see how the crawler sees your site, and Google sees it as a blank page. I've read articles demonstrating that Google sees their React-generated client-side pages, and then I read the comments in this discussion, and it seems like something enabled in the starter kit is causing this behavior.

Not sure I completely agree with the article you just sent. Serving static content off a CDN also has a lot of pros.

In any case, I guess I will have to move to server-side rendering if I want Google to index my site. Hopefully it will be an easy transition and I won't need to refactor a lot of code. Any pointers you can give me on how to do it quickly? I am using the starter kit with react-router and react-redux-router.

ghost commented 7 years ago

Fetch as Google is not how Google actually sees your site, and it apparently has the same issues it had three years ago. Please try what I suggested and let us know when you have an answer. I saw the same thing with Backbone (JS-injected content) back in 2013, and those sites were indeed crawled despite Google's lackluster and FUD-inducing tooling.

ghost commented 7 years ago

Also, there's no going back once you go isomorphic. Once you start down the path, the Cheshire cat will erase the tracks home.

ReLrO commented 7 years ago

Ok, thanks @jhabdas. I'll let you guys know after I index the page.

ghost commented 7 years ago

If you're looking to go isomorphic (yes, not universal), look no further than the React Production Starter. And thank David for his incredible work and generosity. Thanks David <3

sauravskumar commented 7 years ago

@jhabdas can you give a link to the exact starter kit you're talking about? (Are you talking about this starter kit?)

ghost commented 7 years ago

@sauravskumar indeed

mstijak commented 7 years ago

If anybody is still interested, I wrote a blog post on how I managed to overcome this issue.

ghost commented 7 years ago

@mstijak Thanks for the write-up. You're hitting on a very hot topic and I'm curious to see if your post takes off. A couple of pull quotes may serve you well in getting the view/read ratio to climb a little.

FWIW, I popped open CX Docs in the lynx browser and compared it with what you'd see on an isomorphic app and here's the difference:

(screenshots: CX Docs in lynx vs. an isomorphic app in lynx)

While this does not apply to apps (because apps are apps), it's important that our news sites and blogs do not rely on JS magic, as we'd be doing some pretty heavy damage to the Great Library that is the Web. Also, be careful if you rely on a polyfill library to make your site work, as you're creating a single point of failure for the future. Regardless, thanks again for sharing, and I hope your post does well on Medium.

And for anyone else who's interested, you can find some isomorphic boilerplates for React on Awesome React Boilerplates. Have fun out there!

elyobo commented 7 years ago

For us, getting appropriate metadata in place for link sharing on things like FB and Twitter was important as well, and those crawlers didn't seem to fully load and execute the JS to find it. We did limited server-side loading so the key data was present in the initial markup, while the non-essential stuff still gets loaded client-side.
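
The pattern is the usual one: render the critical data into the page and hand it to the client store. A rough sketch (renderPage and the state shape are illustrative):

    // Sketch: ship only the essential data in the server response so
    // crawlers and social scrapers see it without executing JS
    const { renderToString } = require('react-dom/server')

    function renderPage(app, criticalState) {
      return '<!doctype html><html><body>' +
        '<div id="root">' + renderToString(app) + '</div>' +
        // The client calls createStore(reducer, window.__INITIAL_STATE__)
        '<script>window.__INITIAL_STATE__=' +
        JSON.stringify(criticalState).replace(/</g, '\\u003c') +
        '</script><script src="/app.js"></script></body></html>'
    }

    module.exports = renderPage

The replace() call escapes "<" in the JSON so a stray "</script>" in the data can't break out of the inline script tag.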

madshargreave commented 7 years ago

I sort of solved this by inlining the JavaScript in my index.html.
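
One way to do that is a post-build step that splices the bundle into the HTML (both paths here are illustrative):

    // inline-bundle.js - sketch: inline the built bundle into index.html
    const fs = require('fs')

    const bundle = fs.readFileSync('dist/app.js', 'utf8')
    const html = fs.readFileSync('dist/index.html', 'utf8')
      // Function replacement avoids `$` sequences in the bundle being
      // treated as special replacement patterns
      .replace('<script src="/app.js"></script>',
               () => '<script>' + bundle + '</script>')

    fs.writeFileSync('dist/index.html', html)

Note this naive approach breaks if the bundle contains a literal "</script>", which is one reason dedicated plugins for this exist.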

ghost commented 7 years ago

Related conversation on Medium in case anyone wants to share their experiences.