ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org

No summary for our documentation pages in Google search results #13267

Open cameel opened 2 years ago

cameel commented 2 years ago

@chriseth reports that search hits in our docs look like this in Google: [screenshot of a result with no description]

Here's what Google's help says about this: No page information in search results.

I'm pretty sure this has something to do with the robots.txt changes we made some time ago (#10898). The search result seems to be from develop, which our robots.txt blocks. We only allow latest, v0.7.6, and the latest release. The question is: why is develop still getting indexed (and appearing in results ahead of those allowed versions) if we blocked it?
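For reference, a robots.txt following that pattern might look roughly like this; the paths and version numbers are illustrative assumptions, not the actual production file:

```
# Illustrative sketch only - not the real docs.soliditylang.org robots.txt.
# Block everything by default, then re-allow specific versions.
# For Google, the most specific (longest) matching rule wins,
# so the Allow lines override the blanket Disallow.
User-agent: *
Disallow: /
Allow: /en/latest/
Allow: /en/v0.7.6/
Allow: /en/v0.8.15/
```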

fewwwww commented 2 years ago
[Screenshot 2022-07-15 20:06:58]

It works now. Maybe it's a lag in indexing?

cameel commented 2 years ago

I still see this happening. For example, searching for solidity revert I get this:

[screenshot: solidity-revert-google-search-2022-07-18]

The upper result is from 0.8.13 while the lower one is from 0.8.15. I think we're seeing different results because Google still has all versions indexed, and you can get hits both from versions that are disallowed in robots.txt and from ones that are allowed, depending on what you search for.

So the question is how to stop Google from returning results from blocked versions and return newer ones instead. I wonder if that's actually possible - we assumed that blocking them in robots.txt would remove them from search completely, but maybe that's not how it works.

imblue-dabadee commented 2 years ago

disallow stops Google from crawling the contents of a page, not from indexing its URL. This is why the URLs for previous versions still appear but the meta description does not. Reference from Google

To stop a URL from being indexed by Google, you must add <meta name="robots" content="noindex"> to the relevant pages. Funnily enough, you will also have to remove the disallow rules from the root robots.txt (i.e. this file) so that Google's crawler can actually read the tag and drop the page from the index (Reference from Google). I also noticed that there is a robots.txt in each version directory; those are ignored, since search engine crawlers only read the one at the root. A sketch of the tag placement follows below.
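A minimal sketch of that tag placement, with illustrative surrounding markup:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells compliant crawlers not to index this page. The crawler must be
       able to fetch the page, i.e. it cannot be disallowed in robots.txt. -->
  <meta name="robots" content="noindex">
  <title>Example docs page (illustrative)</title>
</head>
<body>...</body>
</html>
```

If editing the generated HTML of already-released versions is impractical, serving an `X-Robots-Tag: noindex` HTTP response header for those paths has the same effect, assuming the hosting setup allows configuring custom headers, and avoids rebuilding the docs.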

Keep in mind that once the noindex tag is added, a page will not be removed from search results until it is crawled again. To expedite that, you can use Google's URL Inspection tool to request reindexing.

Although the pages will be removed once Google's crawlers reach them again, that does not necessarily mean the remaining pages will end up at their desired rankings, so keep that in mind for SEO.

Hope that helps!

cameel commented 2 years ago

Thanks! That explains a lot.

Not sure if we'll be able to keep those old versions out of Google Search then, since it might not be feasible to rebuild them - especially if that would require changing code in already tagged releases.

In any case, pinging @r0qs since this is one of the topics we'll want him to take over eventually.