crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Outdated API doc links in Google search (bad SEO) #5952

Open oprypin opened 6 years ago

oprypin commented 6 years ago

For example, https://www.google.com/search?q=crystal+lang+namedtuple finds https://crystal-lang.org/api/0.21.1/NamedTuple.html (current version is 0.24.2).

This is bad because

1) people see old docs
2) the links from different versions fight each other for dominance instead of joining forces.

Python had this problem for a while; they seem to have solved it by adding a canonical link to all their pages. Perhaps technically it's not the intended use of this tag, but it has definitely worked. For example, https://www.google.com/search?q=python+socket+doc finds the "latest" page because https://docs.python.org/3.5/library/socket.html contains <link rel="canonical" href="https://docs.python.org/3/library/socket.html" />

Proposed solution: edit every existing Crystal doc page in storage and add this tag. For example, https://crystal-lang.org/api/0.21.1/BigInt.html

--- a/BigInt.html
+++ b/BigInt.html
@@ -6,6 +6,7 @@
   <link href="css/style.css" rel="stylesheet" type="text/css" />
   <script type="text/javascript" src="js/doc.js"></script>
   <title>BigInt - github.com/crystal-lang/crystal</title>
+  <link rel="canonical" href="https://crystal-lang.org/api/latest/BigInt.html" />
 </head>
 <body>

I'm not entirely sure if this will work the same, because in Python the /latest/ (/3/) page actually exists as an alias, and is not a redirect. So maybe that would need to be changed as well.
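
The href in the diff above follows mechanically from the page's own URL. A minimal sketch of that mapping, assuming version segments are dotted numbers such as 0.21.1 (the helper name `canonical_url` is made up for illustration):

```python
import re

# Matches the version segment of an API docs URL, e.g. "/api/0.21.1/".
# Assumption: versions are dotted numbers; "latest" and "master" never match.
_VERSION_SEGMENT = re.compile(r"/api/\d+(?:\.\d+)*/")


def canonical_url(versioned_url: str) -> str:
    """Return the /api/latest/ equivalent of a versioned API docs URL."""
    return _VERSION_SEGMENT.sub("/api/latest/", versioned_url, count=1)


print(canonical_url("https://crystal-lang.org/api/0.21.1/BigInt.html"))
# prints https://crystal-lang.org/api/latest/BigInt.html
```

URLs that already point at /api/latest/ pass through unchanged, so the function is safe to run over every stored page.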

oprypin commented 6 years ago

May be worth doing an experiment first: edit only one known badly linked page and see how it evolves in the search.

straight-shoota commented 6 years ago

Yeah, that's really an issue. The proposed solution should work, although it does not exactly fit the intended purpose of canonical links according to RFC 6596. But it can be used for this and I don't think there is a reasonable alternative. Google webmaster docs even mention that not only duplicated but also similar pages can be consolidated using a canonical link.

wooster0 commented 6 years ago

What about showing a little message on the top of the site when you are not on the newest version or master? So maybe something like this: outdated.png

oprypin commented 6 years ago

That would certainly be helpful but retroactively introducing manual edits to pages is not an easy task. The change I'm suggesting (add the same item to ALL pages) is the simplest possible operation of that type, and it can be added to newly generated pages immediately.

straight-shoota commented 6 years ago

And such a message would not help to remove outdated API versions from search engine results (at least not much).

oprypin commented 6 years ago

It actually would, because the latest version would automatically have the most links pointed to it.

Regardless, we could do both, it's just harder.

bcardiff commented 6 years ago

Besides adding a canonical base URL, an option to inject a custom JS file would allow some tweaks, such as adding a banner (as other languages do), analytics, or an edit link, depending on the project and the host of the docs. WDYT?

Sija commented 6 years ago

@bcardiff Why not extend the idea to custom templates?

straight-shoota commented 6 years ago

I don't think such a complete customization is either necessary or particularly useful. Having the ability to inject some code into each page (for analytics etc.) should be sufficient.

If you need full customization, it's relatively easy to just create a custom HTML generator which uses the exported JSON data.

oprypin commented 6 years ago

This is outside the scope. We don't even necessarily need any code changes to introduce the modification. Please just start with the experiment and manual changes :|

bcardiff commented 6 years ago

@Sija Custom templates would require either a) rebuilding the doc generator, since the templates are .ecr and compiled into the compiler, or b) switching to templates that are interpreted. Injecting a handmade .js file is enough to cover multiple other scenarios like the ones I listed: edit page, GA, jump to newer version.

@oprypin and others: I've just manually edited https://crystal-lang.org/api/0.24.1/Array.html, which is the top result for google:"crystal array", and appended the canonical <link rel="canonical" href="https://crystal-lang.org/api/latest/Array.html" />. Let's see how the crawlers deal with that.

oprypin commented 6 years ago

This is not resolved and the PR should not have been merged.

Sija commented 6 years ago

@oprypin Why not? According to the Google docs, the solution provided here is correct.

straight-shoota commented 6 years ago

@bcardiff @oprypin The experiment seems to have been successful: Google results for crystal array now rank https://crystal-lang.org/api/0.24.2/Array.html as the first result, which is the current redirect target of https://crystal-lang.org/api/latest/Array.html

straight-shoota commented 6 years ago

The downside of this approach is, it seems that outdated API docs can't be discovered through Google search at all. This could however be useful in certain circumstances to figure out how a previous API version worked. I don't know if there is a valid solution for this, either. And it's most probably better to have the latest versions be more prominent. We just need to be aware that the backlog gets hidden from search.

jhass commented 6 years ago

I don't mind old versions gone from google search, we just should have a (link to a) version selector in the docs themselves.

straight-shoota commented 6 years ago

@jhass It's not bad per se, but imagine some code using a method or type from stdlib that doesn't exist anymore. If you want to know about that method and don't find it in the current API docs, you'd probably try a web search. And it would be nice if it would eventually show up somewhere.

bcardiff commented 6 years ago

Although the PR was merged prematurely, it's still changeable since it's only used in master for now.

The use case pointed out by @straight-shoota is important, but I am not sure what could be a better approach right now. For sure old docs could be changed, indexed or even regenerated eventually if needed.

Having a version selector could also be done with an injected JS ;-).

Maybe a future pass through docs generator could improve multiple version handling, or maybe some other integration.

straight-shoota commented 6 years ago

Maybe we could remove the canonical link from outdated versions of the API docs... probably not directly when a new version is released but after a cooldown period (for example until another release). This way the current version would always point to the latest URL as canonical location. Older versions should lose importance over time so they can be allowed to show up on search results because the current one should hopefully rank higher.
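
A rough sketch of that cooldown rule, assuming the released versions are available as a list sorted oldest to newest. The function name and the `cooldown` parameter are hypothetical and not part of the doc generator:

```python
def versions_with_canonical(versions: list[str], cooldown: int = 1) -> set[str]:
    """Return the versions whose pages should carry the canonical link.

    `versions` must be sorted oldest to newest. The newest release always
    keeps the link; `cooldown` is how many superseded releases keep it too.
    Everything older drops the link and may show up in search again.
    """
    return set(versions[-(cooldown + 1):])


releases = ["0.24.1", "0.24.2", "0.25.0", "0.26.1"]
print(sorted(versions_with_canonical(releases)))
# prints ['0.25.0', '0.26.1']
```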

oprypin commented 6 years ago

I think this canonical change has not been applied to all old versions, but it would be good to do so. Also, please reopen the issue until that is done. (@sdogruyol)

guycall commented 6 years ago

I am just starting Crystal and hitting this issue a lot. For example https://www.google.co.jp/search?q=crystal+ordered+hash resulted in the top search result being https://crystal-lang.org/api/0.24.2/Hash.html instead of 0.26.1.

This seems to be a consistent issue for programming languages. My search for ruby hash just got documentation for v2.0.0. Rails searches often get outdated pages from apidock.com.

I wonder if it would help to avoid the redirect of https://crystal-lang.org/api/latest/Hash.html - that would increase the likelihood of people sharing links for latest and hopefully boost its SEO. I imagine it would also keep any deprecated pages searchable - https://crystal-lang.org/api/#{VERSION}/XXXXX.html

straight-shoota commented 6 years ago

@guycall From an SEO perspective it might be better to keep links pointing to latest. But there is a semantic issue here: Usually, you want to link to a specific API version. In a new release, everything might have changed, and that would break the reference. In some cases, you might want links to always point to the latest version, but that's probably not as common.

oprypin commented 6 years ago

I think that simply applying the canonical change to old versions would have a great effect but it still was not done for some reason. Only people with direct access to the host can do it though.

bcardiff commented 6 years ago

Does someone know what happens in the SEO realm when the canonical responds with a 404? That will happen when types get deprecated, for example.

I am usually hesitant to touch already generated files. But it’s on my bucket list to add the canonical to all pages and also add some plain html banner to inform the user that there is a new version of the api.

j8r commented 6 years ago

@oprypin this implies back-porting the canonical change for each version starting from 0.20.0 (the oldest version for which the API docs are available), and regenerating all the docs.

oprypin commented 6 years ago

@j8r, no, it really doesn't. Just write a script to add it with regular expressions or something. That's what I meant all along.
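
Something along these lines would do, as a rough sketch. It assumes each page has a single closing </head> tag, and `add_canonical` is just an illustrative name:

```python
import re

# Closing head tag plus any indentation in front of it.
_HEAD_CLOSE = re.compile(r"([ \t]*)</head>", re.IGNORECASE)


def add_canonical(html: str, href: str) -> str:
    """Insert a canonical link right before </head>, unless one is present."""
    if 'rel="canonical"' in html:
        return html  # already patched, so re-running the script is harmless
    tag = f'<link rel="canonical" href="{href}" />'
    # A function replacement avoids backslash/group escaping issues in href.
    return _HEAD_CLOSE.sub(
        lambda m: f"{m.group(1)}  {tag}\n{m.group(0)}", html, count=1
    )


page = "<head>\n  <title>BigInt</title>\n</head>\n<body></body>"
print(add_canonical(page, "https://crystal-lang.org/api/latest/BigInt.html"))
```

Running it over every stored .html file and writing the result back is all that is needed; no docs have to be regenerated.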

straight-shoota commented 6 years ago

> Does someone know what happens in the SEO realm when the canonical responds with a 404?

There's a question on StackExchange, though no really substantial answer: https://webmasters.stackexchange.com/questions/109449/what-is-the-seo-impact-of-canonical-links-pointing-to-404-pages

But the worst that can happen is that the page won't show up in search results. That's not really an issue since it's outdated anyway.
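
If the 404 case turns out to matter, one possible way around it (not something proposed above, just a sketch) is to emit the canonical link only when the page still exists in the latest docs. The set of latest page names and the helper name are assumptions made for illustration:

```python
from typing import Optional

LATEST_BASE = "https://crystal-lang.org/api/latest/"


def canonical_for(page: str, latest_pages: set[str]) -> Optional[str]:
    """Return the canonical URL for `page`, or None if latest lacks it.

    `latest_pages` is assumed to hold the file names present in the latest
    docs, e.g. collected by listing the generated output directory.
    """
    if page in latest_pages:
        return LATEST_BASE + page
    return None  # no canonical link, so the old page stays indexable


latest = {"Hash.html", "Array.html"}
print(canonical_for("Hash.html", latest))
# prints https://crystal-lang.org/api/latest/Hash.html
print(canonical_for("Foo.html", latest))  # hypothetical removed type
# prints None
```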

straight-shoota commented 6 years ago

> and also add some plain html banner to inform the user that there is a new version of the api.

This would be a great enhancement!

guycall commented 6 years ago

I would guess the version bar on apidock.com helped them a lot with their SEO. Even if a user landed from a Google search onto the wrong version, they could easily navigate to the correct version. Hence Google sees longer sessions on apidock.com instead of the user hitting the back button to return to Google.

screen shot 2018-09-20 at 07 02 08

This obviously doesn't help get the latest version to the highest ranking in Google, but it definitely helps the user.

straight-shoota commented 5 years ago

In crystal-lang/crystal-website#79 @ukd1 suggests weighting pages using a sitemap. I'm not sure how this would play out, but we could try it. It shouldn't be too difficult to set up.

oprypin commented 4 years ago

So the canonical change, rather than being applied retroactively, was reverted in https://github.com/crystal-lang/crystal/pull/8348.

> This also reverts #5990 which tried an alternative approach to solving the search priority issue using canonical URLs. But this completely removes older versions from search results.

Umm, that's good?


As it stands now, on Google you indeed do not run into any API docs pages between 0.25 and 0.31. I also suspect these pages boost /latest/ strongly enough that currently we're fortunate to almost always find latest docs in searches (0.33 at the moment).

So the confirmed working solution (also used by Python, which is a big deal) is abandoned, and the sitemap idea was started but does not seem to be used yet.


Also, according to my understanding, sitemaps would not help at all. https://support.google.com/webmasters/answer/183668

> Google does not currently consume the <priority> attribute in sitemaps.

It would not help at best. At worst (though unlikely) it could make things worse.

> List only canonical URLs in your sitemaps. If you have two versions of a page, list only the (Google-selected) canonical in the sitemap. If you have two versions of your site (for example, www and non-www), decide which is your preferred site, and put the sitemap there, and add rel=canonical or redirects on the other site.
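
For reference, a sitemap that follows this advice would list just the /latest/ URLs and leave out <priority> entirely. A minimal sketch using only the Python standard library (the page list and function name are assumptions):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Build a sitemap with one <url><loc> entry per URL and no priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


pages = ["Hash.html", "Array.html"]  # assumed list of latest page names
print(build_sitemap([f"https://crystal-lang.org/api/latest/{p}" for p in pages]))
```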

sam0x17 commented 4 years ago

Anything to fix this would be extremely helpful. Just a note: another pain point for me is figuring out when something was deprecated/changed/renamed, but that is beyond the scope of this issue.

straight-shoota commented 4 years ago

> > this completely removes older versions from search results.
>
> Umm, that's good?

I don't think so. It means deprecated and removed features wouldn't show up in search results at all.

If sitemap priority really doesn't do anything and there's no other solution, we might have to return to canonical. That's probably the lesser evil. But if there's any chance, I'd like to find a way to keep old versions in the index.

refi64 commented 4 years ago

Maybe you could somehow auto generate an index of removed symbols that links to the old docs, then add back the rel=canonical? That way, searches for current APIs will give the current results, but searches for deprecated / removed ones will give the index.
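
A small sketch of how such an index could be computed, assuming the set of page names per version is already known and the mapping is ordered oldest to newest. The function name and the sample page names are hypothetical:

```python
def removed_symbol_index(
    pages_by_version: dict[str, set[str]], latest: str
) -> dict[str, str]:
    """Map each page missing from `latest` to the newest version that has it.

    `pages_by_version` is assumed to be inserted oldest to newest; dicts keep
    insertion order, so later (newer) versions overwrite earlier entries.
    """
    current = pages_by_version[latest]
    index: dict[str, str] = {}
    for version, pages in pages_by_version.items():
        if version == latest:
            continue
        for page in pages - current:
            index[page] = version
    return index


docs = {
    "0.24.2": {"Hash.html", "Foo.html"},              # Foo/Bar are made up
    "0.25.0": {"Hash.html", "Foo.html", "Bar.html"},
    "0.26.1": {"Hash.html"},
}
print(sorted(removed_symbol_index(docs, "0.26.1").items()))
# prints [('Bar.html', '0.25.0'), ('Foo.html', '0.25.0')]
```

Each entry could then be rendered as a link to /api/<version>/<page>, which would stay free of the canonical tag.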

renich commented 4 years ago

ping? :runner:

asterite commented 4 years ago

If someone gives me access to where the docs are hosted I can add the canonical stuff.

straight-shoota commented 3 years ago

I'll make a PR to re-add --canonical-base-url. It seems to have been a mistake to remove that.

straight-shoota commented 3 years ago

Canonical base URLs are now in place for master API docs and should be in the next release as well. Next we need to add them to existing API docs for 0.35.1 and below. Maybe we can combine it with #9916 to also insert a visual indicator about outdated documentation.

bcardiff commented 3 years ago

FYI, canonical base urls have been added to all previous docs.

bcardiff commented 3 years ago

Although canonical base urls have been updated, searches for "crystal api hash" still show 0.35.1 or 0.24.2 direct links.

Although the user-declared canonical is https://crystal-lang.org/api/latest/Hash.html, the Google-selected canonical is the inspected url.

I don't know if the fact that the canonical url is a 302 temporary redirect prevents its use as the Google-selected canonical.
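
To double-check the user-declared side independently of the search console, the canonical can be read straight from a page's HTML. A minimal sketch with the standard library parser (class and function names are illustrative):

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.href is None:
            self.href = attributes.get("href")


def declared_canonical(html: str) -> Optional[str]:
    """Return the user-declared canonical URL of a page, if any."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.href


sample = '<head><link rel="canonical" href="https://crystal-lang.org/api/latest/Hash.html" /></head>'
print(declared_canonical(sample))
# prints https://crystal-lang.org/api/latest/Hash.html
```

This only confirms what the page declares; which URL Google ends up selecting is a separate matter.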

oprypin commented 3 years ago

It probably needs more time, let's not despair just yet.

oprypin commented 3 years ago

https://www.google.com/search?q=crystal+api+hash The fact that the first link is literally "/latest/" is very good news for us.

renich commented 3 years ago

image

In some cases, not yet.

bcardiff commented 3 years ago

Some pages work as expected: the google-selected canonical matches our own declaration

Screen Shot 2021-03-09 at 18 27 58

Other pages do not and I am not sure why.

Screen Shot 2021-03-09 at 18 27 17

sam0x17 commented 3 years ago

IMO it would be really cool to have at the top of each page a button or link saying "Note: these docs are not for the latest version of Crystal, click here for the latest version".

The problem with that is when things got deprecated / refactored so there is no latest version. In those cases, it would be awesome to have some sort of awareness of that in the UI, e.g. like on https://apidock.com/ruby/Enumerator/each_with_index

It makes archaeological investigations into "when was this deprecated and what is it now" a lot easier

oprypin commented 3 years ago

@sam0x17 This is off-topic in the current thread. Let's not start a concurrent discussion.

oprypin commented 3 years ago

Well, this had actually been discussed in this thread too, but ultimately there's a dedicated issue https://github.com/crystal-lang/crystal/issues/9916