Open oprypin opened 6 years ago
May be worth doing an experiment first: edit only one known badly linked page and see how it evolves in the search.
Yeah, that's really an issue. The proposed solution should work, although it does not exactly fit the intended purpose of canonical links according to RFC 6596. But it can be used for this and I don't think there is a reasonable alternative. Google webmaster docs even mention that not only duplicated but also similar pages can be consolidated using a canonical link.
What about showing a little message on the top of the site when you are not on the newest version or master? So maybe something like this:
That would certainly be helpful but retroactively introducing manual edits to pages is not an easy task. The change I'm suggesting (add the same item to ALL pages) is the simplest possible operation of that type, and it can be added to newly generated pages immediately.
And such a message would not help to remove outdated API version from search engine results (at least not much).
It actually would, because the latest version would automatically have the most links pointed to it.
Regardless, we could do both, it's just harder.
Besides adding a canonical base URL, an option to inject custom JS would allow some tweaks, such as adding a banner (as other languages do), analytics, or an edit link, depending on the project and host of the docs. WDYT?
@bcardiff Why not extend the idea to custom templates?
I don't think such complete customization is either necessary or particularly useful. Having the ability to inject some code into each page (for analytics etc.) should be sufficient.
If you need full customization, it's relatively easy to just create a custom HTML generator which uses the exported JSON data.
This is outside the scope. We don't even necessarily need any code changes to introduce the modification. Please just start with the experiment and manual changes :|
@Sija custom templates would either a) require rebuilding the doc generator, since the templates are .ecr and compiled into the compiler, or b) require switching to templates that are interpreted. Injecting a hand-made .js file is enough to cover multiple other scenarios like the ones I listed: edit page, GA, jump to newer version.
@oprypin and others. I've just manually edited https://crystal-lang.org/api/0.24.1/Array.html, which is the top result for google:"crystal array", and appended the canonical link <link rel="canonical" href="https://crystal-lang.org/api/latest/Array.html" />. Let's see how the crawlers deal with that.
This is not resolved and the PR should not have been merged.
@oprypin Why not? According to Google docs solution provided here is correct.
@bcardiff @oprypin The experiment seems to have been successful: Google results for "crystal array" now rank https://crystal-lang.org/api/0.24.2/Array.html as the first result, which is the current redirect target of https://crystal-lang.org/api/latest/Array.html
The downside of this approach is, it seems that outdated API docs can't be discovered through Google search at all. This could however be useful in certain circumstances to figure out how a previous API version worked. I don't know if there is a valid solution for this, either. And it's most probably better to have the latest versions be more prominent. We just need to be aware that the backlog gets hidden from search.
I don't mind old versions gone from google search, we just should have a (link to a) version selector in the docs themselves.
@jhass It's not bad per se, but imagine some code using a method or type from stdlib that doesn't exist anymore. If you want to know about that method and don't find it in the current API docs, you'd probably try a web search. And it would be nice if it would eventually show up somewhere.
Although the PR was merged prematurely, it's still changeable since it's only used in master for now.
The use case pointed out by @straight-shoota is important, but I am not sure what could be a better approach right now. For sure old docs could be changed, indexed or even regenerated eventually if needed.
Having a version selector could also be done with an injected JS ;-).
Maybe a future pass through docs generator could improve multiple version handling, or maybe some other integration.
Maybe we could remove the canonical link from outdated versions of the API docs... probably not directly when a new version is released but after a cooldown period (for example until another release). This way the current version would always point to the latest URL as canonical location. Older versions should lose importance over time, so they can be allowed to show up in search results because the current one should hopefully rank higher.
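The cooldown idea above can be sketched as a small selection rule. This is only an illustration of the proposal, not something the doc generator does: the function name `versions_with_canonical` and the `cooldown_releases` parameter are hypothetical, and versions are assumed to be plain dotted numbers.

```python
def version_key(version):
    """Sort key for dotted numeric versions, so that 0.10.0 sorts after 0.9.0."""
    return tuple(int(part) for part in version.split("."))


def versions_with_canonical(versions, cooldown_releases=1):
    """Return the versions whose pages would keep the canonical link.

    The current release always keeps it; older releases keep it only while
    they are within the cooldown window (counted in releases, an assumption).
    """
    ordered = sorted(versions, key=version_key)
    keep = 1 + cooldown_releases
    return set(ordered[-keep:])
```

With a cooldown of one release, only the current and the previous version would declare the latest URL as canonical; everything older would be left to rank on its own.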
I think this canonical change has not been applied to all old versions, but it would be good to do so.
Also, please reopen the issue until that is done.
(@sdogruyol)
I am just starting with Crystal and hitting this issue a lot. For example, https://www.google.co.jp/search?q=crystal+ordered+hash resulted in the top result being https://crystal-lang.org/api/0.24.2/Hash.html instead of 0.26.1.
This seems to be a consistent issue for programming languages. My search for "ruby hash" just got documentation for v2.0.0. Rails searches often get outdated pages from apidock.com.
I wonder if it would help to avoid the redirect of https://crystal-lang.org/api/latest/Hash.html - that would increase the likelihood of people sharing links for latest and hopefully boost its SEO. I imagine it would also keep any deprecated pages searchable - https://crystal-lang.org/api/#{VERSION}/XXXXX.html
@guycall From an SEO perspective it might be better to keep links pointing to latest. But there is a semantic issue here: Usually, you want to link to a specific API version. With a new release, everything might have changed, and that would break the reference. In some cases, you might want links to always point to the latest version, but that's probably not as common.
I think that simply applying the canonical change to old versions would have a great effect, but it still has not been done for some reason. Only people with direct access to the host can do it though.
Does someone know what happens in the SEO realm when the canonical responds with a 404? That will happen when types get deprecated, for example.
I am usually hesitant to touch already generated files. But it's on my bucket list to add the canonical to all pages and also add some plain HTML banner to inform the user that there is a new version of the API.
@oprypin this implies back-porting the canonical change for each version starting from 0.20.0 (the oldest version for which the API docs are available), and regenerating all the docs.
@j8r, no, it really doesn't. Just write a script to add it with regular expressions or something. That's what I meant all along.
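A minimal sketch of such a script, assuming each version's pages live in a local directory whose layout mirrors the URL paths (that layout and the helper names `add_canonical` and `patch_tree` are assumptions, not part of the actual hosting setup). The base URL is the /latest/ one used earlier in the thread.

```python
import re
from pathlib import Path

BASE_URL = "https://crystal-lang.org/api/latest/"

HEAD_CLOSE = re.compile(r"</head>", re.IGNORECASE)
HAS_CANONICAL = re.compile(r"<link\s[^>]*rel=[\"']canonical[\"']", re.IGNORECASE)


def add_canonical(html, relative_path, base_url=BASE_URL):
    """Insert a canonical link before </head>, unless the page already has one."""
    if HAS_CANONICAL.search(html):
        return html  # already patched, keep the operation idempotent
    tag = '<link rel="canonical" href="%s%s" />\n' % (base_url, relative_path)
    # A function replacement avoids backslash-escape handling in the tag text.
    return HEAD_CLOSE.sub(lambda match: tag + match.group(0), html, count=1)


def patch_tree(version_root):
    """Patch every .html file under one version directory; return the count changed."""
    version_root = Path(version_root)
    changed = 0
    for path in sorted(version_root.rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        relative = path.relative_to(version_root).as_posix()
        patched = add_canonical(original, relative)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed += 1
    return changed
```

Running `patch_tree` once per version directory would touch only pages that lack the tag, so it can be re-run safely after a partial pass.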
Does someone know what happens in the SEO realm when the canonical responds with a 404?
There's a question on StackExchange, though no really substantial answer: https://webmasters.stackexchange.com/questions/109449/what-is-the-seo-impact-of-canonical-links-pointing-to-404-pages
But the worst that can happen is that the page won't show up in search results. That's not really an issue since it's outdated anyway.
and also add some plain html banner to inform the user that there is a new version of the api.
This would be a great enhancement!
I would guess the version bar on apidock.com helped them a lot with their SEO. Even if a user landed from a Google search on the wrong version, they could easily navigate to the correct version. Hence Google sees longer sessions on apidock.com rather than users hitting the back button to return to Google.
This obviously doesn't help get the latest version to the highest ranking in Google, but it definitely helps the user.
in crystal-lang/crystal-website#79 @ukd1 suggests weighting pages using a sitemap. I'm not sure how this would play out, but we could try it. It shouldn't be too difficult to set up.
So the canonical change, rather than being applied retroactively, was reverted in https://github.com/crystal-lang/crystal/pull/8348.
This also reverts #5990 which tried an alternative approach to solving the search priority issue using canonical URLs. But this completely removes older versions from search results.
Umm, that's good?
As it stands now, on Google you indeed do not run into any API docs pages between 0.25 and 0.31. I also suspect these pages boost /latest/ strongly enough that currently we're fortunate to almost always find latest docs in searches (0.33 at the moment).
So the confirmed working solution (also used by Python, which is a big deal) is abandoned, and the sitemap idea was started but also seems not used yet.
Also, according to my understanding, sitemaps would not help at all. https://support.google.com/webmasters/answer/183668
Google does not currently consume the <priority> attribute in sitemaps.
It would not help at best. At worst (though unlikely) it could make things worse.
List only canonical URLs in your sitemaps. If you have two versions of a page, list only the (Google-selected) canonical in the sitemap. If you have two versions of your site (for example, www and non-www), decide which is your preferred site, and put the sitemap there, and add rel=canonical or redirects on the other site.
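If a sitemap were tried anyway, the quoted guidance suggests it should list only the canonical /latest/ URLs and leave out `<priority>`. A rough sketch under those assumptions follows; the function name `build_sitemap` is hypothetical, and the page list would have to come from the generated docs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, base_url="https://crystal-lang.org/api/latest/"):
    """Return sitemap XML listing only the canonical (latest) URL of each page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in sorted(pages):
        url = ET.SubElement(urlset, "url")
        # Only <loc> is emitted; <priority> is omitted on purpose.
        ET.SubElement(url, "loc").text = base_url + page
    return ET.tostring(urlset, encoding="unicode")
```

Versioned URLs such as /api/0.24.2/ are deliberately not listed, so the sitemap and the rel=canonical declarations would point crawlers at the same place.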
Anything to fix this would be extremely helpful. Just a note: my other pain point is figuring out when something was deprecated/changed/renamed, but that is beyond the scope of this issue.
this completely removes older versions from search results.
Umm, that's good?
I don't think so. It means deprecated and removed features wouldn't show up in search results at all.
If sitemap priority really doesn't do anything and there's no other solution, we might have to return to canonical. That's probably the lesser evil. But if there's any chance, I'd like to find a way to keep old versions in the index.
Maybe you could somehow auto generate an index of removed symbols that links to the old docs, then add back the rel=canonical? That way, searches for current APIs will give the current results, but searches for deprecated / removed ones will give the index.
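One way to picture that index: compare the page lists of every version against the latest one and record, for each page that no longer exists, the last version that still had it. This is only a sketch; the input mapping and the name `removed_symbol_index` are assumptions, and real data would have to be collected from the generated docs.

```python
def version_key(version):
    """Sort key for dotted numeric versions, so that 0.10.0 sorts after 0.9.0."""
    return tuple(int(part) for part in version.split("."))


def removed_symbol_index(pages_by_version):
    """Map each page missing from the latest version to the last version that had it.

    pages_by_version: dict of version string -> set of page file names.
    """
    ordered = sorted(pages_by_version, key=version_key)
    latest_pages = pages_by_version[ordered[-1]]
    index = {}
    # Walk oldest to newest so a later version overwrites an earlier one.
    for version in ordered[:-1]:
        for page in pages_by_version[version] - latest_pages:
            index[page] = version
    return index
```

Each entry could then be rendered as a link to the old page, which would stay free of rel=canonical so that it remains findable.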
ping? :runner:
If someone gives me access to where the docs are hosted I can add the canonical stuff.
I'll make a PR to re-add --canonical-base-url. It seems to have been a mistake to remove that.
Canonical base URLs are now in place for master API docs and should be in the next release as well. Next we need to add them to existing API docs for 0.35.1 and below. Maybe we can combine it with #9916 to also insert a visual indicator about outdated documentation.
FYI, canonical base urls have been added to all previous docs.
Although canonical base URLs have been updated, searches for "crystal api hash" still show 0.35.1 or 0.24.2 direct links.
Although the user-declared canonical is https://crystal-lang.org/api/latest/Hash.html, the Google-selected canonical is the inspected URL.
I don't know if the fact that the canonical URL is a 302 temporary redirect prevents its use as the Google-selected canonical.
It probably needs more time, let's not despair just yet.
https://www.google.com/search?q=crystal+api+hash The fact that the first link is literally "/latest/" is very good news for us.
In some cases, not yet.
Some pages work as expected: the google-selected canonical matches our own declaration
Other pages do not and I am not sure why.
IMO it would be really cool to have at the top of each page a button or link saying "Note: these docs are not for the latest version of Crystal, click here for the latest version".
The problem with that is when things got deprecated / refactored, so there is no latest version. In those cases, it would be awesome to have some sort of awareness of that in the UI, e.g. like on https://apidock.com/ruby/Enumerator/each_with_index
It makes archaeological investigations into "when was this deprecated and what is it now" a lot easier
@sam0x17 This is off-topic in the current thread. Let's not start a concurrent discussion.
Well, this had actually been discussed in this thread too, but ultimately there's a dedicated issue https://github.com/crystal-lang/crystal/issues/9916
For example, https://www.google.com/search?q=crystal+lang+namedtuple finds https://crystal-lang.org/api/0.21.1/NamedTuple.html (current version is 0.24.2).
This is bad because
1) people see old docs
2) the links from different versions fight each other for dominance instead of joining forces.
Python had this problem for a while, they seem to have solved it by adding a canonical ref to all their pages. Perhaps technically it's not the intended use of this tag, but it has definitely worked. For example, https://www.google.com/search?q=python+socket+doc finds the "latest" page because https://docs.python.org/3.5/library/socket.html contains <link rel="canonical" href="https://docs.python.org/3/library/socket.html" />
Proposed solution: edit every existing Crystal doc page in storage and add this tag. For example, https://crystal-lang.org/api/0.21.1/BigInt.html
I'm not entirely sure if this will work the same, because in Python the /latest/ (/3/) page actually exists as an alias, and is not a redirect. So maybe that would need to be changed as well.
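For illustration, the tag each versioned page would carry can be derived from its own URL by swapping the version segment for latest. The sketch below assumes the /api/<version>/ URL layout seen in the links above; the function name `canonical_tag` is hypothetical.

```python
import re

# Matches a versioned API docs URL and captures prefix, version and page path.
VERSIONED_URL = re.compile(r"^(https://crystal-lang\.org/api/)(\d+(?:\.\d+)*)/(.+)$")


def canonical_tag(page_url):
    """Return the canonical <link> tag for a versioned docs URL, or None otherwise."""
    match = VERSIONED_URL.match(page_url)
    if match is None:
        return None  # not a versioned page (for example, already under /latest/)
    prefix, _version, page = match.groups()
    return '<link rel="canonical" href="%slatest/%s" />' % (prefix, page)
```

Whether crawlers honor a canonical URL that answers with a redirect rather than a real page is exactly the open question raised here, so this shows only the tag itself.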