cyberbotics / webots-doc

Documentation for the Webots software

Bad Google referencing for the doc #945

Closed fabienrohrer closed 5 years ago

fabienrohrer commented 6 years ago

When typing "Webots supervisor", "Webots distancesensor", etc. in Google, links with the option ?version=8.5 appear first. This is really bad!

I think it's due to the GitHub home page of webots-doc. Two weeks ago I fixed the main one to make the real links more prominent:

https://github.com/omichel/webots-doc

=> I think this should be sufficient, but I'm not sure.

The webots-doc forks are still referencing the bad links. We do not have control over them:

=> I don't see any other references to these bad URLs.

omichel commented 6 years ago

Yes, https://github.com/Olivier0/webots-doc is mine. I just deleted it.

omichel commented 6 years ago

I will ask @remidhum to delete his fork as well.

fabienrohrer commented 6 years ago

Thank you.

I think it would be good if David, Tom and I also forked this repo to increase the weight of the new links (at least until the Google ranking is reversed).

omichel commented 6 years ago

I believe we should add the robots meta-element with NOINDEX in our documentation pages that are not the current one. See https://www.sistrix.com/ask-sistrix/google-index-google-bot-crawler/how-can-i-remove-a-url-on-my-website-from-the-google-index/

This should fix the problem. I will take care of that.
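A minimal sketch of the idea, written in Python for illustration only (the real change would live in the site's PHP header). It assumes that a page is "not the current one" exactly when its URL carries a `version` query argument, as in the `?version=8.5` links mentioned above. The function name `robots_meta` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def robots_meta(url: str) -> str:
    """Return a robots meta element for versioned pages, or '' otherwise.

    Assumption: only URLs with a `version` query argument
    (e.g. ?version=8.5) point to non-current documentation.
    """
    query = parse_qs(urlparse(url).query)
    if "version" in query:
        return '<meta name="robots" content="noindex">'
    # Pages without a version argument must stay indexable.
    return ""


if __name__ == "__main__":
    print(robots_meta("https://cyberbotics.com/doc/reference/motor?version=8.5"))
    print(repr(robots_meta("https://cyberbotics.com/doc/reference/motor")))
```

The important part is the second branch: the tag must be emitted only for versioned URLs, never for the default page.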

fabienrohrer commented 6 years ago

This is a good idea.

omichel commented 6 years ago

OK, this is done (in header.php of the main web site). Let's wait and see if this is effective.

fabienrohrer commented 6 years ago

Thank you. In this case, we could also remove our forks to test only one action at a time.

fabienrohrer commented 6 years ago

The bad referencing is still present after 6 days.

fabienrohrer commented 6 years ago

Alternatively, we could create a sitemap to help Google crawl our website: https://support.google.com/webmasters/answer/156184?hl=en&ref_topic=4581190

fabienrohrer commented 6 years ago

The Google spider parsed our pages 10 days ago. It's worth waiting a bit longer.

fabienrohrer commented 5 years ago

Worse than ever: the doc is no longer indexed by Google. The "NOINDEX" trick (applied only to pages with arguments) is probably also implicitly affecting the pages without arguments :-/

fabienrohrer commented 5 years ago

Next actions:

  1. Remove the "NOINDEX" meta trick.
  2. Create a sitemap.

omichel commented 5 years ago

I just removed the "noindex" meta tag.

fabienrohrer commented 5 years ago

I created a sitemap and pushed it here: https://www.cyberbotics.com/files/repository/www/sitemap.xml
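For reference, a sitemap of this kind can be produced with a few lines of script. The following is a hedged sketch, not the script actually used for the file above: the function name `build_sitemap` and the page list are hypothetical, and only the `urlset`/`url`/`loc` elements of the sitemaps.org format are emitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page list; a real one would be derived from the doc's table of contents.
    pages = [
        "https://cyberbotics.com/doc/reference/motor",
        "https://cyberbotics.com/doc/reference/supervisor",
    ]
    print(build_sitemap(pages))
```

Listing only the argument-free URLs keeps the `?version=` variants out of the sitemap.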

fabienrohrer commented 5 years ago

We just submitted the sitemap in the Google console. Let's wait and see whether it improves the situation.

omichel commented 5 years ago

Google cannot read our docs. I tested from the Google console to see what Google sees, and here is what it sees from https://cyberbotics.com/doc/reference/motor for example:

image

According to some web searches, Google doesn't render JavaScript-generated content very well (it seems to be very tricky). It may have been able to see it a while ago (which is why our pages got indexed), but it definitely doesn't see it any more. Maybe they changed some details in their JS rendering algorithm. As a consequence, I believe we should not rely on JavaScript to generate the content of the doc, but rather on PHP. There are good, extensible Markdown-to-HTML libraries in PHP that we should use and extend as we did for JS. For example, http://parsedown.org may be a good candidate.
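To make the problem concrete, here is a small sketch of what a client sees when it does not execute JavaScript at all. It does not model Google's actual renderer (which does run some JavaScript); it only illustrates the worst case. The two sample pages and the `view` element id are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a client sees without running any JavaScript."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only text.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose content is injected by a script at load time.
JS_PAGE = (
    '<html><body><div id="view"></div>'
    '<script>document.getElementById("view").textContent = "Motor";</script>'
    "</body></html>"
)
# Hypothetical page with the same content generated server-side.
STATIC_PAGE = '<html><body><div id="view"><h1>Motor</h1></div></body></html>'

if __name__ == "__main__":
    print(repr(visible_text(JS_PAGE)))
    print(repr(visible_text(STATIC_PAGE)))
```

The script-driven page yields no text at all, while the statically generated one exposes its content directly, which is the argument for generating the HTML before it reaches the crawler.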

fabienrohrer commented 5 years ago

I agree, but this task is far from trivial, mainly because of our custom extensions. We would also lose the dynamic loading of pages unless we keep two systems (PHP + JS). I think this solution should be our last resort.

fabienrohrer commented 5 years ago

Moreover, the local version would be problematic with the PHP solution, because we do not have a PHP server to serve the pages locally. Embedding a PHP server is tempting, but it would probably be heavy and overkill.

omichel commented 5 years ago

Even this simplified doc page isn't readable by Google: https://cyberbotics.com/test/box.html (I removed many scripts from it). I am giving up on that for now, and I think we should concentrate our efforts on static HTML content generated by Travis CI with Node.js, as proposed by @tn12787.

fabienrohrer commented 5 years ago

Thanks to Olivier's patch #949, the Google ranking of the doc is now better than ever.