fabienrohrer closed this issue 5 years ago
Yes, https://github.com/Olivier0/webots-doc is mine. I just deleted it.
I will ask @remidhum to delete his fork as well.
Thank you.
I think it would be good if David, Tom, and I also forked this repo to increase the weight of the new links (at least until Google inverts the ranking).
I believe we should add the robots meta-element with NOINDEX in our documentation pages that are not the current one. See https://www.sistrix.com/ask-sistrix/google-index-google-bot-crawler/how-can-i-remove-a-url-on-my-website-from-the-google-index/
This should fix the problem. I will take care of that.
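For reference, a minimal sketch of the meta tag described above (the comment about restricting it to version-specific pages reflects the intent discussed later in this thread, not the actual header.php code):

```html
<!-- Hypothetical sketch: this tag should only be emitted on pages we do NOT
     want indexed, e.g. version-specific URLs such as ?version=8.5 -->
<meta name="robots" content="noindex">
```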
This is a good idea.
OK, this is done (in header.php of the main website). Let's wait and see whether it is effective.
Thank you. In this case, we could also remove our forks so that we test only one action at a time.
The bad referencing is still present after 6 days.
Alternatively, we could create a sitemap to help Google crawl our website: https://support.google.com/webmasters/answer/156184?hl=en&ref_topic=4581190
The Google spider parsed our pages 10 days ago. It is worth waiting some more time.
Worse than ever: the doc is no longer indexed by Google. Probably the "NOINDEX" trick (applied only on pages with arguments) is also implicitly affecting the pages without arguments :-/
Next actions:
I just removed the "noindex" meta tag.
I created a sitemap and pushed it here: https://www.cyberbotics.com/files/repository/www/sitemap.xml
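For reference, a sitemap follows this minimal structure (the URL below is one of the doc pages discussed in this thread; the date and change frequency are illustrative, not taken from the actual sitemap):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.cyberbotics.com/doc/reference/motor</loc>
    <!-- optional hints for the crawler -->
    <lastmod>2019-01-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```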
We just registered the sitemap in the Google Search Console. Let's wait and see whether it improves the situation.
Google cannot read our docs. I tested from the Google Search Console to see what Google sees, and here is what it sees for https://cyberbotics.com/doc/reference/motor, for example:
According to some web searches, Google doesn't render JavaScript-generated content very well (it seems to be very tricky). It may have been able to see it a while ago (which is why our pages got indexed), but now it certainly doesn't see it any more. Maybe they changed some details in their JS rendering algorithm. As a consequence, I believe we should not rely on JavaScript to generate the content of the doc, but rather on PHP. There are good and extensible MD-to-HTML libraries in PHP that we could use and extend as we did for JS. For example, http://parsedown.org may be a good candidate.
I agree, but this task is far from obvious, mainly because of our custom extensions. We would also lose the dynamic loading of pages unless we keep two systems (PHP + JS). I think this solution should be implemented as our last resort.
Moreover, the local version would be problematic with the PHP solution, because we do not have a PHP server to serve the pages locally. Embedding a PHP server is tempting, but it would probably be heavy and overkill.
Even this simplified doc page isn't readable by Google: https://cyberbotics.com/test/box.html (I removed many of its scripts). I am giving up on that for now; I think we should concentrate our efforts on static HTML content generated by Travis-CI with Node.js, as proposed by @tn12787.
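A hypothetical Travis-CI sketch of this approach (the script name, output directory, and deploy target are assumptions for illustration, not the actual setup):

```yaml
# Hypothetical .travis.yml sketch: pre-render the Markdown doc to static HTML
language: node_js
node_js:
  - "10"
script:
  # "build-doc.js" is an assumed script name: it would run the existing
  # JS MD-to-HTML pipeline in Node.js and write static .html files,
  # so crawlers receive final HTML with no client-side rendering needed.
  - node build-doc.js --out public/doc
deploy:
  provider: pages
  skip_cleanup: true
  github_token: $GITHUB_TOKEN
  local_dir: public
```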
Thanks to Olivier's patch #949, the Google ranking of the doc is now working better than ever.
When typing "Webots supervisor", "Webots distancesensor", etc. in Google, links with the option
?version=8.5
appear first. This is really bad! I think it's due to the GitHub home page of webots-doc. Two weeks ago, I fixed the main one to show the real links more prominently:
https://github.com/omichel/webots-doc
=> I think this should be sufficient, but I'm not sure.
The webots-doc forks are still referencing the bad links. We have no control over them:
=> I don't see any other references to these bad URLs. Do you?