kinueng opened this issue 2 years ago
Found similar URLs that users should not be using, but they are showing up in search results.
Search example: https://www.google.com/search?q=Package+jakarta.batch.runtime.context+-+Open+Liberty
We think the code that handles a URL like https://openliberty.io/docs/modules/reference/liberty-jakartaee9.1-javadoc/jakarta/batch/api/AbstractBatchlet.html, and redirects it to https://openliberty.io/docs/latest/reference/javadoc/liberty-jakartaee9.1-javadoc.html?package=jakarta/batch/api/package-frame.html&class=jakarta/batch/api/AbstractBatchlet.html, cannot handle the two broken URL examples in the issue description.
Start by looking at the code that transforms the URLs and performs the redirect.
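The class-page redirect described above can be sketched in Python as follows. This is a hypothetical illustration, not the site's actual redirect code: the name `redirect_target`, the URL regex, and the frame-file list are assumptions. It reproduces the AbstractBatchlet example and shows why frame files fall through: they have no class page to pass as `class=`, so this kind of mapping returns nothing for them.

```python
import posixpath
import re

# Hypothetical pattern for the old module URLs, e.g.
# /docs/modules/reference/liberty-jakartaee9.1-javadoc/jakarta/batch/api/AbstractBatchlet.html
OLD_JAVADOC_URL = re.compile(
    r"^/docs/modules/reference/(?P<doc>[^/]+-javadoc)/(?P<page>.+\.html)$"
)

# Javadoc-generated navigation files that are loaded inside iframes (assumed list).
FRAME_FILES = {
    "overview-frame.html",
    "allclasses-frame.html",
    "allclasses-noframe.html",
    "package-frame.html",
}


def redirect_target(path):
    """Return the viewer URL for a class page, or None if the path is not handled."""
    m = OLD_JAVADOC_URL.match(path)
    if m is None:
        return None
    doc, page = m.group("doc"), m.group("page")
    package_dir, filename = posixpath.split(page)
    # Frame files and top-level pages have no class to show, so they fall through.
    if filename in FRAME_FILES or not package_dir:
        return None
    return (
        f"/docs/latest/reference/javadoc/{doc}.html"
        f"?package={package_dir}/package-frame.html&class={page}"
    )


if __name__ == "__main__":
    print(redirect_target(
        "/docs/modules/reference/liberty-jakartaee9.1-javadoc/jakarta/batch/api/AbstractBatchlet.html"
    ))
    print(redirect_target("/docs/modules/reference/liberty-javaee8-javadoc/overview-frame.html"))
```

Any URL for which a function like this returns None is left where it is, which matches the symptom reported below for URLs ending in frame.html.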
Hi @kinueng, I noticed that for this issue, none of the jakarta/javaee/microprofile javadoc URLs that end in frame.html are redirected, for example:
https://openliberty.io/docs/modules/reference/liberty-javaee8-javadoc/overview-frame.html
https://openliberty.io/docs/modules/reference/liberty-javaee8-javadoc/allclasses-frame.html
https://openliberty.io/docs/modules/reference/liberty-javaee8-javadoc/allclasses-noframe.html
https://openliberty.io/docs/modules/reference/liberty-javaee8-javadoc/javax/annotation/package-frame.html
https://openliberty.io/docs/modules/reference/microprofile-5.0-javadoc/overview-frame.html
https://openliberty.io/docs/modules/reference/microprofile-5.0-javadoc/allclasses-frame.html
https://openliberty.io/docs/modules/reference/liberty-jakartaee9.1-javadoc/overview-frame.html
https://openliberty.io/docs/modules/reference/liberty-jakartaee9.1-javadoc/jakarta/activation/package-frame.html
The last remaining piece is to decide how to mark the javadoc files used for iframes as noindex, so that search engines do not index the iframes. Examples of the iframe files are in comment https://github.com/OpenLiberty/openliberty.io/issues/2665#issuecomment-1170319592. We cannot put the noindex robots meta tag into the files because they are generated by the javadoc command.
We may be able to use robots.txt rules to stop crawlers from seeing the iframes, by adding rules for the specific files that are still causing issues (overview-frame.html, package-frame.html, allclasses-frame.html). If a rule for the * user agent prevents the iframes from loading on the site, we can instead target Googlebot, the crawler that indexes for Google Search.
More info here: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt#create_rules
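A sketch of such rules, in Python. `render_robots_txt` prints a hypothetical Googlebot group that blocks only the three frame files under /docs/modules/reference/, and `is_allowed` approximates Google's documented matching: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins, and Allow wins a tie. The matcher is hand-written because Python's `urllib.robotparser` only does prefix matching and does not support these wildcards. The rule paths are assumptions, and Google describes robots.txt as controlling crawling rather than indexing, so a blocked URL can still show up in results if other pages link to it.

```python
import re

# Hypothetical rules: block only the javadoc navigation frames under the old
# /docs/modules/reference/ paths, using Google-style wildcard patterns.
FRAME_RULES = [
    ("Disallow", "/docs/modules/reference/*/overview-frame.html$"),
    ("Disallow", "/docs/modules/reference/*/allclasses-frame.html$"),
    ("Disallow", "/docs/modules/reference/*/package-frame.html$"),
]


def render_robots_txt(user_agent, rules):
    """Render one robots.txt group for the given user agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"{kind}: {pattern}" for kind, pattern in rules]
    return "\n".join(lines) + "\n"


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' may span '/' and '?', so it becomes '.*'; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path_and_query, rules):
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best = None
    for kind, pattern in rules:
        if _pattern_to_regex(pattern).match(path_and_query):
            candidate = (len(pattern), kind == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    print(render_robots_txt("Googlebot", FRAME_RULES))
    viewer = ("/docs/latest/reference/javadoc/liberty-jakartaee9.1-javadoc.html"
              "?package=jakarta/batch/api/package-frame.html"
              "&class=jakarta/batch/api/AbstractBatchlet.html")
    print(is_allowed(viewer, FRAME_RULES))
    print(is_allowed(viewer, [("Disallow", "/*package-frame.html")]))
```

The rules are scoped to /docs/modules/reference/ deliberately: a broad rule such as Disallow: /*package-frame.html would also match the viewer URL from the redirect example, because robots.txt patterns are matched against the path plus query string and that query contains package-frame.html. allclasses-noframe.html, listed earlier in this thread, would need its own line if it should be blocked too.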
Problem
Some of the iframe content from the javadocs is being indexed by search engines.
Recreation steps
A subset of URLs that should not be indexed by search engines.
Possible Solution 1
Possible Solution 2
Redirect these URLs to the appropriate page. We still need to decide which page counts as "appropriate". Here are some choices:
jakarta.enterprise.inject
jakarta.enterprise.inject
jakarta.enterprise.inject
package.
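For Possible Solution 2, one candidate mapping could look like the hypothetical Python sketch below. It assumes, without confirmation here, that the viewer page from the redirect example accepts the package= parameter on its own, and that it shows a sensible default page when given no parameters. Under those assumptions, package-frame.html goes to its package and the top-level frame files go to the viewer's default page. Which page is "appropriate" is still the open decision above, so treat these targets as placeholders.

```python
import posixpath
import re

# Hypothetical pattern for the old module URLs (same shape as the class-page redirect).
OLD_JAVADOC_URL = re.compile(
    r"^/docs/modules/reference/(?P<doc>[^/]+-javadoc)/(?P<page>.+\.html)$"
)

# Top-level javadoc navigation files, which sit directly under the javadoc set.
TOP_LEVEL_FRAME_FILES = {
    "overview-frame.html",
    "allclasses-frame.html",
    "allclasses-noframe.html",
}


def frame_redirect(path):
    """Map an old frame-file URL to a viewer URL, or return None if it is not a frame file."""
    m = OLD_JAVADOC_URL.match(path)
    if m is None:
        return None
    doc, page = m.group("doc"), m.group("page")
    package_dir, filename = posixpath.split(page)
    viewer = f"/docs/latest/reference/javadoc/{doc}.html"
    if filename == "package-frame.html" and package_dir:
        # Assumed target: open the viewer on that package only.
        return f"{viewer}?package={page}"
    if filename in TOP_LEVEL_FRAME_FILES and not package_dir:
        # Assumed target: the viewer's default page for this javadoc set.
        return viewer
    return None


if __name__ == "__main__":
    print(frame_redirect(
        "/docs/modules/reference/liberty-jakartaee9.1-javadoc/jakarta/activation/package-frame.html"
    ))
    print(frame_redirect("/docs/modules/reference/microprofile-5.0-javadoc/allclasses-frame.html"))
```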