OpenSourceOrg / dotOrg

Public tracker of opensource.org issues
GNU General Public License v3.0

NetHack license URL gives it an SEO boost for "PHP license" searches #64

Closed. ramsey closed this issue 3 months ago.

ramsey commented 3 months ago

Description

I was performing a search for some PHP license-related history, and this URL appears very high in the search results:

https://opensource.org/license/nethack-php

I was curious, so I visited the page, but the license does not appear to be related to the PHP project at all, despite having -php in the URL. The presence of -php in the URL caused it to rank highly in unrelated search results.

Why is -php in the NetHack license URL? I would expect the following URL instead:

https://opensource.org/license/nethack

In fact, this URL does work, but it redirects to the URL that ends in -php.

I suspect this is leftover from the site migration, and perhaps this page was originally nethack.php and was converted to nethack-php.

Steps to reproduce

  1. Enter https://opensource.org/license/nethack into your browser.
  2. Notice how it redirects to https://opensource.org/license/nethack-php; this redirect uses 301 to indicate it is a permanent redirect.
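The reproduction steps can also be scripted. The sketch below uses only Python's standard library. It sends a single HEAD request without following redirects and reports the status code and `Location` header. The function names are hypothetical, and some servers answer HEAD differently from GET, so treat the result as a quick check rather than proof.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def request_target(url: str) -> Tuple[str, str, str]:
    """Split a URL into (scheme, host[:port], path?query) for http.client."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.netloc, target


def fetch_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Issue one HEAD request (http.client never follows redirects itself)
    and return the status code plus the Location header, if any."""
    scheme, netloc, target = request_target(url)
    conn_cls = http.client.HTTPSConnection if scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(netloc, timeout=timeout)
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

According to the steps above, calling `fetch_redirect("https://opensource.org/license/nethack")` when this issue was filed would have returned a 301 with a `Location` pointing to the `-php` URL.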

What you expected to happen

I expect the canonical URL for this page to be:

https://opensource.org/license/nethack

Entering https://opensource.org/license/nethack-php should result in a 301 or 308 redirect to https://opensource.org/license/nethack. A 302 or 307 redirect is not sufficient, since search engines and other caches will continue to retain the incorrect URL.
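The expectation above can be expressed as a small check. This is a sketch that assumes you already have the status code and `Location` header from a single request that did not follow the redirect. The code sets follow the redirect status definitions in RFC 9110, section 15.4. The helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin

# 301 and 308 tell clients and search engines the move is permanent;
# 302, 303 and 307 do not, so caches keep treating the old URL as current.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def redirect_kind(status: int) -> str:
    """Classify an HTTP status code as a permanent, temporary, or non-redirect."""
    if status in PERMANENT_REDIRECTS:
        return "permanent"
    if status in TEMPORARY_REDIRECTS:
        return "temporary"
    return "not a redirect"


def meets_expectation(requested_url: str, status: int,
                      location: Optional[str], expected: str) -> bool:
    """True only for a permanent redirect that lands exactly on `expected`.
    A relative Location header is resolved against the requested URL."""
    if status not in PERMANENT_REDIRECTS or location is None:
        return False
    return urljoin(requested_url, location) == expected
```

For example, a 301 from `https://opensource.org/license/nethack-php` with `Location: /license/nethack` passes. The same response with status 302 fails, matching the reasoning that a temporary redirect is not sufficient.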

What actually happened

Requesting https://opensource.org/license/nethack results in a 301 redirect to https://opensource.org/license/nethack-php, and the presence of -php in the URL causes search engines to regard this URL as a potential result when search terms include "php" and "license."

smaffulli commented 3 months ago

Very interesting, thanks for sharing. The -php is an artifact of the way the license pages were maintained in the past. We found license pages with .htm, .html, and .php extensions, and some had no prettified permalink at all (in Drupal they were listed as /node/[random number]), among other variations. When we migrated the site to WordPress, we tried to preserve the URLs, but WordPress converts dots (.) to dashes (-). That's what you're seeing. I changed the URL and confirmed that the old URL now returns a 301.
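The migration behavior described above can be approximated in a few lines. In this rough sketch, `wp_style_slug` only mimics the dot-to-dash conversion mentioned in the comment. WordPress's real slug sanitization does considerably more. `clean_slug` shows one possible fix: strip a legacy extension before slugging. All names here are hypothetical.

```python
import re

# Legacy extensions mentioned in the comment; ".html" is listed before ".htm"
# so the longer suffix is matched first.
LEGACY_EXTENSIONS = (".html", ".htm", ".php")


def wp_style_slug(name: str) -> str:
    """Approximate the migration: lowercase, dots become dashes,
    other unsafe characters become dashes, repeated dashes collapse."""
    slug = name.strip().lower().replace(".", "-")
    slug = re.sub(r"[^a-z0-9-]+", "-", slug)
    return re.sub(r"-{2,}", "-", slug).strip("-")


def clean_slug(name: str) -> str:
    """Drop a legacy file extension first, so "nethack.php" becomes "nethack"."""
    lowered = name.strip().lower()
    for ext in LEGACY_EXTENSIONS:
        if lowered.endswith(ext):
            lowered = lowered[: -len(ext)]
            break
    return wp_style_slug(lowered)


def redirect_map(legacy_names):
    """Map each migrated slug to its cleaned slug where they differ;
    each entry would be served as a 301 from old to new."""
    return {
        wp_style_slug(n): clean_slug(n)
        for n in legacy_names
        if wp_style_slug(n) != clean_slug(n)
    }
```

With this approach, `redirect_map(["nethack.php", "mit"])` yields `{"nethack-php": "nethack"}`, and unaffected slugs are left alone. Stripping suffixes blindly could misfire on a page whose name legitimately ends in one of these strings, so a generated map like this is worth reviewing by hand.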

ramsey commented 3 months ago

Thanks!