OpenMage / magento-lts

Official OpenMage LTS codebase | Migrate easily from Magento Community Edition in minutes! Download the source code for free or contribute to OpenMage LTS | Security vulnerability patches, bug fixes, performance improvements and more.
https://www.openmage.org
Open Software License 3.0

Magento responds to incorrectly-cased requests with 200 code #328

Closed by colinmollenhour 2 years ago

colinmollenhour commented 7 years ago

Because Magento uses case-insensitive collations on its indexes, the routers match url keys whose case differs from that of the request. This results in a url "/foo" being accessible with a 200 response at "/foO", "/fOo", "/FOO", etc. According to the HTTP spec this is not necessarily invalid, but to the SEO world it is considered a duplicate content problem. Technically, I think it would be appropriate for these alternate-cased requests to receive 404 responses, which could be accomplished by replacing the indexes on the various url key columns with indexes having case-sensitive collations. Some might advocate, though, for 301 responses to the correctly-cased urls.

What are your thoughts regarding 200 vs 301 vs 404?
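The matching behaviour described above can be reproduced in miniature. The following is only a sketch: it uses SQLite's built-in `NOCASE` and `BINARY` collations as stand-ins for case-insensitive and case-sensitive MySQL collations, and the table names `rewrite_ci` and `rewrite_cs` are hypothetical, not Magento's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in tables (not Magento's schema).
# NOCASE mimics a case-insensitive collation, BINARY a case-sensitive one.
conn.execute("CREATE TABLE rewrite_ci (request_path TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE rewrite_cs (request_path TEXT COLLATE BINARY)")
conn.execute("INSERT INTO rewrite_ci VALUES ('foo')")
conn.execute("INSERT INTO rewrite_cs VALUES ('foo')")


def matches(table, path):
    """Return True if `path` equals a stored request_path under the column's collation."""
    # `table` is one of the two fixed names above, never user input.
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE request_path = ?", (path,)
    ).fetchone()
    return row[0] > 0


requests = ("foo", "foO", "FOO")
ci_hits = [p for p in requests if matches("rewrite_ci", p)]
cs_hits = [p for p in requests if matches("rewrite_cs", p)]
```

With the case-insensitive column every variant matches (the current 200 behaviour); with the case-sensitive column only the exact key matches, so the other variants would fall through to a 404.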

tomekjordan commented 7 years ago

Leave it as is: 200 only. 404 is the worst idea, since it affects page rankings. 301 is much better than 404, but it would break SEO rewrites from other plugins. I don't know, I'm just a merchant with some SEO experience, but it should always give 200. foo vs. FOO may make a difference for Apache, but not for Google, so it doesn't matter for SEO. What matters for SEO is foo vs. foo/, or foo with params (but params are often cut with Google Search Console). OK, Magento url rewrites are f***ed much; you can freely add lowercase and save it as default to avoid any duplication in the db. WordPress has a great Redirection plugin, https://wordpress.org/plugins/redirection/, that does the job. Magento sucks in this area... but it's much better than other ecommerce software.

colinmollenhour commented 7 years ago

There is some more discussion/opinions here: https://www.quora.com/Does-Google-treat-inconsistent-capitalisation-URLs-as-duplicate-content It is worth noting that on Mac/*nix it is completely valid to have two files with the same letters and different cases which serve different content, so I don't see how it can be safe to assume search engines never account for this. Perhaps they make informed assumptions after crawling both, but I don't think we know that for sure. I am still of the opinion that 200 is incorrect both from an RFC standpoint and an SEO standpoint, but that either way it is a very minor issue, especially if you use canonical tags, because generally these alternating cases shouldn't exist unless users are making mistakes.

For new sites using 404 would solve the issue because these broken links would ostensibly be discovered and fixed before ever being indexed. However, for existing sites I understand the concern of making a 200 become a 404.

I don't see the issue with using a 301, could you expound on why you think that is a bad idea?

Perhaps this could be a configurable option where the user could choose what response to give when the request case doesn't match the url key? Options could be 200 (default due to historical reasons), 404 and 301.
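A rough sketch of how such an option might behave, written in Python purely for illustration. The names `CaseMismatchMode` and `resolve` are hypothetical and do not correspond to any existing Magento or OpenMage API; the url keys are passed in as a plain list rather than read from the database.

```python
from enum import Enum


class CaseMismatchMode(Enum):
    """Hypothetical config values for a request whose case differs from the url key."""
    SERVE = 200      # historical behaviour: serve the page anyway
    REDIRECT = 301   # redirect to the correctly-cased url
    NOT_FOUND = 404  # treat the request as unknown


def resolve(request_path, url_keys, mode=CaseMismatchMode.SERVE):
    """Return (status, location) for a request, given the stored url keys."""
    # Case-insensitive lookup, mirroring a case-insensitive index.
    by_lower = {key.lower(): key for key in url_keys}
    stored = by_lower.get(request_path.lower())
    if stored is None:
        return 404, None  # no key matches in any case
    if stored == request_path or mode is CaseMismatchMode.SERVE:
        return 200, None  # exact match, or mismatches are tolerated
    if mode is CaseMismatchMode.REDIRECT:
        return 301, stored  # send the client to the stored casing
    return 404, None  # mode is NOT_FOUND
```

For example, `resolve("/FOO", ["/foo"], CaseMismatchMode.REDIRECT)` yields a 301 pointing at `/foo`, while the default mode keeps today's 200 response.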

sreichel commented 6 years ago

"I don't see the issue with using a 301, could you expound on why you think that is a bad idea?"

From https://moz.com/blog/301-redirection-rules-for-seo

"301 redirects result in around a 15% loss of PageRank. Matt Cutts confirmed this in 2013 when he explained that a 301 loses the exact same amount of PageRank as a link from one page to another."

Magento should store URLs in lowercase by default, but for existing URLs it should at least be configurable. (What about using rewrites if someone wants 301/404? https://www.askapache.com/htaccess/rewrite-uppercase-lowercase/)

tomekjordan commented 6 years ago

404 is a bad idea, because it will trigger thousands of 404 errors in Google Search Console. I would go with 301, or better, just add a canonical pointing to the lowercase url or something.
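The canonical-tag idea could look roughly like the sketch below. It assumes url keys are stored in lowercase, so lowercasing the request path yields the canonical url; `canonical_link` is a hypothetical helper, not part of Magento or any SEO extension.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url):
    """Build a canonical <link> tag whose path is the lowercased request path."""
    parts = urlsplit(url)
    # Assumption: stored url keys are lowercase, so only the path is lowered.
    lowered = parts._replace(path=parts.path.lower())
    href = escape(urlunsplit(lowered), quote=True)
    return f'<link rel="canonical" href="{href}" />'


tag = canonical_link("https://example.com/FoO")
```

The page is still served with a 200 for any casing, but every variant declares the same lowercase url as canonical.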

Please don't forget that stores are using 3rd party SEO extensions like SEO Ultimate from Mageworx or Mirasvit, and any changes could break things related to them.