Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/

URL parameter cleansing #11620

Closed jonoalderson closed 5 years ago

jonoalderson commented 5 years ago

Query parameters and malformed URLs cause SEO headaches, even when canonical tags and crawling/indexing directives are used. The problems range from cache misses and wasted crawl resources to fragmentation and indexing challenges.

We could/should clean them up when possible, by redirecting incoming 'messy' requests.

Example URLs

For many websites, we can consider the following example URLs to be unique:

Proposed cleansing

To avoid making a mess (technically, or politically), we should set these to off by default, but include a setting checkbox for 'clean up URLs' (and filters for each component, to allow for modifying execution).
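As a rough illustration of the "off by default, with a filter per component" idea, here is a minimal sketch in Python. Everything in it is hypothetical: the `UrlCleanup` class, the step name `strip_trailing_question_mark`, and the reading of a "filter" as a per-step veto are assumptions made for the example, not anything that exists in Yoast SEO or WordPress.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A step takes a URL and returns a (possibly) cleaner URL.
Step = Callable[[str], str]
# A filter decides whether a given step may run for a given URL.
Filter = Callable[[str], bool]


@dataclass
class UrlCleanup:
    enabled: bool = False  # off by default, as proposed above
    steps: Dict[str, Step] = field(default_factory=dict)
    filters: Dict[str, Filter] = field(default_factory=dict)

    def clean(self, url: str) -> str:
        """Run every registered step that its filter allows."""
        if not self.enabled:
            return url
        for name, step in self.steps.items():
            allow = self.filters.get(name, lambda _url: True)
            if allow(url):
                url = step(url)
        return url


# Hypothetical step, for illustration only: drop a dangling "?".
cleanup = UrlCleanup(
    steps={
        "strip_trailing_question_mark":
            lambda u: u[:-1] if u.endswith("?") else u,
    }
)
```

With this shape, a site owner has to opt in (`cleanup.enabled = True`) before any URL is touched, and another plugin could register a filter to switch off a single step for its own URLs without disabling the rest.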

Scenarios

Cache-misses

Unique URLs frequently miss system/external caches, and slow down websites. By reducing the number of different URLs, we increase cache hit rate.
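To make the cache argument concrete, the sketch below simulates a cache keyed by the full URL and compares hit rates with and without cleansing. It is a toy model under stated assumptions: the `cache_key` and `hit_rate` helpers are made up for this example, the cache never evicts, and "cleansing" here means only dropping `utm_`-prefixed parameters.

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(url: str, normalise: bool) -> str:
    """Return the key a URL-keyed cache would use for this request."""
    if not normalise:
        return url
    parts = urlsplit(url)
    # Assumption: only utm_* parameters are noise; all others are kept as-is.
    kept = [
        pair for pair in parts.query.split("&")
        if pair and not pair.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), ""))


def hit_rate(urls, normalise: bool) -> float:
    """Fraction of requests served from a simple, never-evicting cache."""
    seen = set()
    hits = 0
    for url in urls:
        key = cache_key(url, normalise)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(urls)


requests = [
    "https://example.com/post/",
    "https://example.com/post/?utm_source=buffer",
    "https://example.com/post/?utm_source=twitter",
    "https://example.com/post/",
]

print(hit_rate(requests, normalise=False))  # 0.25
print(hit_rate(requests, normalise=True))   # 0.75
```

In this toy trace, three of the four requests are distinct URLs for the same page, so only the repeated plain URL hits the cache. Once the tracking parameters are removed, all four requests share one key.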

Crawl inefficiency

Google still discovers, crawls and processes URL variants. They show up in Search Console, and Google expends resources crawling and assessing them. Reducing the number of different URLs removes the need for Google to process canonical tags in these situations (and as such, might result in a small performance uplift).

Reporting fragmentation

Though this is a secondary benefit, it's nice to consider that this will significantly reduce the number of unique/fragmented URLs in systems like Google/Adobe Analytics.

Third party de-fudging

It's hard to prevent third party sites putting noise in URL parameters; in particular, social and sharing platforms (e.g., Buffer) append their own flavour of UTM and tracking nonsense to URLs. Cleansing goes some way towards reducing the impact of this.

Implementation consideration

At the moment, we have a partial proof of concept operating on yoast.com, which transforms URLs containing /?utm_ to /#utm_. This is implemented via a notably weak regex rule, which isn't suitable for broader or general use. A more robust implementation should properly assess the URL and its components, deconstruct it, then reconstruct it from scratch.
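A "deconstruct, then reconstruct" approach could look roughly like the following Python sketch, which uses the standard library's `urllib.parse` instead of a regex. It is not the yoast.com implementation. The function name `move_utm_to_fragment` and the choice to match only keys starting with `utm_` are assumptions for illustration. Parameters are moved as raw strings, so the case and encoding of every key and value are left untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only parameters whose key starts with this prefix are moved.
TRACKING_PREFIX = "utm_"


def _is_tracking(raw_pair: str) -> bool:
    """True if the raw 'key=value' string has a utm_* key."""
    key = raw_pair.split("=", 1)[0]
    return key.lower().startswith(TRACKING_PREFIX)


def move_utm_to_fragment(url: str) -> str:
    """Return url with utm_* query parameters moved into the fragment."""
    parts = urlsplit(url)

    # Browsers do not send fragments to the server, so a request URL should
    # not have one. If it does, leave the URL alone rather than guess.
    if parts.fragment:
        return url

    raw_pairs = [pair for pair in parts.query.split("&") if pair]
    moved = [pair for pair in raw_pairs if _is_tracking(pair)]
    kept = [pair for pair in raw_pairs if not _is_tracking(pair)]

    if not moved:
        return url  # nothing to clean, so no redirect is needed

    # Rebuild from components; kept pairs keep their order, case and encoding.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "&".join(moved))
    )


print(move_utm_to_fragment(
    "https://example.com/page/?utm_source=Buffer&gclid=AbC123&utm_medium=social"
))
# https://example.com/page/?gclid=AbC123#utm_source=Buffer&utm_medium=social
```

A caller would issue a redirect only when the returned URL differs from the requested one. Because non-`utm_` parameters such as `gclid` pass through byte for byte, this sketch does not lowercase or re-encode anything it was not asked to move.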

jonoalderson commented 5 years ago

NOTE: Obvious performance and legacy implications here. Ideally, this should live above / outside of WP, which comes with its own issues.

moorscode commented 5 years ago

To gain the most performance, we could offer this feature only when non-PHP redirects are configured. Having different functionality depending on such a configuration makes it a bit harder to explain, though we do want to motivate people to choose the most performant solution.

Keep in mind that there are some issues regarding the use of nginx or Apache redirects.

jdevalk commented 5 years ago

We used to have functionality like this, called permalink redirect... We killed it because it was causing far too many issues, with plugins adding their own parameters etc. BUT: automatically redirecting ?utm to #utm etc makes a lot of sense to me...

jonoalderson commented 5 years ago

Yeah, that's the big one. The rest are added value.

lelas commented 5 years ago

This can cause far more damage than it could do good. Clearly overstepping!

Please consider the implications it can have on campaign tracking. If Yoast SEO alters gclid parameters (automated Google Ads -> Google Analytics tracking) or ValueTrack parameters, both of which are case sensitive and mix lowercase and capital letters, you would ruin insane amounts of marketing campaign datasets.

I, for instance, am currently investigating a 301 redirect from a URI with uppercase letters in its URL parameters to the same URI with lowercase parameters (and suspected Yoast SEO to be the culprit). A new Google Ads client has spent oodles of money on Google Ads in the past months and has no way to see what converted and what didn't. All Google Ads data in Google Analytics suddenly went to: (not set)

Those are just a few examples. There are likely many other web analytics/advertising tools that use uppercase in tracking URIs.

jdevalk commented 5 years ago

I'm very hesitant too, @lelas, for exactly the same reason. Going to close, as we're not going to prioritize this anyway.