By: Byrne Reese
Donated in whole to the Movable Type Open Source Project Copyright 2007-2008 Six Apart Ltd.
CleanSweep is a plugin that assists administrators in finding and fixing broken inbound links to their website. It was built to support three use cases:
to help users monitor any broken links on their blog and have a system that can automatically adapt by redirecting stale and inbound links to the proper destination.
to help users get a clean start with their blog by allowing them to completely restructure their permalink URL structure and have a system that can automatically adapt by redirecting stale and inbound links to the proper destination.
to help users in the process of migrating to Movable Type who are forced to modify their web site's URL and permalink structure.
These use cases have to do with preserving a site's page rank in light of a major redesign.
After configuration, Clean Sweep will track all inbound links that result in a 404 and will ultimately deduce the intended file and redirect the client to that file.
Clean Sweep will also produce a set of Apache mod_rewrite rules to map inbound links to their destination permanently.
To install this plugin follow the instructions found here:
http://tinyurl.com/easy-plugin-install
Clean Sweep supports both Apache and Lighttpd. For now you elect what web server you are using on a blog-by-blog basis. All documentation however, refers to Apache, as it is far more common. Lighttpd users should simply follow the analogous instruction for their web server when appropriate.
Create a page in Movable Type called "URL Not Found". Give it a basename of "404". Place whatever personalized message you want that will be displayed to your visitors when Clean Sweep is unsuccessful in mapping the request to the correct page or destination. Publish the page and remember the complete URL to this page on your published blog. (Alternatively, create an Index template for your 404.)
Navigate to the Plugin Settings area for Clean Sweep. Enter the full URL to your "URL Not Found" page (as created above) into the "404 URL" configuration parameter.
Also in the Plugin Settings area, make note of the Apache configuration
directive and place it in your httpd.conf
or in an .htaccess
file. Restart
the web server, if necessary.
Also in Plugin Settings:
Use the Valid File Types field to specify the types of files Clean Sweep should try to redirect. The extension used for the types of Entries and Pages published by your site are what Clean Sweep is able to best redirect, and this field provides a way to focus on those.
Clean Sweep can guess at a helpful URL using the Traverse URL option. When
trying to find a URL for http://example.com/a/b/c/d.html
, for example,
traverse up the URL to redirect to http://example.com/a/b/c/
or
http://example.com/a/b/
, if those URLs exist. This may or may not be useful
depending upon site architecture and is done as a last-ditch effort to find a
reasonable page to redirect to.
Use the Append Requested URL checkbox to include a target
argument with the
original URL.
Clean Sweep will use the following ruleset in trying to guess the target URL the client is requesting:
Is the target resource using the entry id as a URL? This is a prevalent URL pattern for older MT installations. This will:
Map: http://www.majordojo.com/archives/000675.php To: http://www.majordojo.com/205/07/goodbye-bookque.php
Is the target resource using underscores when it should be using hyphens? Many users have switched to using hyphens for purported SEO benefits. This will attempt to look for a file in the system of the same name, but using '-' instead of '_'. This will:
Map: http://www.majordojo.com/2005/07/goodbye_bookque.php To: http://www.majordojo.com/2005/07/goodbye-bookque.php
Is their a target resource with the same basename somewhere? If a user switches their primary mapping to use a date based URL as opposed to a category based URL, then this rule will apply. This will:
Map: http://www.majordojo.com/personal-projects/goodbye-bookque.php To: http://www.majordojo.com/2005/07/goodbye-bookque.php
If Clean Sweep was unable to redirect the request it will return the 404 "URL Not Found" page created above, and logs the 404. You can review all of the logged 404s by visiting Manage > Logged 404s.
On the Logged 404s screen are options to mange the 404s, including the ability to specify how a given URL should be handled. Click "Map" to adjust this in a popup dialog, where you can select:
Additionally on the Map dialog is a table of referrers. Note that a referring URL is not available for every URL.
Once a URL has been mapped it is no longer counted. This is useful because after mapping you can use the Reset function (on the Manage > Logged 404s page) to delete the count of occurrences for that URL. After manually fixing a reported URL you can Reset it, and the effect is that new broken URLs will be quickly visible even after having only two or three hits.
Clean Sweep is licensed under the GPL (v2).