filiph / linkcheck

Fast link checker
https://pub.dartlang.org/packages/linkcheck
MIT License

linkcheck


Very fast link-checking.

[Figure: linkcheck versus the popular blc tool]

Philosophy:

A good utility is custom-made for a job. There are many link checkers out there, but none of them seems to be striving for the following set of goals.

Crawls fast

Finds all relevant problems

Leaves out irrelevant problems

Good UX

Brief and meaningful output

It goes without saying that linkcheck fully respects definitions in robots.txt and throttles itself when accessing websites.

Installation

Direct download

You should be able to run the downloaded executable immediately; it has no external dependencies. For example, assuming you are on macOS and downloaded the file to the default downloads directory, you can go to your Terminal (or iTerm, or SSH) and run ./Downloads/linkcheck-mac-x64.

You can rename the file and move it to any directory. For example, on a Linux box, you might want to rename the executable to simply linkcheck, and move it to /usr/local/bin, $HOME/bin or another directory in your $PATH.
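On Linux, the rename-and-move step might look like the sketch below. The function name install_binary and the file name linkcheck-linux-x64 in the usage comment are assumptions for illustration, not names taken from the release page.

```shell
# Hypothetical helper: make a downloaded binary executable and
# install it under the name "linkcheck".
install_binary() {
  src=$1                    # path to the downloaded executable
  dest_dir=${2:-$HOME/bin}  # target directory; should be on your $PATH
  mkdir -p "$dest_dir" || return 1
  chmod +x "$src" || return 1
  mv "$src" "$dest_dir/linkcheck"
}

# Usage (the file name is an assumption):
# install_binary ./linkcheck-linux-x64 "$HOME/bin"
```

Installing into /usr/local/bin instead usually requires running the mv step with elevated privileges.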

Docker image

Latest executable in a docker image:

docker run --rm tennox/linkcheck --help

(built from a repo mirror by @tennox)

From Source

Step 1. Install Dart

Follow the installation instructions for your platform from the Get the Dart SDK documentation.

For example, on a Mac, assuming you have homebrew, you just run:

$ brew tap dart-lang/dart
$ brew install dart

Step 2. Install linkcheck

Once Dart is installed, run:

$ dart pub global activate linkcheck

Pub installs executables into ~/.pub-cache/bin, which may not be on your path. You can fix that by adding the following to your shell's config file (.bashrc, .bash_profile, etc.):

export PATH="$PATH:$HOME/.pub-cache/bin"

Then either restart the terminal or run source ~/.bash_profile (assuming ~/.bash_profile is where you put the PATH export above).
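If the config file is sourced more than once, a plain export keeps appending the same directory. A minimal sketch of an idempotent variant follows; it uses $HOME rather than ~ because a tilde inside double quotes is not expanded by the shell. The variable name pub_bin is only illustrative.

```shell
# Directory where pub places globally activated executables.
pub_bin="$HOME/.pub-cache/bin"

# Append the directory only if it is not already listed in PATH.
case ":$PATH:" in
  *":$pub_bin:"*) ;;                  # already present, nothing to do
  *) export PATH="$PATH:$pub_bin" ;;
esac
```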

Docker

If you have Docker installed, you can build the image and run linkcheck in a container, which avoids installing Dart locally.

Build

In the project directory, for x86 and x64 architectures, run:

docker build -t filiph/linkcheck .

On ARM architectures (Raspberry Pi, M1 Mac), run:

docker build --platform linux/arm64 -t filiph/linkcheck .

Usage (container mode)

docker run filiph/linkcheck <URL>

All the usage guidelines below also apply when running in a container.

Usage (GitHub Action)

- uses: filiph/linkcheck@2.0.23
  with:
    arguments: <URL>

All the usage guidelines below also apply when running as a GitHub Action.

Usage

If in doubt, run linkcheck -h. Here are some examples to get you started.

Localhost

Running linkcheck without arguments will try to crawl http://localhost:8080/ (which is the most common local server URL).

If you run your local server on http://localhost:4000/, for example, you can pass that URL as the argument:

linkcheck http://localhost:4000/

linkcheck will not throttle itself when accessing localhost. It will go as fast as possible.

Deployed sites

Many entry points

Assuming you have a text file mysites.txt like this:

http://egamebook.com/
http://filiph.net/
https://alojz.cz/

You can run linkcheck -i mysites.txt and it will crawl all of them and also check links between them. This is useful for:

  1. Link-checking projects spanning many domains (or subdomains).
  2. Checking all your public websites / blogs / etc.

There's another use for this, and that is when you have a list of inbound links, like this:

https://www.dart.dev/
https://www.dart.dev/tools/
https://www.dart.dev/guides/

You probably want to make sure you never break your inbound links. For example, if a page changes URL, the previous URL should still work (redirecting to the new page when appropriate).

Where do you get a list of inbound links? Try your site's sitemap.xml as a starting point, and additionally try something like the Google Webmaster Tools’ crawl error page.
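As a rough sketch of the sitemap route, the helper below pulls the URLs out of a sitemap's <loc> elements so they can be saved to a file and passed to linkcheck -i. The function name sitemap_urls, the file names, and the example.com URL are assumptions. It expects each <loc> element to sit on a single line with no extra whitespace, and it does not decode XML entities such as &amp;.

```shell
# Hypothetical helper: print every URL found in a sitemap's <loc> elements.
sitemap_urls() {
  # -o prints only the matching part, so each <loc>...</loc> lands on its own line;
  # sed then strips the surrounding tags.
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Usage (URL and file names are placeholders):
# curl -s https://www.example.com/sitemap.xml > sitemap.xml
# sitemap_urls sitemap.xml > inbound.txt
# linkcheck -i inbound.txt
```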

Skipping URLs

Sometimes, it is legitimate to ignore some failing URLs. This is done via the --skip-file option.

Let's say you're working on a site and a significant portion of it is currently under construction. You can create a file called my_skip_file.txt, for example, and fill it with regular expressions like so:

# Lines starting with a hash are comments.

admin/
\.s?css$
\#info

The file above starts with a comment on line 1, which is ignored. Line 2 is blank and is ignored as well. Line 3 contains a broad regular expression that makes linkcheck ignore any link to a URL containing admin/ anywhere in it. Line 4 shows that regular expressions are fully supported: it ignores URLs ending with .css or .scss. Line 5 shows the only special escape sequence. If you need to start a regular expression with a # (which linkcheck would normally parse as a comment), precede the # with a backslash (\). This forces linkcheck not to ignore the line. In this case, the regular expression on line 5 will match #info anywhere in the URL.

To use this file, you run linkcheck like this:

linkcheck example.com --skip-file my_skip_file.txt

Regular expressions are hard. If unsure, use the -d option to see exactly which URLs your skip file is ignoring.
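For a quick offline check before crawling, the skip-file rules described above can be approximated with standard tools. This is only a sketch: preview_skips is a hypothetical helper, not part of linkcheck, and it uses grep's extended regular expressions as a stand-in for the regular expression syntax linkcheck itself uses, so patterns relying on features such as lookaheads or \d may behave differently.

```shell
# Hypothetical helper: list the URLs from a file that the patterns
# in a skip file would match.
preview_skips() {
  skip_file=$1   # one regular expression per line, as described above
  url_file=$2    # one URL per line
  patterns=$(mktemp) || return 1
  # Drop comment lines and blank lines, then turn a leading "\#" into "#".
  sed -e '/^#/d' -e '/^[[:space:]]*$/d' -e 's/^\\#/#/' "$skip_file" > "$patterns"
  # Print every URL matched by at least one pattern, anywhere in the URL.
  grep -E -f "$patterns" "$url_file"
  status=$?
  rm -f "$patterns"
  return "$status"
}

# Usage (urls.txt is a placeholder file with one URL per line):
# preview_skips my_skip_file.txt urls.txt
```

The -d option remains the authoritative way to see what linkcheck actually skips.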

To use a skip file while running linkcheck through Docker, create a directory to use as a Docker volume and put your skip file in it. Then use a command similar to the following (assuming the directory is named skipfiles and the skip file is named skipfile.txt):

docker run -v "$(pwd)/skipfiles/:/skipfiles/" filiph/linkcheck http://example.com/ --skip-file /skipfiles/skipfile.txt

User agent

The tool identifies itself to servers with the following user agent string:

linkcheck tool (https://github.com/filiph/linkcheck)

Releasing a new version

  1. Commit all your changes, including updates to CHANGELOG, and including updating the version number in pubspec.yaml and lib/linkcheck.dart. Let's say your new version number is 3.4.56. That number should be reflected in all three files.
  2. Tag the last commit with the same version number. In our case, it would be 3.4.56.
  3. Push to master.

This will run the GitHub Actions script in .github/workflows/release.yml, building binaries and placing a new release into github.com/filiph/linkcheck/releases.

To publish the release to the GitHub Actions Marketplace as well, you currently need to manually click Edit and then Update release on the release page once. No changes are needed. (Source: GitHub Community)
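Step 1 above requires the same version number in all three files, which is easy to get wrong. A minimal sketch of a pre-tag check follows; version_in_all is a hypothetical helper, and the CHANGELOG.md file name and the origin remote in the usage comment are assumptions.

```shell
# Hypothetical helper: succeed only if the version string appears in every file.
version_in_all() {
  version=$1
  shift
  for f in "$@"; do
    # -F treats the version as a fixed string, so its dots are not regex wildcards.
    if ! grep -qF -- "$version" "$f"; then
      echo "version $version not found in $f" >&2
      return 1
    fi
  done
}

# Usage (file name CHANGELOG.md and remote name origin are assumptions):
# version_in_all 3.4.56 CHANGELOG.md pubspec.yaml lib/linkcheck.dart \
#   && git tag 3.4.56 \
#   && git push origin master 3.4.56
```

The check is a plain substring match, so it would also accept a longer number that contains the version, such as 13.4.56.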