.. contents:: Table of Contents

``ftw.linkchecker`` is an add-on for Plone installations. It is designed to be run
regularly as a cronjob to find and report broken links and references within Plone sites.


How it works
============

Important note
==============

Be careful not to activate this script as a cronjob in environments where the ZEO server could legitimately be down for long periods (e.g. staging or test servers), as it could lock up or crash the entire machine.
See non-production-info_ for more information.

Plone 4.3.x

Add the package to your buildout configuration::

    [instance]
    eggs +=
        ...
        ftw.linkchecker


A JSON settings file is required (see below for an example). The following options can be configured in the settings file per platform, as shown in the example: ``email``, ``base_uri``, ``timeout_config`` and ``upload_location``. The ``upload_location`` option is the path to an ``ftw.simplelayout`` file listing block where the report file will additionally be uploaded.

::

    {
      "/plone1": {
        "email": ["first_site_admin@example.com", "first_site_keeper@example.com"],
        "base_uri": "http://example1.ch",
        "timeout_config": "1",
        "upload_location": "/content_page/my_file_listing_block"
      },
      "/folder/plone2": {
        "email": ["second_site_admin@example.com"],
        "base_uri": "http://example2.ch",
        "timeout_config": "1"
      }
    }

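
Before scheduling a run, it can help to sanity-check the settings file. The following is a minimal sketch and not part of ``ftw.linkchecker``: the function name ``load_settings`` is hypothetical, and treating ``email``, ``base_uri`` and ``timeout_config`` as required (with ``upload_location`` optional) is an assumption drawn only from the example above.

```python
import json

# Assumption: inferred from the example settings file, not from the package.
REQUIRED_KEYS = ("email", "base_uri", "timeout_config")
OPTIONAL_KEYS = ("upload_location",)


def load_settings(path):
    """Load the settings file and collect obviously malformed entries.

    Returns a tuple (settings, problems) where problems is a list of
    human-readable messages; an empty list means nothing was flagged.
    """
    with open(path) as handle:
        settings = json.load(handle)

    problems = []
    for site_path, options in settings.items():
        # The example keys every site by an absolute path such as "/plone1".
        if not site_path.startswith("/"):
            problems.append("%s: site path should start with '/'" % site_path)
        for key in REQUIRED_KEYS:
            if key not in options:
                problems.append("%s: missing '%s'" % (site_path, key))
        unknown = set(options) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS)
        for key in sorted(unknown):
            problems.append("%s: unknown option '%s'" % (site_path, key))
        # The example always gives "email" as a list of addresses.
        if not isinstance(options.get("email", []), list):
            problems.append("%s: 'email' should be a list" % site_path)
    return settings, problems
```

Calling ``load_settings("/path/to/settings.json")`` and refusing to continue when the returned problem list is non-empty keeps a typo in the file from surfacing only during the nightly run.
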

Run command for ``ftw.linkchecker``::

    bin/instance check_links /path/to/settings.json [-l /path/to/logfile.log] [-p processes]

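
- The optional ``-l`` (or ``--logpath``) argument is the path to a logfile, which must have been created in advance.
- The optional ``-p`` (or ``--processes``) argument is the maximum number of processes spawned for the HEAD requests.

To build the development environment, run these commands in order:

- ``ln -s development.cfg buildout.cfg``
- ``python bootstrap.py``
- ``bin/buildout``
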
Run ``bin/test`` to test your changes.

Or start an instance by running ``bin/instance fg``.

This package is copyright by `4teamwork <http://www.4teamwork.ch/>`_.

``ftw.linkchecker`` is licensed under GNU General Public License, version 2.

.. _non-production-info:

Do not run in non-production
============================

In development, ``bin/instance`` is (usually) the Plone server itself. In other setups, ``bin/instance`` is a so-called ZEO client. Instead of directly opening a ``Data.fs``, a ZEO client accesses the ZEO server over the network. In our setups, this is wired up via ftw-buildouts.

Now, if the ZEO server cannot be reached (not running, network issues, misconfiguration, ...), the ZEO client will sleep for a bit, and try to reconnect. By default, it does this in an infinite loop and it will try to reconnect to the mothership until the end of time. For the regular instances (ZEO clients) running in supervisor, this is the ideal behavior: If the ZEO server temporarily cannot be reached, the clients will try to reconnect all by themselves. If the ZEO server comes back up again, the system will fix itself without any need for intervention.

However, when ``bin/instance`` is used from cronjobs, this can lead to a problem. If the ZEO server cannot be reached when the cronjob fires (for whatever reason: accidentally stopped, misconfigured, network problems, ...), the client invoked by the cronjob will attempt to reconnect forever. The script will therefore never terminate and return control to the shell. Instead it will keep running, and the next day (or whenever the cronjob is executed next), a new instance will be invoked, which will also hang.

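
One way to reduce this risk is to test whether the ZEO server accepts connections before invoking ``bin/instance`` at all. The sketch below is an illustration, not part of ``ftw.linkchecker``; the function name ``zeo_reachable`` is hypothetical, and the host and port must be taken from your own buildout.

```python
import socket


def zeo_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port. It does not
    prove that the ZEO server behind it is healthy.
    """
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat the server as down.
        return False
    connection.close()
    return True
```

A cron wrapper could call ``zeo_reachable("127.0.0.1", 8100)`` (address and port are placeholders) and exit early when it returns ``False``. The server can still go away between the check and the actual run, so this narrows the window rather than closing it.
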
So every night another "hanging" process that's stuck in an infinite loop will be added. These can accumulate quickly and lead to server-wide resource issues. One might hit limits such as the maximum number of open file descriptors, processes per user, or open sockets, or run into memory exhaustion and high load. If a situation like this ever happens, it's basically a matter of time until the entire server goes down (unless someone recognizes the issue and fixes it).

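
The accumulation described above can be limited by letting at most one run exist at a time. A minimal sketch, assuming a POSIX system and a hypothetical lock file path chosen by the operator; ``acquire_lock`` is not part of ``ftw.linkchecker``:

```python
import fcntl


def acquire_lock(path):
    """Try to take an exclusive, non-blocking lock on the file at path.

    Returns the open file object on success (keep it open for as long as
    the job runs) or None if another process already holds the lock.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        # Another run still holds the lock, e.g. a hanging one.
        handle.close()
        return None
    return handle
```

A wrapper script would call ``acquire_lock("/path/to/linkchecker.lock")`` first and exit immediately on ``None``. The lock is released when the file is closed or the process ends, so a single hanging run blocks later ones instead of piling up next to them. It does not remove the hanging process itself.
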
So there is at least one caveat when configuring cronjobs to run scripts like this. It doesn't necessarily mean it shouldn't be done, but it comes with an operational risk that's somewhat tricky to manage.

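
One possible way to manage that risk is to give the cron invocation a hard time limit. The following sketch is an assumption about how an operator might wrap the command, not something ``ftw.linkchecker`` provides. It needs Python 3.3 or newer on a POSIX system, so it would run under the system Python rather than the Plone one, and ``run_with_timeout`` is a hypothetical name.

```python
import os
import signal
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command and return its exit code, or None if it was killed.

    The command is started in its own session so that, on timeout, the
    whole process group (including any workers it spawned) is killed.
    """
    process = subprocess.Popen(command, start_new_session=True)
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The new session makes the child a process group leader,
        # so its pid is also the id of the group to kill.
        os.killpg(process.pid, signal.SIGKILL)
        process.wait()  # reap the killed process so no zombie is left
        return None
```

For example, ``run_with_timeout(["bin/instance", "check_links", "/path/to/settings.json"], 6 * 3600)`` would end a run that is still going after six hours; the limit is a placeholder and should comfortably exceed the duration of a normal check.
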