This Python script scrapes all the license files and automates the detection of broken links, timeout errors, and other link issues.
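To make the idea concrete, here is a minimal sketch of this kind of check. It is not the code used by link_checker: the names LinkCollector, extract_links, fetch_status, and check_links are hypothetical, and it uses only the standard library. It collects the links on a page, requests each one, and labels the result as ok, broken, timeout, or error.

```python
import socket
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(base_url, href) for href in collector.links]
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url (performs a real request)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions that carry the code.
        return error.code


def check_links(html, base_url, fetch=fetch_status):
    """Map each link to 'ok', 'broken', 'timeout', or 'error'."""
    results = {}
    for url in extract_links(html, base_url):
        try:
            status = fetch(url)
            results[url] = "ok" if status < 400 else "broken"
        except urllib.error.URLError as error:
            # urllib may wrap a connection timeout inside URLError.
            timed_out = isinstance(error.reason, (socket.timeout, TimeoutError))
            results[url] = "timeout" if timed_out else "error"
        except (socket.timeout, TimeoutError):
            results[url] = "timeout"
        except OSError:
            results[url] = "error"
    return results
```

Passing the fetch function as a parameter keeps the classification logic testable without network access, since a stub can stand in for real requests.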
There are two suggested ways to install. Use User if you are interested in just running the script. Use Development if you are interested in developing the script.
git clone https://github.com/creativecommons/cc-link-checker.git
pipenv install
We recommend using pipenv to create a virtual environment and install dependencies.
git clone https://github.com/creativecommons/cc-link-checker.git
pipenv install --dev
Alternatively, use sync to install the last successful environment. For example:
pipenv sync --dev
pipenv run link_checker
pipenv run link_checker -h
usage: link_checker [-h] {deeds,legalcode,rdf,index,combined,canonical} ...
Check for broken links in Creative Commons license deeds, legalcode, and rdf
optional arguments:
-h, --help show this help message and exit
subcommands (a single subcomamnd is required):
{deeds,legalcode,rdf,index,combined,canonical}
deeds check the links for each license's deed
legalcode check the links for each license's legalcode
rdf check the links for each license's RDF
index check the links within index.rdf
combined Combined check (deeds, legalcode, rdf, and index)
canonical print canonical license URLs
Also see the help output of each subcommand:
pipenv run link_checker deeds -h
usage: link_checker deeds [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker legalcode -h
usage: link_checker legalcode [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker rdf -h
usage: link_checker rdf [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--local-index] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker index -h
usage: link_checker index [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local-index] [--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
pipenv run link_checker combined -h
usage: link_checker combined [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--local-index]
[--output-errors [output_file]]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer (default: 10)
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine
valid license paths (uses LICENSE_LOCAL_PATH environment
variable and falls back to default:
'../creativecommons.org/docroot/legalcode')
--local-index process local filesystem index.rdf (uses
INDEX_RDF_LOCAL_PATH environment variable and falls back
to default: './index.rdf')
--output-errors [output_file]
output all link errors to file (default: errorlog.txt) and
create junit-xml type summary (test-summary/junit-xml-
report.xml)
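The --local and --local-index help text describes an environment variable with a fallback default. A minimal sketch of that lookup is below, assuming a plain dictionary-style read of the environment. The function name resolve_local_paths is hypothetical and the script's actual handling may differ; only the variable names and default paths come from the help output.

```python
import os

# Defaults copied from the help output above.
DEFAULT_LICENSE_PATH = "../creativecommons.org/docroot/legalcode"
DEFAULT_INDEX_RDF_PATH = "./index.rdf"


def resolve_local_paths(environ=None):
    """Return (legalcode_dir, index_rdf_file), preferring env overrides.

    Hypothetical helper: environ defaults to os.environ and can be
    replaced by a plain dict for testing.
    """
    if environ is None:
        environ = os.environ
    license_path = environ.get("LICENSE_LOCAL_PATH", DEFAULT_LICENSE_PATH)
    index_path = environ.get("INDEX_RDF_LOCAL_PATH", DEFAULT_INDEX_RDF_PATH)
    return license_path, index_path
```

With neither variable set, the function returns the two defaults. Setting LICENSE_LOCAL_PATH before running the script would change only the first value.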
pipenv run link_checker canonical -h
usage: link_checker canonical [-h] [-q] [--root-url ROOT_URL] [--limit LIMIT] [-v]
[--local] [--include-gnu]
optional arguments:
-h, --help show this help message and exit
-q, --quiet decrease verbosity (can be specified multiple times)
--root-url ROOT_URL set root URL (default: 'https://creativecommons.org')
--limit LIMIT Limit check lists to specified integer
-v, --verbose increase verbosity (can be specified multiple times)
--local process local filesystem legalcode files to determine valid
license paths (uses LICENSE_LOCAL_PATH environment variable
and falls back to default:
'../creativecommons.org/docroot/legalcode')
--include-gnu include GNU licenses in addition to Creative Commons
licenses
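The option conventions shown in the help output (a required subcommand, repeatable -q and -v, and --output-errors with an optional file name) map onto standard argparse features. The sketch below illustrates those features under stated assumptions. It is not the script's actual parser: build_parser is a hypothetical name, and only two subcommands are registered to keep it short.

```python
import argparse


def build_parser():
    """Build a small parser that mimics the conventions in the help output."""
    parser = argparse.ArgumentParser(prog="link_checker")
    # required=True makes argparse reject a call with no subcommand.
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    # Options shared by the subcommands in this sketch.
    shared = argparse.ArgumentParser(add_help=False)
    # action="count" lets -q and -v be given several times (-vv gives 2).
    shared.add_argument("-q", "--quiet", action="count", default=0)
    shared.add_argument("-v", "--verbose", action="count", default=0)
    shared.add_argument("--limit", type=int, default=10)
    # nargs="?" with const: the flag alone selects errorlog.txt, the flag
    # plus a value selects that file, and omitting the flag leaves None.
    shared.add_argument(
        "--output-errors",
        nargs="?",
        const="errorlog.txt",
        default=None,
        metavar="output_file",
    )

    for name in ("deeds", "legalcode"):
        subparsers.add_parser(name, parents=[shared])
    return parser
```

For example, parsing the arguments deeds -vv --output-errors with this sketch yields a verbose count of 2 and an output file of errorlog.txt, while deeds --output-errors custom.txt selects custom.txt.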
Because the script can scrape licenses from local storage, it can be used in CI in two easy steps:
Clone this repo in the CI container
git clone https://github.com/creativecommons/cc-link-checker.git ~/cc-link-checker
Run link_checker.py in local (--local) and output error (--output-errors) mode
python link_checker.py --local --output-errors
The configuration for GitHub Actions, for example, is present here.
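A CI job may also want to fail when the generated report lists problems. The sketch below shows one way to do that, assuming the report at test-summary/junit-xml-report.xml follows the common JUnit XML layout of testcase elements with failure or error children. The function names count_problems and exit_code are hypothetical, and the real report layout should be checked before relying on this.

```python
import xml.etree.ElementTree as ET


def count_problems(junit_xml):
    """Return (failures, errors) counted from <testcase> children.

    junit_xml may be a str or bytes holding the report contents.
    """
    root = ET.fromstring(junit_xml)
    failures = 0
    errors = 0
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for testcase in root.iter("testcase"):
        failures += len(testcase.findall("failure"))
        errors += len(testcase.findall("error"))
    return failures, errors


def exit_code(junit_xml):
    """Return 1 if the report contains any failure or error, else 0."""
    failures, errors = count_problems(junit_xml)
    return 1 if failures or errors else 0
```

In a CI step, the report file could be read as bytes and the result passed to sys.exit so that the job fails when any link problem was recorded.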
Unit tests have been written using the pytest framework. The tests can be run using:
pipenv install --dev --python /usr/local/opt/python@3.7/libexec/bin/python
pipenv install --dev
pipenv run pytest -v
UnicodeEncodeError: This error is thrown when the console does not support UTF-8.
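One possible workaround is to switch the output stream to UTF-8 before printing. The sketch below assumes Python 3.7 or later, where text streams provide a reconfigure method. The helper name ensure_utf8 is hypothetical and is not part of link_checker.

```python
def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it uses another encoding."""
    encoding = (getattr(stream, "encoding", None) or "").lower()
    normalized = encoding.replace("-", "").replace("_", "")
    # Streams without reconfigure() are left untouched.
    if normalized != "utf8" and hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream
```

Calling ensure_utf8(sys.stdout) early in a script would apply this to console output. Setting the PYTHONIOENCODING environment variable to utf-8 before running Python has a similar effect without any code change.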
Failing Lint build:
Ensure style/syntax is correct:
pipenv run black .
pipenv run isort .
pipenv run flake8 .
The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.
We welcome contributions for bug fixes, enhancements, and documentation. Please see CONTRIBUTING.md when contributing.