GNU General Public License v2.0
This repository contains a library for Extract, Transform and Load processes for

You can report issues with current transformations, or suggest sources that should be added to this library, using the GitHub issue tracker.


Each process, located in the process folder, consists of a collection of files that either (a) document a manual transformation of the data; or (b) perform an automated transformation.

Folders may contain:

The output of each process should be written to the root data/ folder, from where it can be loaded onto the platform.

Running ETL Dashboard (without docker)

The project acts as an extension module for cove. First clone and test that project before using this one.

Then, assuming you have a common folder for the two git clones (i.e. this repo can be found at ../resource-projects-etl relative to cove), perform these steps from within the cove folder:

cp ../resource-projects-etl/requirements_taglifter.txt ./
pip install -r requirements_taglifter.txt 
cp ../resource-projects-etl/requirements.txt ./
pip install -r requirements.txt 
cp -R ../resource-projects-etl/modules ./
cp -R ../resource-projects-etl/ ./
python install
mkdir db
export DB_NAME=./db.sqlite
python migrate --noinput
python compilemessages
python collectstatic --noinput
cp -R ../resource-projects-etl/ontology ./ontology
gunicorn cove.wsgi -b --timeout 600 -w 3 -k eventlet
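
Before running the steps above, it can help to confirm that the two clones really sit side by side. The following is a minimal sketch, not part of this repository: `check_etl_layout` is a hypothetical helper name, and the list of paths it checks is simply taken from the `cp` commands above.

```shell
# check_etl_layout [COVE_DIR]
# Hypothetical helper: succeeds only if ../resource-projects-etl (relative to
# the cove folder) contains the files and folders copied by the steps above.
check_etl_layout() (
    cove_dir=${1:-.}
    etl_dir="$cove_dir/../resource-projects-etl"
    status=0
    for path in requirements_taglifter.txt requirements.txt modules ontology; do
        if [ ! -e "$etl_dir/$path" ]; then
            echo "missing: $etl_dir/$path" >&2
            status=1
        fi
    done
    exit "$status"
)
```

From within the cove folder, `check_etl_layout . && echo "layout looks fine"` reports any missing path on standard error before you start copying.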

Running ETL Dashboard with docker

Running from docker hub

You will need a Virtuoso container running under the name virtuoso, because the command below links to it by that name.

docker rm -f rp-etl
docker run --name rp-etl --link virtuoso:virtuoso -p -e DBA_PASS=dba opendataservices/resource-projects-etl

Update DBA_PASS as appropriate.

Then visit http://localhost:8000/
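
The container may take a little while to start answering requests. As a rough sketch (the helper name `wait_for_url`, the retry count and the delay are all assumptions, and `curl` is assumed to be installed), you could poll the URL until it responds:

```shell
# wait_for_url URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper: polls URL with curl until it answers successfully,
# giving up after TRIES attempts.
wait_for_url() (
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP error codes; -s keeps it quiet
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up waiting for $url" >&2
    exit 1
)
```

For example, `wait_for_url http://localhost:8000/ && echo "dashboard is up"`.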

Full deployment (with data staging and live frontends)

OpenDataServices dev deploy can be found at (this is a SaltStack state file).

For a live deploy that runs docker directly (you probably don't want to do this, but the commands below should translate to your preferred deployment approach), you could do:

# Create the volume containers
docker create --name virtuoso-data -v /usr/local/var/lib/virtuoso/db opendataservices/virtuoso:live
docker create --name etl-data -v /usr/src/resource-projects-etl/db -v /usr/src/resource-projects-etl/src/cove/media opendataservices/resource-projects-etl:live

# Run the containers
# Virtuoso
docker run -p --volumes-from virtuoso-data --name virtuoso opendataservices/virtuoso:live
# ETL
docker run -p --link virtuoso:virtuoso -e "DBA_PASS=dba" -e FRONTEND_LIVE_URL= -e FRONTEND_STAGING_URL= --volumes-from etl-data opendataservices/resource-projects-etl:live
# Frontend (Live)
docker run  -p --link virtuoso:virtuoso-live -e BASE_URL=  -e SPARQL_ENDPOINT=http://virtuoso-live:8890/sparql -e DEFAULT_GRAPH_URI= opendataservices/
# Frontend (Staging)
docker run -p --link virtuoso:virtuoso-staging -e BASE_URL=  -e SPARQL_ENDPOINT=http://virtuoso-staging:8890/sparql -e DEFAULT_GRAPH_URI= opendataservices/

# Perform initial virtuoso setup
# (this needs running from the directory containing `virtuoso_setup.sql`)
cat virtuoso_setup.sql |  docker run --link virtuoso:virtuoso -i --rm opendataservices/virtuoso:live isql virtuoso

If BASE_URL does not match the URL the sites are exposed at, site navigation won't work correctly. Similarly, for the ETL container, FRONTEND_LIVE_URL and FRONTEND_STAGING_URL should be the relevant deployed URLs.

On the other hand, SPARQL_ENDPOINT, DEFAULT_GRAPH_URI and the contents of virtuoso_setup.sql should be left exactly as they are here. (SPARQL_ENDPOINT refers to URLs that are wired up inside the docker container by --link, whereas DEFAULT_GRAPH_URI and the contents of virtuoso_setup.sql are Virtuoso's internal URIs and don't relate to the URL the site is actually accessible at.)
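
To illustrate why SPARQL_ENDPOINT is tied to --link: the host name in the endpoint has to be the alias given in the --link value (the part after the colon). The sketch below only does string checks and never talks to docker; `endpoint_host` and `link_matches_endpoint` are hypothetical helper names.

```shell
# endpoint_host URL
# Prints the host part of an http(s) URL.
endpoint_host() (
    rest=${1#*://}                 # drop the scheme
    rest=${rest%%/*}               # drop the path
    printf '%s\n' "${rest%%:*}"    # drop the port
)

# link_matches_endpoint LINK URL
# LINK is a --link value of the form name:alias. Succeeds if the alias is
# the host that URL points at.
link_matches_endpoint() (
    link_alias=${1#*:}
    [ "$link_alias" = "$(endpoint_host "$2")" ]
)
```

With the values used above, `link_matches_endpoint virtuoso:virtuoso-live http://virtuoso-live:8890/sparql` succeeds, while pairing the same endpoint with `virtuoso:virtuoso-staging` fails.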

The above commands expose ports 8890, 8801, 8080 and 8081 on localhost. Edit these to match what you want, or place a reverse proxy in front of them.

You should update the Virtuoso admin password: first through the Virtuoso HTTP user interface, and then in the DBA_PASS environment variable passed to the ETL container.

To get more recent builds than live, replace :live with :master in the above.

Performing database migrations

Run this against the ETL container (you will need to replace etl with the name of your container):

docker exec etl migrate 
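
If you are unsure what your container is called, one possible way to find out is to filter the list of running containers by image. This is only a sketch: `etl_containers` is a hypothetical helper, and it assumes the container was started from an `opendataservices/resource-projects-etl` image as in the commands above.

```shell
# etl_containers
# Hypothetical helper: reads "NAME IMAGE" lines on standard input and prints
# the names of containers whose image is opendataservices/resource-projects-etl
# (with or without a tag).
etl_containers() {
    awk '$2 ~ /^opendataservices\/resource-projects-etl(:|$)/ { print $1 }'
}
```

It is meant to be fed from docker, for example `docker ps --format '{{.Names}} {{.Image}}' | etl_containers`.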

Backup data

docker run --volumes-from etl-data -v $(pwd):/backup opendataservices/virtuoso:master tar cvzf /backup/etl-data.tar.gz /usr/src/resource-projects-etl/db /usr/src/resource-projects-etl/src/cove/media


Restore data

docker run -it --volumes-from etl-data -v $(pwd):/backup opendataservices/virtuoso:master tar xvzf /backup/etl-data.tar.gz -C /
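
Before relying on a backup, you may want to confirm that the archive actually contains the expected directories. The sketch below assumes that tar dropped the leading "/" from member names when the archive was created (GNU tar does this by default, which is also why the extract command uses `-C /`); `archive_has_path` is a hypothetical helper name.

```shell
# archive_has_path ARCHIVE PATH
# Hypothetical helper: succeeds if some member of the gzipped tar ARCHIVE
# starts with PATH. A leading "/" on PATH is ignored, to match how tar
# stores member names.
archive_has_path() (
    archive=$1
    want=${2#/}
    tar tzf "$archive" |
        awk -v p="$want" 'index($0, p) == 1 { found = 1 } END { exit !found }'
)
```

For example, `archive_has_path etl-data.tar.gz /usr/src/resource-projects-etl/db` checks for the database directory named in the backup command.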

Building docker image

docker build -t opendataservices/resource-projects-etl .

Then run as described above. (You may want to use a different name for your own image, so that it is not confused with the images published on Docker Hub.)

Running taglifter locally



virtualenv .ve --python=/usr/bin/python3
source .ve/bin/activate
pip install -r requirements.txt

You will then have some data as Turtle in the data/ directory.
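
A quick way to see what was produced is to list the Turtle files. This sketch assumes the output files use the conventional `.ttl` extension, which is not stated above; `list_turtle` is a hypothetical helper name.

```shell
# list_turtle [DIR]
# Hypothetical helper: prints every *.ttl file under DIR (default: data),
# one per line, in sorted order.
list_turtle() (
    dir=${1:-data}
    find "$dir" -type f -name '*.ttl' | sort
)
```

Running `list_turtle data` from the repository root prints one path per line, or nothing if no Turtle files were written.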


