geosolutions-it / ckanext-faoclh

CKAN extension for FAO CLH

ckanext-faoclh

This extension adds customizations for the FAO-CLH deploy.

Available plugins:

Requirements

CKAN 2.8.4+ (tested with CKAN 2.8.4)

Installation and Update

As user ckan, activate the virtualenv, then install the extension:

$ cd /usr/lib/ckan/src/
$ git clone https://github.com/geosolutions-it/ckanext-faoclh ## or, for a deployment on the FAO server: git clone https://tdipisa@bitbucket.org/cioapps/ckanext-faoclh.git
$ cd ckanext-faoclh/
$ pip install -e .

To update an already installed faoclh extension, as user ckan:

$ cd /usr/lib/ckan/src/ckanext-faoclh/
$ git pull
$ pip install -e . ## only if required, depending on the extent of the update

Keep the virtualenv active for any further installation steps of the other extensions used in the faoclh deployment.

Init DB

The following command is required to enable the upload of custom images for vocabulary items:

$ paster --plugin=ckanext-faoclh initdb --config=/etc/ckan/default/production.ini

Update the Solr schema

Update the schema.xml file (located at /usr/lib/ckan/src/ckan/ckan/config/solr/schema.xml) with the following xml tags:

Enable multilingual support

Enable multilingual support for datasets, organizations/groups, tags, and resources using the ckanext-multilang extension by following the setup steps described below:

[multilang] Clone and install the extension

[multilang] Configure multilingual support

Add the following multilingual settings to CKAN's configuration file production.ini (found at /etc/ckan/default/production.ini):

[multilang] Initialize the database

Make sure the virtual environment is active before running the command below. See previous steps on how to activate the virtual environment.

$ paster --plugin=ckanext-multilang multilangdb initdb --config=/etc/ckan/default/production.ini

[multilang] Update the Solr schema

Update the schema.xml file (located at /usr/lib/ckan/src/ckan/ckan/config/solr/schema.xml) with the following xml tags:

Enable filtering by "year of release"

To enable filtering of datasets by the custom resource field "year of release", follow the steps described below:

$ paster --plugin=ckan search-index rebuild --config=/etc/ckan/default/production.ini

Initialize database tables

To initialize database tables for the fao-clh extension, follow the steps below.

Activate the virtual environment:

$ . /usr/lib/ckan/default/bin/activate  

Create database tables by running the command below:

$ paster --plugin=ckanext-faoclh initdb --config=/etc/ckan/default/production.ini

Configuring CKAN for CSV export

CKAN allows you to create jobs that run in the ‘background’, i.e. asynchronously and without blocking the main application.

Background jobs can be essential to providing certain kinds of functionality, for example:

Basically, any piece of work that takes too long to perform while the main application is waiting is a good candidate for a background job. See the CKAN documentation on background jobs for more details.

To enable CKAN's background jobs in ckanext-faoclh, create a file named ckan-worker.ini in /etc/supervisord.d/ and copy in the configuration below.

# =======================================================
# Supervisor configuration for CKAN background job worker
# =======================================================

[program:ckan-worker]
# Use the full paths to the virtualenv and your configuration file here.
command=/usr/lib/ckan/default/bin/paster --plugin=ckan jobs worker --config=/etc/ckan/default/production.ini

user=ckan

# Start just a single worker. Increase this number if you have many or
# particularly long running background jobs.
numprocs=1
process_name=%(program_name)s-%(process_num)02d

# Log files.
stdout_logfile=/var/log/ckan/worker.log
stderr_logfile=/var/log/ckan/worker.err

# Make sure that the worker is started on system start and automatically
# restarted if it crashes unexpectedly.
autostart=true
autorestart=true

# Number of seconds the process has to run before it is considered to have
# started successfully.
startsecs=10

# Need to wait for currently executing tasks to finish at shutdown.
# Increase this if you have very long running tasks.
stopwaitsecs = 600

Create a directory to hold all the generated CSV datasets and grant user 'ckan' permissions to it. You may need root privileges to do that.
Let's say we want to use /var/lib/ckan/export:

mkdir /var/lib/ckan/export
chown ckan: /var/lib/ckan/export

Add the created directory to the CKAN configuration file (/etc/ckan/default/production.ini) using the faoclh.export_dataset_dir settings key, as shown below:

faoclh.export_dataset_dir = /var/lib/ckan/export 
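
Before restarting, it can help to confirm that the export directory really is usable by the worker. The following is a minimal sketch, not part of the extension: check_export_dir is a hypothetical helper name, and the script is assumed to be run as user ckan.

```python
import os


def check_export_dir(path):
    """Return a list of problems found with the export directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append("%s does not exist or is not a directory" % path)
    elif not os.access(path, os.W_OK | os.X_OK):
        # W_OK and X_OK together are needed to create files inside a directory.
        problems.append("%s is not writable by the current user" % path)
    return problems


if __name__ == "__main__":
    # /var/lib/ckan/export is the example path used above.
    for problem in check_export_dir("/var/lib/ckan/export"):
        print(problem)
```

No output means the directory exists and the current user can write to it.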

Once the file is created, restart CKAN using the command below:

systemctl restart supervisord

To run the asynchronous worker in a dev environment, use the command below:

paster --plugin=ckan jobs worker --config=/etc/ckan/default/production.ini
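
To illustrate what "asynchronously and without blocking the main application" means, here is a small standalone analogy. It does not use CKAN's job queue (which relies on a separate worker process such as the one started above); a thread pool from the Python standard library stands in for the worker, and export_csv is a made-up placeholder for a slow export.

```python
from concurrent.futures import ThreadPoolExecutor
import csv
import io


def export_csv(rows):
    """Stand-in for a slow export: serialise rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# One worker, mirroring numprocs=1 in the supervisor configuration.
executor = ThreadPoolExecutor(max_workers=1)

# Submitting returns immediately; the caller is not blocked by the export.
future = executor.submit(export_csv, [["id", "title"], ["1", "Example dataset"]])

# ... the main application would keep serving requests here ...

# Collect the result once it is actually needed.
csv_text = future.result()
executor.shutdown()
print(csv_text)
```

In a real deployment the job runs in the separate worker process rather than in a thread, so the web application never waits for the result at all.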

Enabling CKAN Tracking

To enable page view tracking, follow the steps below:

For operations based on the tracking data CKAN uses a summarised version of the data, not the raw tracking data that is recorded “live” as page views happen. The paster tracking update and paster search-index rebuild commands need to be run periodically to update this tracking summary data.

You can set up a cron job to run these commands. On most UNIX systems you can set up a cron job by running crontab -e in a shell to edit your crontab file, and adding a line to the file to specify the new job. For more information run man crontab in a shell. Below is a crontab line to update the tracking data and rebuild the search index. As root, add the following line to /etc/crontab:

0 * * * * ckan /usr/lib/ckan/default/bin/paster --plugin=ckan tracking update -c /etc/ckan/default/production.ini && /usr/lib/ckan/default/bin/paster --plugin=ckan search-index rebuild -r -c /etc/ckan/default/production.ini

From command line:

service crond reload

Retrieving Tracking Data

Run the command below to generate a csv file with tracking data:

paster --plugin=ckan tracking export "/path/to/csv/file/tracking.csv" "2020-01-01" --config=/etc/ckan/default/production.ini 

NOTE: Replace "2020-01-01" with the start date from which tracking data should be exported.
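
Once exported, the file can be inspected with a few lines of Python. This is only a sketch: load_tracking_rows is a hypothetical helper, and the sample data and column names below are invented for illustration, since the actual header row depends on what your CKAN version writes.

```python
import csv
import io


def load_tracking_rows(csv_file):
    """Read an exported tracking CSV into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


# Hypothetical sample standing in for the exported file; the real column
# names come from whatever header row the export command writes.
sample = io.StringIO(
    "dataset name,total views\n"
    "soil-map,12\n"
    "land-cover,7\n"
)
rows = load_tracking_rows(sample)
for row in rows:
    print(row)
```

With a real export, open the file with open(path, newline="") and pass the file object to load_tracking_rows instead of the sample.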

Tracking data access with Google Analytics

Send tracking data to Google Analytics using the ckanext-googleanalytics extension by following the steps below.

Note: Your password will probably be readable by other people, so you may want to set up a new Gmail account with 2FA enabled specifically for accessing your Gmail profile.

Enable dataset rating

Enable dataset rating using ckanext-rating by following the steps below.

    $ cd /usr/lib/ckan/src/
    $ git clone https://github.com/geosolutions-it/ckanext-rating.git
    $ cd ckanext-rating/
    $ pip install -e .

TIP: Enable or disable ratings for unauthenticated users by setting the rating.enabled_for_unauthenticated_users configuration key to true or false, for example:

rating.enabled_for_unauthenticated_users = true

Optionally, list dataset types for which the rating will be shown (defaults to ['dataset']) using the ckanext.rating.enabled_dataset_types settings key.

Enable comments

Enable user commenting functionality on datasets using ckanext-ytp-comments by following the steps below:

    $ cd /usr/lib/ckan/src/
    $ git clone https://github.com/geosolutions-it/ckanext-ytp-comments.git
    $ cd ckanext-ytp-comments/
    $ git checkout faoclh
    $ pip install -e .
    $ pip install -r requirements.txt

Resource Preview plugins

Datastore plugin

Some preview plugins require the data to be stored in the datastore plugin.

Create postgres user and DB:

sudo -u postgres createuser -S -D -R -P -l datastore_default    
sudo -u postgres createdb -O ckan_default datastore_default -E utf-8

Edit CKAN ini file:

Set the permissions on the database:

paster --plugin=ckan datastore set-permissions -c /etc/ckan/default/production.ini | sudo -u postgres psql --set ON_ERROR_STOP=1

Datapusher

The datapusher plugin parses data files and loads the parsed data into the datastore.

The datapusher is implemented as an external WSGI service, plus a plugin inside CKAN to interact with it.

Datapusher WSGI application

Datapusher plugin

Enable the datapusher plugin

ckan.plugins = [...] datastore [...] datapusher [...]      

Add the datapusher service URL in the CKAN ini file:

ckan.datapusher.url = http://0.0.0.0:8800/

Preview plugins

In the CKAN configuration ini file, make sure the following plugins are listed in the ckan.plugins line:

PDF view

PDF preview needs an external library.

Create views for datasets

Make sure that in the CKAN ini file the default_views property contains all the views we want to create previews for:

ckan.views.default_views = image_view text_view recline_view pdf_view
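
A default view only works if the corresponding view plugin is also enabled, so it can be worth cross-checking the two settings. The sketch below is not part of the extension: missing_view_plugins is a hypothetical helper, and it assumes both keys live in the [app:main] section of the ini file.

```python
import configparser


def missing_view_plugins(ini_text, section="app:main"):
    """Return default views that are not listed in ckan.plugins."""
    # interpolation=None keeps values such as %(here)s from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    plugins = parser.get(section, "ckan.plugins", fallback="").split()
    views = parser.get(section, "ckan.views.default_views", fallback="").split()
    return [view for view in views if view not in plugins]


# Hypothetical excerpt of an ini file; pdf_view is deliberately left out
# of ckan.plugins to show what the check reports.
example = """
[app:main]
ckan.plugins = datastore datapusher image_view text_view recline_view
ckan.views.default_views = image_view text_view recline_view pdf_view
"""
print(missing_view_plugins(example))
```

To check a real installation, pass the contents of /etc/ckan/default/production.ini to the function; an empty list means every default view has its plugin enabled.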

If you add plugin views to an already populated CKAN instance, you have to add the missing views to the datasets' resources:

Enabling Reporting

Enable reporting of broken links, tagless datasets, datasets without resources, and unpublished datasets.

NOTE: ckanext-faoclh depends on the ckanext-report CKAN extension and OWSLib for reporting.

Generating reports

Using the command line, you can issue this command to generate all reports:

paster --plugin=ckanext-report report generate -c /etc/ckan/default/production.ini

If you need a single report, use this line, replacing REPORT_NAME with the name of the report:

paster --plugin=ckanext-report report generate REPORT_NAME -c /etc/ckan/default/production.ini

NOTE: The command can take a while to produce results. The broken-links report in particular may take a significant amount of time because it checks each resource for availability.

Setting up a cron job to generate reports

In order to have reports regularly generated, you may want to run the previous command via cron.

Edit file /etc/crontab and add the line

0  *    * * *   ckan    /usr/lib/ckan/default/bin/paster --plugin=ckanext-report report generate -c /etc/ckan/default/production.ini

You may alter the job periodicity at will. Note that the schedule above (0 * * * *) runs at minute 0 of every hour; to generate reports once a day at midnight, use 0 0 * * * as the schedule instead.

Then have cron reload its configuration file:

service cron reload
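
The cron entries in this README use the five-field schedule syntax (minute, hour, day of month, month, day of week), which is easy to misread. The sketch below is a deliberately simplified checker, written for illustration only: it understands just "*" and plain integers, ignores ranges, lists and step values, and does not reproduce cron's special handling when both day fields are restricted. Pass it only the five schedule fields, without the user and command columns of /etc/crontab.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; only '*' and plain integers are handled here."""
    return field == "*" or int(field) == value


def fires_at(schedule, moment):
    """Return True if a five-field cron schedule would fire at `moment`."""
    minute, hour, day, month, weekday = schedule.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


afternoon = datetime(2020, 1, 1, 13, 0)
midnight = datetime(2020, 1, 1, 0, 0)

print(fires_at("0 * * * *", afternoon))  # hourly schedule
print(fires_at("0 0 * * *", afternoon))  # daily-at-midnight schedule
print(fires_at("0 0 * * *", midnight))
```

The first schedule fires at 13:00 because the hour field is a wildcard; the second fires only when both minute and hour are zero.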

Accessing reports

You can navigate to the /report route in the CKAN user interface to view the generated reports.

Loading initial data

These steps are needed to load the initial groups, organizations, datasets, and vocabularies.

This initial setup is only needed one time, when the app is deployed for the first time.

Load default groups

Enter the bin/ directory.

Run

./load_groups.sh SERVER_URL API_KEY

E.g.

./load_groups.sh http://10.10.100.136 b973eae2-33c2-4e06-a61f-4b1ed71d277c

In order to remove the groups:

./purge_groups.sh SERVER_URL API_KEY

Please note that group image names have changed over time, so if you already have your groups and the images are not properly loaded, please consider editing the group info and setting the filenames according to the actual files.
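
The load and purge scripts themselves are not reproduced in this README. As a rough illustration of what creating a single group over HTTP can look like, and assuming the request goes through CKAN's action API, the sketch below builds (without sending) a group_create request. The helper name build_group_create_request and the group name and title are made up for this example.

```python
import json
import urllib.request


def build_group_create_request(server_url, api_key, group):
    """Build (but do not send) a POST request for CKAN's group_create action."""
    url = server_url.rstrip("/") + "/api/3/action/group_create"
    body = json.dumps(group).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # CKAN reads the API key from the Authorization header.
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the group name and title are invented for illustration.
request = build_group_create_request(
    "http://10.10.100.136",
    "API_KEY",
    {"name": "example-group", "title": "Example group"},
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a live server.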

Load default organizations

Enter the bin/ directory.

Run

./load_orgs.sh SERVER_URL API_KEY

E.g.

./load_orgs.sh http://10.10.100.136 b973eae2-33c2-4e06-a61f-4b1ed71d277c   

Load vocabularies

The default vocabulary files are in init/vocab/.

Make sure the virtualenv is active, and then load the vocabularies (double check and fix the vocab paths):

The following lines describe the old file-based vocabulary handling. They are only valid if you didn't edit your vocab items in the CKAN GUI.

If you need to update the vocabulary, edit the file and run the vocab load command again; the command will add and remove the related tags as needed.

If you need to completely remove a vocabulary, you can run:

$ paster --plugin=ckanext-faoclh vocab delete -n VOCAB_NAME --config=/etc/ckan/default/production.ini

For instance:

$ paster --plugin=ckanext-faoclh vocab delete -n fao_resource_type --config=/etc/ckan/default/production.ini

Load datasets

Enter the bin/ directory.

Run

./load_datasets.sh SERVER_URL API_KEY

E.g.

./load_datasets.sh http://10.10.100.136 b973eae2-33c2-4e06-a61f-4b1ed71d277c

This step requires that groups and organizations have already been created.

Further setup

CKAN by default does not clean up the session cache files. Cache files are stored in a subdirectory of the /tmp directory. If your server is not rebooted every few days, the session files may fill up the inode space, and the system may become unstable.

Edit file /etc/crontab and add the line

0  *    * * *   ckan    find /tmp/faoclh/sessions/ -mmin +1440 -type f -print -exec rm {} \;

Then have cron reload its configuration file:

service cron reload
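
For reference, the cleanup performed by the find command can be sketched in Python as follows. This is an approximate equivalent, not something shipped with the extension: remove_old_files is a hypothetical helper, and find rounds file ages to whole minutes, so behaviour right at the 1440-minute boundary may differ slightly.

```python
import os
import time


def remove_old_files(root, max_age_minutes=1440):
    """Delete regular files under `root` last modified more than max_age_minutes ago."""
    cutoff = time.time() - max_age_minutes * 60
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file (-type f).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)  # mirrors find's -print
    return removed
```

Calling remove_old_files("/tmp/faoclh/sessions/") as user ckan would delete session files older than one day and return their paths.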