ckan / ckan-docker

Scripts and images to run CKAN using Docker Compose

Datapusher / Xloader / ... #95

Open amercader opened 10 months ago

amercader commented 10 months ago

The current default compose setup runs DataPusher to get data automatically into the DataStore. We should probably work towards defaulting to Xloader, though, and making it easier to integrate other alternatives like Datapusher-Plus. There are different steps we can take towards that, which can be done separately:

1. Decoupling Datapusher

The way DataPusher is installed now is not very flexible: there are commands hardcoded in start_ckan.sh, and it is assumed that the datapusher plugin is among the enabled plugins. This means that users who don't want to use it need to override the whole start_ckan.sh file. The same applies if, for instance, you enable the expire_api_token plugin and need to add extra parameters to the user token add command used.

A good initial step would be to 1) remove datapusher from the default plugins and 2) consolidate all the setup commands in a docker-entrypoint.d/01_setup_datapusher.sh file. This could look like:

#!/bin/sh

if [ -n "$CKAN_DATAPUSHER_URL" ] ; then
   # Set up DataPusher
   # Set the API token
   # Add the plugin to the ini file
   :  # no-op placeholder so the skeleton is valid sh until the commands are added
fi

This way it's easy to turn it off completely, and if you need to tweak the setup commands you just need to override this file in your setup. I think this file should live in ckan-docker, not in ckan-docker-base.
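A fuller version of that file could look like the sketch below. It assumes CKAN_INI points at the CKAN config file, CKAN_SYSADMIN_NAME names an existing sysadmin user and CKAN__PLUGINS holds the plugin list from the .env file; add_plugin and setup_datapusher are hypothetical helper names for this example. ckan config-tool and ckan user token add are standard CKAN CLI commands, and the token extraction assumes the token is printed on the last line of the command output.

```shell
#!/bin/sh
# Sketch of docker-entrypoint.d/01_setup_datapusher.sh (not the file from the PRs).
# Assumes CKAN_INI, CKAN_SYSADMIN_NAME and CKAN__PLUGINS are set by the compose setup.

# Hypothetical helper: print the space-separated plugin list in $1 with $2
# appended, unless $2 is already in the list.
add_plugin() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }$2" ;;
    esac
}

setup_datapusher() {
    # Leaving CKAN_DATAPUSHER_URL unset turns DataPusher off completely
    if [ -z "$CKAN_DATAPUSHER_URL" ]; then
        echo "CKAN_DATAPUSHER_URL not set, skipping DataPusher setup"
        return 0
    fi
    # Tell CKAN where the DataPusher service lives
    ckan config-tool "$CKAN_INI" "ckan.datapusher.url=$CKAN_DATAPUSHER_URL"
    # Create an API token for DataPusher (assumed to be the last output line, tabs stripped)
    token=$(ckan -c "$CKAN_INI" user token add "$CKAN_SYSADMIN_NAME" datapusher | tail -n 1 | tr -d '\t')
    ckan config-tool "$CKAN_INI" "ckan.datapusher.api_token=$token"
    # Enable the plugin without listing it twice
    CKAN__PLUGINS=$(add_plugin "$CKAN__PLUGINS" datapusher)
    ckan config-tool "$CKAN_INI" "ckan.plugins=$CKAN__PLUGINS"
}

setup_datapusher
```

If you enable expire_api_token, the user token add line is the only one you'd need to change, and you'd do it by overriding this file rather than the whole start_ckan.sh.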

2. Using xloader

To use Xloader instead, we need to 1) install the extension, 2) run the worker process, and 3) configure it and enable the plugin. Looking at the wiki page, it seems that the suggestion is to install xloader and run the worker process in the same container as the CKAN web process? It's hard to tell because I couldn't find the source for the ckan/ckan-base-xloader image.
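For step 3, the configuration could follow the same pattern as the DataPusher entrypoint file. A rough sketch, assuming the extension is already installed and that CKAN_INI, CKAN_SYSADMIN_NAME, CKAN_SQLALCHEMY_URL and CKAN__PLUGINS come from the .env file; CKAN_USE_XLOADER and setup_xloader are made-up names for this example. The ckanext.xloader.jobs_db.uri and ckanext.xloader.api_token option names should be checked against the README of the xloader version being installed.

```shell
#!/bin/sh
# Sketch of a hypothetical docker-entrypoint.d/02_setup_xloader.sh.
# CKAN_USE_XLOADER is a made-up on/off switch; the other variables are
# assumed to be provided by the compose .env file.

setup_xloader() {
    if [ -z "$CKAN_USE_XLOADER" ]; then
        echo "CKAN_USE_XLOADER not set, skipping xloader setup"
        return 0
    fi
    # xloader tracks its jobs in a database; this sketch reuses CKAN's one
    ckan config-tool "$CKAN_INI" "ckanext.xloader.jobs_db.uri=$CKAN_SQLALCHEMY_URL"
    # API token xloader uses to call back into CKAN (assumed last output line)
    token=$(ckan -c "$CKAN_INI" user token add "$CKAN_SYSADMIN_NAME" xloader | tail -n 1 | tr -d '\t')
    ckan config-tool "$CKAN_INI" "ckanext.xloader.api_token=$token"
    # xloader loads data into the DataStore, so both plugins must be enabled
    for plugin in datastore xloader; do
        case " $CKAN__PLUGINS " in
            *" $plugin "*) ;;  # already enabled
            *) CKAN__PLUGINS="${CKAN__PLUGINS:+$CKAN__PLUGINS }$plugin" ;;
        esac
    done
    ckan config-tool "$CKAN_INI" "ckan.plugins=$CKAN__PLUGINS"
}

setup_xloader
```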

In any case, perhaps we can either run the worker in the same CKAN container (using supervisor to manage the processes) or add a separate service in the compose setup that runs the ckan worker process. That service could use exactly the same image with a different command (ckan jobs worker instead of uwsgi or ckan run), which could be handled in start_ckan.sh using a CKAN_WORKER env var or something similar.
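The CKAN_WORKER idea could be as small as a branch at the end of start_ckan.sh. A minimal sketch: CKAN_WORKER is the env var suggested above, UWSGI_OPTS is assumed to hold the uWSGI options the script already builds, and select_process and start_process are hypothetical helper names. ckan jobs worker is the standard CKAN CLI command for running a background job worker.

```shell
#!/bin/sh
# Sketch of a process switch for start_ckan.sh. CKAN_WORKER comes from the
# proposal; UWSGI_OPTS is assumed to hold the uWSGI options already built.

# Decide which role this container plays
select_process() {
    if [ -n "$CKAN_WORKER" ]; then
        echo worker
    else
        echo web
    fi
}

# Replace the shell with the chosen process so it receives container signals
start_process() {
    case "$(select_process)" in
        worker)
            exec ckan -c "$CKAN_INI" jobs worker ;;
        web)
            # UWSGI_OPTS is deliberately unquoted so it splits into arguments
            exec uwsgi $UWSGI_OPTS ;;
    esac
}
```

start_ckan.sh would call start_process as its last step, so the web service and a separate worker service (same image, CKAN_WORKER set) would differ only in their environment.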

Of course, ckanext-xloader needs to be installed. We could do it in the ckan-docker-base image, but perhaps it is better to add the commands to this repo, in ckan/Dockerfile.
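If the install goes in ckan/Dockerfile, a RUN step could execute commands like the ones below. This is a sketch based on the install steps in the ckanext-xloader README (the package from PyPI plus the requirements file in the repository), so check them against the version you pin. PIP is a hypothetical override, defaulting to pip, added only so the commands can be printed instead of run; install_xloader is likewise a made-up name.

```shell
#!/bin/sh
# Commands a RUN step in ckan/Dockerfile could execute to install ckanext-xloader.
# PIP is a hypothetical override so the commands can be echoed instead of run.
PIP=${PIP:-pip}
XLOADER_REQUIREMENTS=https://raw.githubusercontent.com/ckan/ckanext-xloader/master/requirements.txt

install_xloader() {
    # The extension itself, from PyPI (stop if this fails)
    $PIP install ckanext-xloader || return 1
    # Its pinned dependencies, from the repository
    $PIP install -r "$XLOADER_REQUIREMENTS"
}
```

Pinning a released version of both the package and the requirements file would make the image build reproducible, rather than tracking master.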

I think 1 is a good change to introduce, and I'm open to more suggestions on 2, but I'm keen to hear your thoughts @kowh-ai.

amercader commented 10 months ago

@kowh-ai I had a go at point 1 in these two PRs, let me know what you think:

https://github.com/ckan/ckan-docker-base/pull/32 https://github.com/ckan/ckan-docker/pull/97