cioos-siooc / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
http://ckan.org/

fix crontab in ckan_run_harvester #229

Closed fostermh closed 1 month ago

fostermh commented 1 month ago

Fixes https://github.com/cioos-siooc/ckan/issues/227. Updates the config so command line tools can load extensions, and fixes crontab loading in ckan_run_harvester.

github-actions[bot] commented 1 month ago

Image has been pushed to cioos/ckan

Testing Quick Start

Pull image

sudo docker pull cioos/ckan:DEV_PR229
or
sudo CKAN_TAG=DEV_PR229 docker-compose pull ckan

Remove Home Volume and Restart

sudo docker-compose down
sudo docker volume rm docker_ckan_home
sudo CKAN_TAG=DEV_PR229 docker-compose up -d

for full documentation see TBD

sjbruce commented 1 month ago

For reasons unknown, this is what's happening now.

My guess is that we may need to downgrade the cryptography library to 38.0.4. From looking around and testing, that's the latest version that supports that import call.

Similar error to this one: https://github.com/apache/superset/discussions/22613
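
A quick way to test this guess inside the container is to ask Python which cryptography version is installed and whether the module named in the traceback can be located. This is a minimal sketch, not part of the PR; the helper names `module_available` and `installed_version` are made up for illustration.

```python
import importlib.util
from importlib import metadata


def module_available(dotted_name):
    """Return True if the module can be located.

    find_spec imports the parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing or is not a package.
        return False


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Module path copied from the ModuleNotFoundError in the log.
    target = "cryptography.hazmat.backends.openssl.x509"
    print("cryptography version:", installed_version("cryptography"))
    print(target, "available:", module_available(target))
```

Running it before and after pinning the library would show whether the pin actually restores the import that urllib3.contrib.pyopenssl needs.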

2024-07-12 11:07:46 Postgres is up - executing command
2024-07-12 11:07:47 [prerun] Initializing or upgrading db - start
2024-07-12 11:07:47 Traceback (most recent call last):
2024-07-12 11:07:47   File "/srv/app/prerun.py", line 102, in init_db
2024-07-12 11:07:47     subprocess.check_output(db_command, stderr=subprocess.STDOUT)
2024-07-12 11:07:47   File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
2024-07-12 11:07:47     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2024-07-12 11:07:47   File "/usr/lib/python3.9/subprocess.py", line 528, in run
2024-07-12 11:07:47     raise CalledProcessError(retcode, process.args,
2024-07-12 11:07:47 subprocess.CalledProcessError: Command '['ckan', '-c', '/srv/app/ckan.ini', 'db', 'upgrade']' returned non-zero exit status 1.
2024-07-12 11:07:47 
2024-07-12 11:07:47 During handling of the above exception, another exception occurred:
2024-07-12 11:07:47 
2024-07-12 11:07:47 Traceback (most recent call last):
2024-07-12 11:07:47   File "/srv/app/prerun.py", line 215, in <module>
2024-07-12 11:07:47     init_db()
2024-07-12 11:07:47   File "/srv/app/prerun.py", line 105, in init_db
2024-07-12 11:07:47     if "OperationalError" in e.output:
2024-07-12 11:07:47 TypeError: a bytes-like object is required, not 'str'
2024-07-12 11:07:47 /srv/app/start_ckan.sh: Running init file /docker-entrypoint.d/ckan-entrypoint.sh
2024-07-12 11:07:47 db:5432 - accepting connections
2024-07-12 11:07:48 Traceback (most recent call last):
2024-07-12 11:07:48   File "/usr/bin/ckan", line 8, in <module>
2024-07-12 11:07:48     sys.exit(ckan())
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 829, in __call__
2024-07-12 11:07:48     return self.main(*args, **kwargs)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 781, in main
2024-07-12 11:07:48     with self.make_context(prog_name, args, **extra) as ctx:
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 700, in make_context
2024-07-12 11:07:48     self.parse_args(ctx, args)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/cli/cli.py", line 116, in parse_args
2024-07-12 11:07:48     result = super(ExtendableGroup, self).parse_args(ctx, args)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 1212, in parse_args
2024-07-12 11:07:48     rest = Command.parse_args(self, ctx, args)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 1048, in parse_args
2024-07-12 11:07:48     value, args = param.handle_parse_result(ctx, opts, args)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 1630, in handle_parse_result
2024-07-12 11:07:48     value = invoke_param_callback(self.callback, ctx, self, value)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/click/core.py", line 123, in invoke_param_callback
2024-07-12 11:07:48     return callback(ctx, param, value)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/cli/cli.py", line 126, in _init_ckan_config
2024-07-12 11:07:48     _add_ctx_object(ctx, value)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/cli/cli.py", line 135, in _add_ctx_object
2024-07-12 11:07:48     ctx.obj = CtxObject(path)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/cli/cli.py", line 57, in __init__
2024-07-12 11:07:48     self.app = make_app(self.config)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/config/middleware/__init__.py", line 56, in make_app
2024-07-12 11:07:48     load_environment(conf)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/config/environment.py", line 123, in load_environment
2024-07-12 11:07:48     p.load_all()
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/plugins/core.py", line 165, in load_all
2024-07-12 11:07:48     load(*plugins)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/plugins/core.py", line 179, in load
2024-07-12 11:07:48     service = _get_service(plugin)
2024-07-12 11:07:48   File "/srv/app/src/ckan/ckan/plugins/core.py", line 281, in _get_service
2024-07-12 11:07:48     return plugin.load()(name=plugin_name)
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 2443, in load
2024-07-12 11:07:48     return self.resolve()
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/pkg_resources/__init__.py", line 2449, in resolve
2024-07-12 11:07:48     module = __import__(self.module_name, fromlist=['__name__'], level=0)
2024-07-12 11:07:48   File "/srv/app/src/ckanext-harvest/ckanext/harvest/harvesters/__init__.py", line 1, in <module>
2024-07-12 11:07:48     from ckanext.harvest.harvesters.ckanharvester import CKANHarvester
2024-07-12 11:07:48   File "/srv/app/src/ckanext-harvest/ckanext/harvest/harvesters/ckanharvester.py", line 7, in <module>
2024-07-12 11:07:48     from urllib3.contrib import pyopenssl
2024-07-12 11:07:48   File "/usr/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py", line 53, in <module>
2024-07-12 11:07:48     from cryptography.hazmat.backends.openssl.x509 import _Certificate
2024-07-12 11:07:48 ModuleNotFoundError: No module named 'cryptography.hazmat.backends.openssl.x509'
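
Separately from the import failure, the first traceback shows a second problem: subprocess.check_output returns bytes by default, so the `"OperationalError" in e.output` check in prerun.py raises a TypeError and hides the real error output. A minimal sketch of one way to handle that, assuming the output is UTF-8 text; the function names here are hypothetical and this is not the actual prerun.py code:

```python
import subprocess


def as_text(output):
    """Normalise subprocess output (bytes, str or None) to str."""
    if output is None:
        return ""
    if isinstance(output, bytes):
        # Assumption: the command emits UTF-8; undecodable bytes are replaced.
        return output.decode("utf-8", errors="replace")
    return output


def classify_failure(output):
    """Label a failed command by whether its output mentions OperationalError."""
    text = as_text(output)
    return "operational_error" if "OperationalError" in text else "other_error"


def run_db_command(db_command):
    """Run the command; return 'ok' or a failure label instead of raising."""
    try:
        subprocess.check_output(db_command, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        return classify_failure(e.output)
    return "ok"
```

With something like this in place, the ModuleNotFoundError from the `ckan db upgrade` call would be classified as a plain failure rather than masked by the TypeError.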

fostermh commented 1 month ago

Hmm, I thought we fixed this one. I will check my package versions.

fostermh commented 1 month ago

It seems like the command line tools need the .plugins list in the ckan.ini, however. Perhaps we need to write to the ckan.ini on container start or something.

fostermh commented 1 month ago

I believe that with the addition of the ckan_home volume to ckan_run_harvester, this issue is now fixed. Please confirm and merge if it is working.

sjbruce commented 1 month ago

The harvester now runs well without any external intervention, but the entrypoint file still doesn't seem to be able to set up the cron jobs on its own.

However, I have found that if I manually execute the command to set up the crontab from within the container via an interactive shell, the cron jobs populate and then execute properly. The container loses the cron jobs if it is rebuilt, though...

A thought occurs - could we just mount the crontabs file as a volume like what we're doing with the entrypoint files? I'll test that out on my local to see how it performs.

sjbruce commented 1 month ago

Mounting the crontab file as a volume in the ckan_run_harvester container appears to work; however, it needs to be owned by root:root in order to execute (thankfully the user ID for root is universal).
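
For anyone testing the mounted-file approach, the ownership can be checked on the host before starting the container. A small sketch, assuming a POSIX host; the helper name is made up, and the file paths are whatever you pass on the command line:

```python
import os
import sys


def is_owned_by_root(path):
    """Return True if both the owner and the group of `path` have id 0 (root:root)."""
    st = os.stat(path)
    return st.st_uid == 0 and st.st_gid == 0


if __name__ == "__main__":
    # Usage: python check_owner.py <path-to-crontab-file> [...]
    for path in sys.argv[1:]:
        status = "root:root" if is_owned_by_root(path) else "NOT root:root"
        print(f"{path}: {status}")
```

Because a bind mount keeps the host's numeric uid and gid, a file that passes this check on the host should also appear as root-owned inside the container.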

fostermh commented 1 month ago

I pushed a fix for the crontab that does not require mounting. The entrypoint file was not being run because files under docker-entrypoint.d are only run on the first container start and not again until the attached volumes are cleared. This is great for the other containers but not this one.