Closed sjbruce closed 1 month ago
I should note that the harvester configuration above was lifted directly from a 1.5.0 deployment of CKAN.
Is the ckan_run_harvester container running? Are the cron jobs in this container executing?
You can run the harvester cleanup manually by executing ckan --config=/srv/app/ckan.ini harvester run, or by clicking 'stop' in the GUI.
See /contrib/docker/crontab for a list of cron jobs that are run in the ckan_run_harvester container.
It could be related to container permissions. The ckan_run_harvester container must be run as root.
The ckan_run_harvester container is running, but there don't appear to be any cron jobs running or indeed scheduled.
The Dockerfile does have a line to copy the crontab file to the container, and it is in /srv/app/src/ckan/contrib/docker, but if I look at /etc/crontabs/root it simply lists the instructions to run cron jobs in the /etc/periodic/ sub-directories, all of which are empty.
It doesn't look like the cron jobs are installed.
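One way to confirm this diagnosis is to capture the installed crontab (for example with docker exec ckan_run_harvester crontab -l) and check whether it holds anything beyond the run-parts stubs. The following is a minimal sketch, not part of CKAN; the function names, the sample text, and the every-15-minutes schedule in the usage note are illustrative assumptions.

```python
import re

# Illustrative sample modelled on the description above: a root crontab
# that only points at the (empty) /etc/periodic/ sub-directories.
SAMPLE_DEFAULT = """\
# min hour day month weekday command
*/15 * * * * run-parts /etc/periodic/15min
0 * * * * run-parts /etc/periodic/hourly
0 2 * * * run-parts /etc/periodic/daily
"""


def scheduled_commands(crontab_text):
    """Return the command part of every five-field job line in a crontab."""
    commands = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if re.match(r"^[A-Za-z_][A-Za-z0-9_]*\s*=", line):
            continue  # environment assignment such as PATH=/usr/bin
        fields = line.split(None, 5)
        if len(fields) == 6:  # five schedule fields plus the command
            commands.append(fields[5])
        # Shorthand entries such as "@daily cmd" are ignored by this sketch.
    return commands


def only_periodic_stubs(crontab_text):
    """True when every scheduled job is just a run-parts /etc/periodic stub."""
    commands = scheduled_commands(crontab_text)
    return bool(commands) and all(
        c.startswith("run-parts /etc/periodic/") for c in commands
    )
```

If only_periodic_stubs returns True for the captured text, the harvester crontab was never installed. Appending a line such as */15 * * * * ckan --config=/srv/app/ckan.ini harvester run (the schedule here is made up) makes it return False.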
Running the command above produces a complaint about "SECRET_KEY", which likely puts part or all of this down to not running the ckan generate config command and grabbing the appropriate key values, or not executing the commented-out commands at the top of the .env file.
I note that those commands will fail on Windows/WSL due to some low-level nonsense on that part. I'll work around it and rebuild the containers to see if that makes a difference.
I imagine it'll let the command above run, but I don't expect it to change anything with the cron jobs themselves.
There are a couple of issues here.
line 20 in ckan-run-harvester-entrypoint.sh should be cat /srv/app/src/ckan/contrib/docker/crontab | crontab -
While CKAN can read its config from environment variables, the command line tools do not, so in order for all the cron job tasks to work we need to update ckan.ini.
Uncomment the following lines in the ckan.ini in your container:
ckan.plugins = envvars
    stats
    text_view
    image_view
    recline_view
    datastore
    datapusher
    scheming_datasets
    scheming_organizations
    scheming_groups
    scheming_nerf_index
    fluent
    harvest
    ckan_harvester
    csw_harvester
    waf_harvester
    doc_harvester
    ckan_schema_harvester
    spatial_metadata
    spatial_query
    spatial_harvest_metadata_api
    cioos_harvest
    cioos_theme
    ckan_cioos_harvester
    dcat
    structured_data
    resource_proxy
    geo_view
    geojson_view
    wmts_view
    ckan_spatial_harvester
    datastream_harvester
#geonetwork_harvester
# module-path:file to schemas being used
scheming.dataset_schemas = ckanext.scheming:cioos_siooc_schema.json
scheming.presets = ckanext.scheming:presets.json
    ckanext.fluent:presets.json
scheming.dataset_fallback = true
scheming.organization_schemas = ckanext.scheming:organization.json
scheming.group_schemas = ckanext.scheming:group.json
It is odd that the fetch and gather containers work while the run container does not... This config settings issue would also account for odd indexing problems.
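To check whether a given ckan.ini actually has these settings uncommented, something like the following could be run against the file contents. It is only a sketch using Python's standard configparser; the [app:main] section name, the helper names, and the shortened plugin list are assumptions for illustration, and CKAN's own config loader may treat edge cases differently.

```python
import configparser


def enabled_plugins(ini_text, section="app:main"):
    """Return the plugins listed under ckan.plugins, or [] if commented out."""
    # interpolation=None so values containing % are not expanded
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    value = parser.get(section, "ckan.plugins", fallback="")
    # Indented continuation lines are joined with newlines, so split() works
    return value.split()


def missing_plugins(ini_text, required):
    """Return the required plugins that are not enabled, in the given order."""
    enabled = set(enabled_plugins(ini_text))
    return [name for name in required if name not in enabled]


# Shortened, illustrative fragment; a real ckan.ini has many more settings.
SAMPLE_INI = """\
[app:main]
ckan.plugins = envvars
    stats
    harvest
    ckan_harvester
#    geonetwork_harvester
"""
```

For the fragment above, missing_plugins(SAMPLE_INI, ["harvest", "csw_harvester"]) reports csw_harvester as missing, and a file where the ckan.plugins lines are still commented out reports every required plugin as missing.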
Note that there appears to be some odd behaviour when updating the frequency of a harvest job. While the change will show up in the GUI after hitting save, the time of the next harvest job run is not adjusted in the database until the next time it runs. This means that when going from weekly to always frequency, for example, the job will not be updated until the next time it runs, potentially in a week. To update sooner you will need to run the harvest manually to ensure the database is updated to the new settings.
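The behaviour described above can be pictured with a toy model. This is not ckanext-harvest code; the class, method names, and interval table are hypothetical and only mirror what was observed: saving a new frequency changes the label, while the stored next-run time is recomputed only when a job runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical interval table for the illustration
INTERVALS = {
    "ALWAYS": timedelta(0),
    "DAILY": timedelta(days=1),
    "WEEKLY": timedelta(weeks=1),
}


@dataclass
class ToySource:
    frequency: str
    next_run: datetime

    def save_frequency(self, frequency):
        # Mirrors the reported behaviour: next_run is left untouched
        self.frequency = frequency

    def run(self, now):
        # Only a run recomputes next_run from the current frequency
        self.next_run = now + INTERVALS[self.frequency]


start = datetime(2024, 1, 1)
source = ToySource(frequency="WEEKLY", next_run=start + INTERVALS["WEEKLY"])
source.save_frequency("ALWAYS")
stale_next_run = source.next_run   # still one week after start
source.run(now=start)              # a manual run picks up the new setting
fresh_next_run = source.next_run   # now equal to start
```

In this model the switch from weekly to always takes effect only after run is called, which matches the advice to trigger the harvest manually after changing the frequency.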
CKAN version 1.6.0
Describe the bug
Harvest jobs on fresh installs of CKAN 1.6.0 do not appear to terminate by themselves as they did in previous versions.
Current job has been running for well over an hour, but it has inserted all datasets correctly.
However, the process appears to fail before the indexes are updated, as the home page shows a dataset count of zero and no EOVs are listed as having any datasets attached to them.
The datasets page does show the datasets, EOVs, responsible organizations, tags, resource types, licenses, and formats.
Map will show dataset extents and filters appear to be working properly.
Log outputs for the ckan and harvester containers are attached.
Steps to reproduce
Steps to reproduce the behavior:
Expected behavior
The harvester should have run and produced a set of results detailing how many datasets were added, updated, deleted, etc.
Additional details
Configuration:
CKAN Container & Harvester Logs:
ckan.log ckan_harvesters.log