Filesystem harvester - Githubissues

ckan / ckanext-spatial

Geospatial extension for CKAN

http://docs.ckan.org/projects/ckanext-spatial


Filesystem harvester #228

Open etj opened 4 years ago

etj commented 4 years ago

We may want to harvest a set of existing ISO19139 metadata files that are not served by a CSW server. In that case we would put the metadata files in a directory on the local filesystem and harvest them from there. This harvester would also be useful when the remote server can only deliver metadata in push mode: the server running CKAN would offer an exchange directory where the remote server drops its files, and this harvester would harvest from that directory.
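To make the idea concrete, here is a minimal sketch of the gather step such a harvester might perform. It is not part of ckanext-spatial: the function name `iter_iso_records` is hypothetical, and the sketch assumes the files use the `.xml` extension and have `gmd:MD_Metadata` as their root element.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
MD_METADATA = "{http://www.isotc211.org/2005/gmd}MD_Metadata"


def iter_iso_records(directory):
    """Yield (file_identifier, path) for each ISO 19139 file under directory.

    Hypothetical helper: a real harvester would hand these records to
    CKAN's harvest machinery instead of just yielding them.
    """
    for path in sorted(Path(directory).rglob("*.xml")):
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        if root.tag != MD_METADATA:
            continue  # skip XML files that are not ISO 19139 records
        ident = root.findtext(
            "gmd:fileIdentifier/gco:CharacterString", namespaces=NS
        )
        if ident and ident.strip():
            yield ident.strip(), path
```

Calling `list(iter_iso_records("/path/to/exchange-dir"))` would return one entry per usable record; the directory path here is only a placeholder.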

bonnland commented 4 years ago

@etj I have some experience with doing this, using CKAN's WAF harvesting capability. Our organization runs CKAN on a dedicated server. We have a set of GitHub repositories with ISO19139 metadata files cloned on the server, in the folder that Apache serves files from (cloned to /var/www/html). We manually set up our WAF harvesting sources in CKAN, pointing to the local URLs that Apache provides. A cron job detects updates to each GitHub repository, pulls the changes when they are detected, and starts a harvesting job on the changed WAF folder.
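The cron-driven workflow described here could look roughly like the sketch below. It is not the actual script: `sync_and_harvest` and `harvest_cmd` are hypothetical names, the clone is assumed to track an upstream branch, and the command that starts the harvest job is passed in because the exact ckanext-harvest invocation depends on the installed version.

```python
import subprocess


def git(repo_dir, *args, run=subprocess.run):
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = run(
        ["git", "-C", str(repo_dir), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def sync_and_harvest(repo_dir, harvest_cmd, run=subprocess.run):
    """Pull upstream changes and start a harvest job only if something changed.

    harvest_cmd is a placeholder argument list for whatever command
    starts the harvest job for this WAF source (an assumption here).
    Returns True when a pull and a harvest were triggered.
    """
    git(repo_dir, "fetch", "--quiet", run=run)
    local = git(repo_dir, "rev-parse", "HEAD", run=run)
    remote = git(repo_dir, "rev-parse", "@{u}", run=run)
    if local == remote:
        return False  # clone already matches upstream, nothing to harvest
    git(repo_dir, "pull", "--ff-only", "--quiet", run=run)
    run(harvest_cmd, check=True)
    return True
```

A cron entry would then call this once per cloned repository. The `run` parameter exists only so the function can be exercised without a real repository.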

etj commented 4 years ago

@bonnland thanks for sharing! That said, please consider that harvesting from the filesystem should be somewhat faster, and you would not need to configure an httpd service to publish the docs.

bonnland commented 4 years ago

@etj I agree. We get a harvesting rate of about 2 records per second, which means that tens of thousands of records can take several hours. Faster would be nice.
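As a quick back-of-the-envelope check of that figure, the sketch below converts a record count into hours at a fixed rate. The function name `harvest_hours` and the sample record counts are illustrative only; the rate of 2 records per second is the one quoted above.

```python
def harvest_hours(n_records, records_per_second=2.0):
    """Estimated wall-clock hours to harvest n_records at a constant rate."""
    return n_records / records_per_second / 3600


# Illustrative record counts, not measurements from any real catalogue.
for n in (10_000, 30_000, 50_000):
    print(f"{n} records: {harvest_hours(n):.1f} h")
```

At 2 records per second this gives roughly 1.4, 4.2, and 6.9 hours respectively, which is consistent with "several hours" for tens of thousands of records.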