Open issue · opened by @etj 4 years ago
@etj I have some experience with doing this, using CKAN's WAF harvesting capability. Our organization runs CKAN on a dedicated server. We have a set of GitHub repositories containing ISO 19139 metadata files, cloned on that server into the directory Apache serves files from (/var/www/html). We set up our WAF harvest sources in CKAN manually, pointing them at the local URLs that Apache provides. A cron job then checks each GitHub repository for updates; when it detects changes, it pulls them and starts a harvest job on the affected WAF folder.
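A minimal Python sketch of that cron step, assuming each clone tracks an upstream branch and maps to exactly one CKAN WAF harvest source. The function names, the `{repo_dir: source_id}` mapping and the choice of the ckanext-harvest `harvest_job_create` action API are illustrative assumptions, not the actual script described above. Whether a queued job runs immediately depends on how the ckanext-harvest gather/fetch consumers are set up.

```python
"""Hypothetical cron helper: pull updated metadata repos, then trigger CKAN harvests.

Assumptions (not from the original setup): each clone under /var/www/html tracks
an upstream branch, and each clone corresponds to one CKAN WAF harvest source ID.
"""
import json
import subprocess
import urllib.request
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str], str], str]


def git(args: Sequence[str], cwd: str) -> str:
    """Run a git command inside `cwd` and return its stripped stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def pull_if_changed(repo_dir: str, run: Runner = git) -> bool:
    """Fetch upstream; if HEAD differs from it, fast-forward and return True."""
    run(["fetch", "--quiet"], repo_dir)
    local = run(["rev-parse", "HEAD"], repo_dir)
    remote = run(["rev-parse", "@{u}"], repo_dir)
    if local == remote:
        return False  # nothing new upstream, no harvest needed
    run(["merge", "--ff-only", "@{u}"], repo_dir)
    return True


def create_harvest_job(ckan_url: str, api_token: str, source_id: str) -> dict:
    """Queue a harvest job through the ckanext-harvest `harvest_job_create` action."""
    req = urllib.request.Request(
        f"{ckan_url}/api/3/action/harvest_job_create",
        data=json.dumps({"source_id": source_id}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def sync_sources(sources: Dict[str, str], trigger: Callable[[str], object],
                 run: Runner = git) -> List[str]:
    """For each repo_dir -> source_id, pull changes and trigger a harvest if any."""
    harvested = []
    for repo_dir, source_id in sources.items():
        if pull_if_changed(repo_dir, run):
            trigger(source_id)
            harvested.append(source_id)
    return harvested
```

From cron, one might call `sync_sources({"/var/www/html/<repo>": "<source-id>"}, lambda sid: create_harvest_job("<ckan-url>", "<api-token>", sid))`, where the path, source ID, URL and token are placeholders. Injecting `run` and `trigger` keeps the git and HTTP side effects swappable, so the change-detection logic can be checked without a real repository or CKAN instance.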
@bonnland thanks for sharing! Anyway, please consider that harvesting from the filesystem should be somewhat faster, and it doesn't require configuring an httpd service to publish the documents.
@etj I agree. We get a harvesting rate of about 2 records per second, which means that tens of thousands of records can take several hours. Faster would be nice.
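For a rough sense of scale at that rate, with N records harvested at r records per second (the 20,000-record figure is only an illustrative assumption):

```latex
t = \frac{N}{r} = \frac{20\,000\ \text{records}}{2\ \text{records/s}} = 10\,000\ \text{s} \approx 2.8\ \text{h}
```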
We may want to harvest a set of existing ISO 19139 metadata records that we already have but that are not served by a CSW server. In that case we would put the metadata files in a directory on the local filesystem and harvest them from there. This harvester would also be useful when the remote server can only deliver metadata in push mode: the server running CKAN would offer an exchange directory where the remote server puts its files, and this harvester would harvest from that directory.
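A minimal sketch of what the gather step of such a filesystem harvester might do, assuming records are identified by their `gmd:fileIdentifier` and that change detection uses a SHA-256 content hash stored from the previous run. This is not part of CKAN or ckanext-harvest; `file_identifier`, `gather` and the state dict are hypothetical names chosen for illustration.

```python
"""Hypothetical gather step for a local-filesystem ISO 19139 harvester.

Not part of CKAN or ckanext-harvest: the function names and the
identifier -> hash state dict are illustrative assumptions.
"""
import hashlib
import os
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def file_identifier(path: str) -> Optional[str]:
    """Return gmd:fileIdentifier/gco:CharacterString, or None if missing/unparsable."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None
    node = root.find("gmd:fileIdentifier/gco:CharacterString", NS)
    return node.text.strip() if node is not None and node.text else None


def gather(directory: str, previous: Dict[str, str]) -> dict:
    """Scan `directory` recursively for *.xml records and diff them against `previous`.

    `previous` maps fileIdentifier -> SHA-256 of the file seen on the last run.
    Returns the new, changed and deleted identifiers plus the updated state.
    """
    current: Dict[str, str] = {}
    for dirpath, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.lower().endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            ident = file_identifier(path)
            if ident is None:
                continue  # skip files that are not readable ISO 19139 records
            with open(path, "rb") as fh:
                current[ident] = hashlib.sha256(fh.read()).hexdigest()
    return {
        "new": {i for i in current if i not in previous},
        "changed": {i for i in current if i in previous and previous[i] != current[i]},
        "deleted": {i for i in previous if i not in current},
        "state": current,
    }
```

Hashing file contents rather than relying on modification times keeps the diff stable when files in an exchange directory are re-copied unchanged; the returned `state` would be persisted and passed back as `previous` on the next run.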