ckan / ckanext-harvest

Remote harvesting extension for CKAN

Harvest interface incompatible with non-JSON config #537

Open pdekraker-epa opened 11 months ago

pdekraker-epa commented 11 months ago

I tried creating a harvester that stores its config as YAML instead of JSON. My validate_config function works properly: it checks the items and returns a YAML string. However, before the config is written to the database, the harvest_source_extra_validator function runs and attempts to load the config as JSON. When json.loads fails, the code discards the config and continues without raising an error (it does leave a message in the log). To the user it therefore appears that the harvest source was created, but the user-supplied configuration is lost.
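The failure mode can be reproduced outside CKAN with a minimal sketch. This is a simplified stand-in and not the extension's actual code: the function name merge_extras_into_config and the sample config strings are hypothetical, and only the "try json.loads, log, fall back to an empty dict" pattern is taken from the behaviour described above.

```python
import json
import logging

log = logging.getLogger("harvest_sketch")


def merge_extras_into_config(raw_config, extra_data):
    """Hypothetical stand-in for the validator step described above."""
    try:
        # The config string is assumed to be JSON, whatever the harvester returned.
        config_dict = json.loads(raw_config or "{}")
    except ValueError:
        # Only a log message is produced; no validation error reaches the user.
        log.error("Config is not valid JSON, discarding it")
        config_dict = {}
    config_dict.update(extra_data)
    # With nothing left to store, no config is written back at all.
    return json.dumps(config_dict) if config_dict else None


# Illustrative inputs (assumptions, not taken from a real harvester).
yaml_config = "api_version: 2\nforce_all: true\n"
json_config = '{"api_version": 2, "force_all": true}'

yaml_result = merge_extras_into_config(yaml_config, {})
json_result = merge_extras_into_config(json_config, {})

print(yaml_result)  # the YAML config has been dropped
print(json_result)  # the JSON config survives the round trip
```

In this sketch the YAML string never reaches storage, while the equivalent JSON string passes through unchanged, which matches what the user sees in the form.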

The specification for the config of a harvest object does not appear to require JSON; the only comment about it that I noticed is that the CKANHarvester stores its config in JSON.

While exploring this issue I made a few other related observations:

  1. The harvester interface has an optional method, extra_schema, that does not appear to be documented.
  2. The harvest_source_extra_validator appears to use extra_schema and add the defined keys to the config (leading to this issue).
  3. The extra_schema fields still appear to be stored in the extras, which seems to make inserting them into the config unnecessary?
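Observations 2 and 3 can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the field names, the idea that extra_schema returns a dict mapping field names to validator lists, and the helper fold_extras_into_config are hypothetical and do not come from the extension's documentation.

```python
import json


def extra_schema():
    # Hypothetical schema: extra field name -> list of validators (left empty here).
    return {"collection_id": [], "page_size": []}


def fold_extras_into_config(raw_config, submitted_fields):
    """Copy schema-defined fields into the JSON config (illustrative only)."""
    allowed = set(extra_schema())
    # Keep only fields that the harvester declared in its extra schema.
    extra_data = {k: v for k, v in submitted_fields.items() if k in allowed}
    # Merging requires the config to parse as JSON, which is the root of the issue.
    config_dict = json.loads(raw_config or "{}")
    config_dict.update(extra_data)
    return json.dumps(config_dict, sort_keys=True), extra_data


merged, extras = fold_extras_into_config(
    '{"force_all": true}',
    {"collection_id": "abc", "unrelated": 1},
)

print(merged)  # config now also carries the schema-defined field
print(extras)  # the same field is still available as a separate extra
```

Under these assumptions the value of collection_id ends up in two places, once inside the config and once among the extras, which is the duplication questioned in observation 3.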