ckan / ckanext-harvest

Remote harvesting extension for CKAN

Integrate ckan_harvester with ckanext-scheming #153

Open florianm opened 8 years ago

florianm commented 8 years ago

The default ckan_harvester runs into trouble if the harvesting CKAN instance uses a custom ckanext-scheming schema. The incompatibility lies in the handling of extra fields: scheming uses extra fields to store its custom fields, whereas the ckan_harvester creates or overrides extra fields as found on the harvested instance. However, scheming's package_read templates will fail if any extra fields are present that the schema does not define.
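To make the clash concrete, here is a minimal sketch of separating harvested extras into those the target schema defines and those it does not. It assumes the usual CKAN package dict shape, where `extras` is a list of `{"key": ..., "value": ...}` dicts; `split_extras` and `schema_field_names` are hypothetical names for illustration, not part of ckanext-harvest or ckanext-scheming.

```python
def split_extras(package, schema_field_names):
    """Partition a harvested package's extras by target-schema membership.

    `package` is a CKAN package dict; `schema_field_names` is a set of
    field names the target schema defines (hypothetical input, built
    however the harvester chooses).
    Returns (known, unknown) as two plain dicts of key -> value.
    """
    known, unknown = {}, {}
    for extra in package.get("extras", []):
        # Extras the schema defines can be kept; the rest are the ones
        # that would trip up the scheming templates.
        bucket = known if extra["key"] in schema_field_names else unknown
        bucket[extra["key"]] = extra["value"]
    return known, unknown
```

What to do with the `unknown` part (drop it, rename it, or map it onto a schema field) is a policy decision this sketch deliberately leaves open.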

Pinging @amercader and @wardi for advice:

Assuming an unknown schema A (default, hard-coded like data.gov.au, or ckanext-scheming) is harvested into a custom ckanext-scheming schema B using the ckan_harvester, there are two overlapping sets of fields present:

Would it make sense to modify the ckan_harvester's behaviour around extra fields as follows:

wardi commented 8 years ago

It should be possible to add this sort of logic to a custom harvester, right?

Also, sites using ckanext-scheming advertise the schemas they have installed through the actions scheming_dataset_schema_list and scheming_dataset_schema_show, so it's possible to query the schema in use on both ends instead of checking the config.
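A rough sketch of that query, using only the standard library against CKAN's action API (`/api/3/action/<name>`). The helper names, the `fetch` hook (there only so the code can be exercised without a network), and the example URLs are assumptions made for illustration; only `dataset_fields` and `field_name` are read from the schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def call_action(base_url, action, fetch=None, **params):
    """Call a CKAN action over HTTP GET and return its `result`."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    if fetch is None:
        with urlopen(url) as response:
            body = response.read().decode("utf-8")
    else:
        body = fetch(url)  # injected fetcher, e.g. for tests
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"{action} failed: {payload.get('error')}")
    return payload["result"]


def list_dataset_types(base_url, fetch=None):
    """Dataset types the site advertises via scheming_dataset_schema_list."""
    return call_action(base_url, "scheming_dataset_schema_list", fetch=fetch)


def schema_field_names(base_url, dataset_type="dataset", fetch=None):
    """Names of the dataset-level fields of one scheming schema."""
    schema = call_action(
        base_url, "scheming_dataset_schema_show", fetch=fetch, type=dataset_type
    )
    return {field["field_name"] for field in schema.get("dataset_fields", [])}


def compare_schemas(source_url, target_url, dataset_type="dataset", fetch=None):
    """Split field names into shared, source-only and target-only sets."""
    source = schema_field_names(source_url, dataset_type, fetch)
    target = schema_field_names(target_url, dataset_type, fetch)
    return {
        "shared": source & target,
        "source_only": source - target,
        "target_only": target - source,
    }
```

A custom harvester could call something like `compare_schemas("https://source.example", "https://target.example")` once per harvest job and use the result to decide which extras to keep. A source that does not run ckanext-scheming will not expose these actions, so the harvester would need a fallback for that case.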

florianm commented 8 years ago

Thanks, I'll try my luck with a custom harvester using scheming_dataset_schema_list and scheming_dataset_schema_show!

florianm commented 8 years ago

Still running into #151 and no idea how to debug that one... ckanapi looking better and better.