Open rossjones opened 9 years ago
Totally agree. The CKAN harvester needs to be refactored to use package_search on the remote CKAN instead of the old REST API anyway. Once this is done, it would just be a matter of passing extra filters in the source config.
I've started working on something with the same goal but using a different approach. I'm not using package_search, I've just added a new extension point.
Anyway, using package_search seems like the way to go to me too. And I'm sure that using the v3 API (which package_search belongs to) would also help solve other issues.
@filipefigcorreia we did the same sort of thing (but this time just using config for organization_filter_include/organization_filter_exclude) because of time constraints - https://github.com/datagovuk/ckanext-harvest/commit/01fdbbf682c007a38e98c06065a48b5b8addbe65
It works, but I think it's a little inelegant because it does more work than it needs to at runtime (we can only include or exclude datasets after we've fetched them). Moving to v3 of the API and passing filters to the search would be much more efficient in the longer term.
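To illustrate the "pass filters to the search" idea, here is a rough sketch of how include/exclude lists could be turned into an `fq` filter for the v3 `package_search` action, so that the remote CKAN does the filtering before anything is fetched. This is not the harvester's actual code: the helper names `build_fq` and `package_search_url` and the example host are made up for illustration, and it assumes the remote instance indexes the owning organization's name in the `organization` field.

```python
from urllib.parse import urlencode


def build_fq(include=None, exclude=None):
    """Build a Solr filter query string from organization name lists.

    Hypothetical helper: `include` and `exclude` mirror the
    organization_filter_include / organization_filter_exclude config idea.
    """
    clauses = []
    if include:
        # Match datasets owned by any of the listed organizations.
        clauses.append("organization:(" + " OR ".join(include) + ")")
    if exclude:
        # Drop datasets owned by each excluded organization.
        clauses.extend("-organization:" + name for name in exclude)
    return " ".join(clauses)


def package_search_url(base_url, fq, rows=100, start=0):
    """Return the v3 package_search URL for one page of filtered results."""
    params = {"fq": fq, "rows": rows, "start": start}
    return base_url.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)


if __name__ == "__main__":
    fq = build_fq(include=["org-a", "org-b"], exclude=["org-c"])
    # The host below is a placeholder, not a real CKAN instance.
    print(package_search_url("https://ckan.example.org", fq))
```

A harvester's gather stage could then page through results by increasing `start` by `rows` until a page comes back empty, instead of fetching every dataset and discarding the unwanted ones afterwards.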
PR for this: https://github.com/ckan/ckanext-harvest/pull/168
It would be very useful if there was a way of telling the CKAN harvester how to limit the datasets it harvests. For instance, by a specific extra, or the presence of the dataset in a specific organisation.