DataShades / ckanext-ingest

GNU Affero General Public License v3.0
0 stars 1 forks source link

Might want to use package_patch instead of package_update #3

Open canon-cmre-kym-eden opened 1 month ago

canon-cmre-kym-eden commented 1 month ago

I had unexpected loss of all resources from a project while building an ingestion pipeline that could update the dataset metadata. Turns out there is a patch variant of package and resource update: https://docs.ckan.org/en/2.9/api/#ckan.logic.action.update.package_update

Might be worth changing the update to be non-destructive, or introduce a multi-state (enum) parameter: https://github.com/DataShades/ckanext-ingest/blob/e6df1ff5df0e672280c9e1b18009aee38f3863d2/ckanext/ingest/record.py#L38-L40

https://github.com/DataShades/ckanext-ingest/blob/e6df1ff5df0e672280c9e1b18009aee38f3863d2/ckanext/ingest/record.py#L96-L100

Thanks

smotornyuk commented 1 month ago

Yep, great idea. I always thought about ingested packages as something uncontrolled that comes from the outside world.

But a combination of local data with ingested information definitely a common usecase, so patch instead of update is a must-have feature at least:

The current behavior with unconditional reset to some specific state of the dataset is also sensible in certain scenarios(in my projects, at least:)), so I'll keep this option.

I'll probably add the following change for now:

action = "package_" + ( 
   "update" if pkg and self.options.get("update_existing") else "create" 
)
# will be changed to => 
update_strategy = self.options.get("update_strategy", "update")

action = "package_" + ( 
   update_strategy if pkg and self.options.get("update_existing") else "create" 
)

To keep it compatible with existing code. Everything remains unchanged by default, and you can set update_strategy: "patch" record option to use package/resource patch

For the v2 of the extension, I'll think of a better interface.