Open Jin-Sun-tts opened 9 months ago
One tricky situation to be included in the test is that a parent dataset and all its children datasets are to be deleted in one harvest job. In this case we should allow deleting the parent.
I would consider this differently. If a parent dataset is removed without its children, I would consider 2 scenarios of what a data provider would expect:
Really this situation shouldn't exist if datasets are managed appropriately by data providers, but we can't rely on that. Every response seems like a hack. However, keeping the parent dataset when it's been removed from the source feels wrong.
Not the same, but for reference we do something similar in datagov-dedupe, whereby if we need to remove a duplicate parent, we loop through all the children and make sure each one points to the correct new parent. We could just remove the reference to the parent if the parent is deleted...
The child datasets should have their reference to the parent removed, making them normal dataset records (and not associated with a non-existent parent)
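A minimal sketch of that detach step, working on plain in-memory dicts rather than real CKAN calls. The function name `detach_children` is hypothetical, and the `collection_package_id` key and the dict-shaped `extras` are assumptions about how a child records its parent:

```python
def detach_children(datasets, parent_id, parent_key="collection_package_id"):
    """Return copies of `datasets` with any reference to `parent_id` removed.

    Assumption: each dataset is a dict with an "extras" dict, and a child
    stores its parent's id under `parent_key`. Both are illustrative only.
    """
    detached = []
    for ds in datasets:
        # Copy the extras so the caller's data is not mutated.
        extras = dict(ds.get("extras", {}))
        if extras.get(parent_key) == parent_id:
            # Drop the link; the child becomes a normal dataset record.
            del extras[parent_key]
        detached.append({**ds, "extras": extras})
    return detached
```

Children of other parents and standalone datasets pass through unchanged, so the same call is safe to run over a whole harvest source.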
From an outsider's perspective, I think the second option seems more logical. Given that the child datasets still exist, it would make more sense to keep them and not have the relationship to the missing parent. If data providers intended to delete child datasets, they should be on the hook for it in managing their metadata catalog.
From a data system perspective, I think it makes more sense to ensure that the agency source catalog matches what's in the data.gov catalog.
From a user perspective, I can't say I have a good answer.
How to reproduce
Related #4553
In the local environment, find a collection and run:
ckan dataset purge/delete <collection_package_id>
Expected behavior
For a collection, deleting a parent dataset should not be allowed while child datasets are still associated with it.
Actual behavior
Parent datasets were deleted even when other datasets in the same collection still existed.
Sketch
To address this issue, we need to implement logic at the delete stage. If a collection contains children datasets, the system should prevent the deletion of the parent dataset. This will ensure that parent datasets are retained as long as they have associated children datasets.
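The check described above could look roughly like the following. This is only a sketch under stated assumptions: `can_delete_parent` and `plan_deletions` are hypothetical names, `children_by_parent` is an assumed mapping from parent id to child ids, and nothing here calls the real harvester or CKAN API. It also covers the case raised in the thread where a parent and all of its children are removed in the same harvest job, in which case the parent may go.

```python
def can_delete_parent(parent_id, children_by_parent, ids_to_delete):
    """True if no child of `parent_id` would survive this delete batch."""
    children = set(children_by_parent.get(parent_id, ()))
    remaining = children - set(ids_to_delete)
    return not remaining


def plan_deletions(ids_to_delete, children_by_parent):
    """Split a delete batch into (allowed, blocked) lists, keeping order.

    Datasets that are not parents have no children, so they are always allowed.
    """
    batch = set(ids_to_delete)
    allowed, blocked = [], []
    for ds_id in ids_to_delete:
        if can_delete_parent(ds_id, children_by_parent, batch):
            allowed.append(ds_id)
        else:
            # Parent still has children outside the batch: keep it.
            blocked.append(ds_id)
    return allowed, blocked
```

For example, with `children_by_parent = {"p1": ["c1", "c2"]}`, a batch of `["p1"]` blocks the parent, while a batch of `["p1", "c1", "c2"]` allows all three deletions.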