Closed rufuspollock closed 4 years ago
I will add some thoughts.
The current implementation of the mapper doesn't return a tag
key if there are not tags. This will cause the Core CKAN dictionary to not be updated correctly and always show the current tags if previous revisions don't have tags. This can be fixed quickly however it raises some other concerns.
Moved to #35 as a bigger, broader question.
FIXED. I believe we have now fixed this with final work in frictionless-ckan-mapper.
I'm opening this issue as a central place to discuss and finalize the "agreement" and algorithm for ckan <=> metastore (lib). It will help address some of the confusions e.g. in #21.
Desired outcomes and core principles:
package_show
in CKAN is unchanged and follows CKAN metadata structuremetastore
uses Frictionless (f11s) metadata structuredatastore_active
package_show
has turned into a massive method that includes pretty much everything as a convenient way for users to get all the info they want (cf graphql ideas).datastore_active
andtracking_summary
are clearly not things that would get revisioned i think (or are part of a dataset's core metadata)What is CKAN's package metadata?
A little confusingly CKAN has (at least) two methods for converting a Dataset/Package to a dictionary:
package_show
andas_dict
as_dict
does basic object conversion with some minimal references e.g. for tags, license etc https://github.com/ckan/ckan/blob/2.8/ckan/model/package.py#L195-L223package_show
is much more comprehensive and includes a lot of other information including much information that is not revisionedThe difficulty is that extensions plug into
package_show
andpackage_show
clear has information that is not really part of the package and should not be revisioned.Questions
frictionless_ckan_mapper
works withckanext-scheming
?Approach
The pure way to go would be for ckanext-versioning to:
package.as_dict
rather thanpackage_show
into metastorepackage_show
.Mapping package_show as a whole is not recommended as we end up with lots of noise in extras. If you do want to map raw package_show we can prob do some more work to remove the noise in extras.
Here's some of the code we are writing to update CKAN's package dictionary with
metastore-lib
metadata: https://github.com/datopian/ckanext-versioning/blob/58cada2193c5190fed1d00c7382810b07ae8e756/ckanext/versioning/datapackage.py#L114Being forgiving: if a user does store package_show let's convert it back for them
What if someone does just want to dump
package_show
? Can we make sure we intelligently convert it back for them? I think the answer is yes ...Tag handling
issue b/c what happens if tags don't exist in DB anymore (but did exist in old revision) we will then show tags with links on showcase @ revision X that "don't work"
2) Side Effects core package_show and other stuff
package_show has some side effects like setting context['package'] = pkg that we will need to address and make me think that it's gonna be difficult to avoid. Another thing is that package_create calls package_show at the end, so the way I was hooking the logic is not gonna be possible since it will try to look at a github repository before we even create one since I need the pkg_dict to do a proper conversion between CKAN <-> DataPackage (the data_dict argument of package_create is not enough)
3) Using Interfaces
Point 2 will require to make a logic like we did in ckanext-gitdatahub and hook all the metastore-lib calls in the IPackage and IResource interfaces. This is less clean and will require a bunch of extra code and refactor that I will work on it tomorrow.