datopian / ckanext-versioning

Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on git and github.
https://tech.datopian.com/versioning/
GNU Affero General Public License v3.0
7 stars 4 forks source link

CKAN v2 <=> MetaStore Lib approach #32

Closed rufuspollock closed 4 years ago

rufuspollock commented 4 years ago

I'm opening this issue as a central place to discuss and finalize the "agreement" and algorithm for ckan <=> metastore (lib). It will help address some of the confusions e.g. in #21.

Desired outcomes and core principles:

What is CKAN's package metadata?

A little confusingly CKAN has (at least) two methods for converting a Dataset/Package to a dictionary: package_show and as_dict

The difficulty is that extensions plug into package_show and package_show clear has information that is not really part of the package and should not be revisioned.

Questions

Approach

The pure way to go would be for ckanext-versioning to:

Mapping package_show as a whole is not recommended as we end up with lots of noise in extras. If you do want to map raw package_show we can prob do some more work to remove the noise in extras.

graph LR

ckan --basic conversion i.e. as_dict + set path--> metastore
metastore --do core package show for head then overwrite with metastore info--> ckan

Here's some of the code we are writing to update CKAN's package dictionary with metastore-lib metadata: https://github.com/datopian/ckanext-versioning/blob/58cada2193c5190fed1d00c7382810b07ae8e756/ckanext/versioning/datapackage.py#L114

Being forgiving: if a user does store package_show let's convert it back for them

What if someone does just want to dump package_show? Can we make sure we intelligently convert it back for them? I think the answer is yes ...

Tag handling

issue b/c what happens if tags don't exist in DB anymore (but did exist in old revision) we will then show tags with links on showcase @ revision X that "don't work"

package['tags'] = [ expand_tag(tag_name) for tag_name in package['tags'] if expand_tag(tag_name) is not None ]
def expand_tag(tagname):
    try:
      model.Tag.get_by_name(tagname)
      tag = as_dict()
    except:
      return None

def package_show(dataset, revision):
    rawdict = metastorelib(dataset, revision)
    ckandict = f2c.dataset(rawdict)
    # some additional tidying up to match the way pkg_show goes e.g. tags are not a list of names but full tag object
    ckandict_pkg_show = pkg_showize(ckandict)
    ckandictfromckan = core_package_show(dataset)
    # no need for smart merging ... just do raw merge
    # smart merge would recurse down to check if there are things to merge in subdicts etc - NOT NEEDED
    ckandictfromckan.update(ckandict_pkg_show)
)

2) Side Effects core package_show and other stuff

package_show has some side effects like setting context['package'] = pkg that we will need to address and make me think that it's gonna be difficult to avoid. Another thing is that package_create calls package_show at the end, so the way I was hooking the logic is not gonna be possible since it will try to look at a github repository before we even create one since I need the pkg_dict to do a proper conversion between CKAN <-> DataPackage (the data_dict argument of package_create is not enough)

3) Using Interfaces

Point 2 will require to make a logic like we did in ckanext-gitdatahub and hook all the metastore-lib calls in the IPackage and IResource interfaces. This is less clean and will require a bunch of extra code and refactor that I will work on it tomorrow.

pdelboca commented 4 years ago

I will add some thoughts.

The current implementation of the mapper doesn't return a tag key if there are not tags. This will cause the Core CKAN dictionary to not be updated correctly and always show the current tags if previous revisions don't have tags. This can be fixed quickly however it raises some other concerns.

Evolution of Metadata Schemas

Moved to #35 as a bigger, broader question.

rufuspollock commented 4 years ago

FIXED. I believe we have now fixed this with final work in frictionless-ckan-mapper.