alphagov / datagovuk_find

Beta version of Find Data

Incorrect licence type sometimes shown in metadata box on the dataset page #1327

Open deborahchua opened 1 week ago

deborahchua commented 1 week ago

We have found some datasets where the wrong licence type is shown. The licence type should be mapped to one of the default options in CKAN and, if it does not match any of them, it should show as Other or None.

The default list is defined here - https://github.com/ckan/ckan/blob/12c5e9adb77ace52b7d539c91f910b3c9ad510d6/ckan/model/license.py#L75-L90

It looks like the licence type is pulled in as additional data, and this can happen with both harvested and non-harvested datasets.

For example: Screenshot 2024-10-21 at 11 31 31

Other examples:

Screenshot 2024-10-21 at 11 30 33

deborahchua commented 2 days ago

I ran a facet query in Solr using license_id as the facet field, and it returns:

"facet_fields":{
      "license_id":[
        "uk-ogl",25775,
        "cc-by",975,
        "OGL-UK-3.0",402,
        "notspecified",119,
        "other-nc",49,
        "cc-zero",17,
        "other-closed",17,
        "other-pd",13,
        "unpublished",11,
        "cc-by-sa",6,
        "other-open",6,
        "odc-odbl",5,
        "odc-by",4,
        "odc-pddl",3,
        "__other__",2,
        "cc-nc",1,
        "ogl",1,
        "other-at",1]}

Solr query used - /solr/ckan/select?fq=license_id%3A"__other__"&indent=true&q.op=OR&q=*%3A*&rows=100
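
Solr returns facet counts as a flat list that alternates value and count, which is awkward to inspect by eye. A minimal Python sketch for turning that list into a dictionary is shown here; the function name `facet_pairs_to_counts` is hypothetical, and the input is only an excerpt of the facet output above, not the full response.

```python
from typing import Dict, List, Union


def facet_pairs_to_counts(flat: List[Union[str, int]]) -> Dict[str, int]:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("facet list must contain value/count pairs")
    # Even positions hold the facet values, odd positions hold the counts.
    return {str(value): int(count) for value, count in zip(flat[0::2], flat[1::2])}


# Excerpt of the license_id facet output quoted in this comment.
license_facets = [
    "uk-ogl", 25775,
    "cc-by", 975,
    "OGL-UK-3.0", 402,
    "notspecified", 119,
    "__other__", 2,
    "ogl", 1,
]

counts = facet_pairs_to_counts(license_facets)

# List the licence IDs from most to least used.
for license_id, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{license_id}: {n}")
```

With the counts in a dictionary, it is easier to pick out the IDs that need attention, such as `OGL-UK-3.0` and `__other__`.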

uk-ogl is the code we use to filter datasets that use the standard OGL licence (currently version 3). Datasets with OGL-UK-3.0 should also be mapped to the same code. Example query to find individual datasets using a specific licence ID - /solr/ckan/select?fq=license_id%3A"OGL-UK-3.0"&indent=true&q.op=OR&q=*%3A*&rows=100
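
One possible way to express that mapping is an alias table applied before filtering or display. The following Python sketch is illustrative only: `LICENSE_ALIASES`, `canonical_license_id`, `merge_counts` and `solr_select_path` are hypothetical names, not part of CKAN or this repo, and the only alias included is the one stated above (`OGL-UK-3.0` to `uk-ogl`). It also rebuilds the example Solr query for a given licence ID.

```python
from typing import Dict
from urllib.parse import urlencode

# Assumption: the only alias confirmed in this issue is OGL-UK-3.0 -> uk-ogl.
LICENSE_ALIASES: Dict[str, str] = {"OGL-UK-3.0": "uk-ogl"}


def canonical_license_id(license_id: str) -> str:
    """Return the code used for filtering, leaving unknown IDs unchanged."""
    return LICENSE_ALIASES.get(license_id, license_id)


def merge_counts(counts: Dict[str, int]) -> Dict[str, int]:
    """Re-total facet counts after mapping aliases to their canonical code."""
    merged: Dict[str, int] = {}
    for license_id, n in counts.items():
        key = canonical_license_id(license_id)
        merged[key] = merged.get(key, 0) + n
    return merged


def solr_select_path(license_id: str, rows: int = 100) -> str:
    """Build the select path used above to list datasets with one licence ID."""
    params = {
        "fq": f'license_id:"{license_id}"',
        "indent": "true",
        "q.op": "OR",
        "q": "*:*",
        "rows": rows,
    }
    # urlencode percent-encodes the quotes, colons and asterisks.
    return "/solr/ckan/select?" + urlencode(params)


merged = merge_counts({"uk-ogl": 25775, "OGL-UK-3.0": 402, "cc-by": 975})
path = solr_select_path("OGL-UK-3.0")
print(merged)
print(path)
```

Unknown IDs pass through unchanged in this sketch, so values such as `ogl` or `__other__` would still need a decision on how they should be mapped.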