Create a /license endpoint (original #368)

This issue has been migrated from the CC Search api repository

Author: aldenstpage
Date: Thu Oct 10 2019
Labels: Hacktoberfest,help wanted,✨ goal: improvement,🏷 status: label work required,🙅 status: discontinued

The endpoint should list every possible valid Creative Commons license. Each license in the response should have:

The license version number
The license URL
The jurisdiction
The language code

GET /v1/licenses

[
    {
        "license_version": 4.0,
        "license_url": "https://creativecommons.org/licenses/by/4.0/",
        "jurisdiction": "us",
        "language_code": "en"
    },
    {
    . . .
    },
    . . .
]

While we model licenses in a simple way in licenses.py, this is not sufficient for accurately listing all possible licenses. There are ~700 versions of licenses. The best way to get this information is to parse the license RDF spec here. It should be parsed when the server is started up and kept in memory. I don't expect this spec to change very often, so instead of fetching it remotely, we should just keep a copy of it in this git repository.

Original Comments:

Issue author ritesh-pandey commented on Sat Oct 12 2019:

The images table contains various combinations of license and license_version. We need to list them through the /license endpoint.

@aldenstpage Did I understand the problem statement correctly? source

Issue author HAKSOAT commented on Wed Jan 29 2020:

Is this task still open? I see some work has been done. source

kgodey commented on Thu Jan 30 2020:

@HAKSOAT Yes, the previous pull request was never completed and this is still open. cc @aldenstpage source

Issue author HAKSOAT commented on Thu Jan 30 2020:

Okay. I can pick it up. Any idea what caused the other PR to stall? source

kgodey commented on Sat Feb 01 2020:

@HAKSOAT great, I've marked as in progress. You can see the other PR here: https://github.com/creativecommons/cccatalog-api/pull/374 source

Issue author HAKSOAT commented on Sat Feb 01 2020:

Hello @kgodey I checked the PR, but it looks to me like it did the job as required. I'm wondering why it wasn't merged. source

Gbahdeyboh commented on Sun Feb 02 2020:

I think an important part of this issue is not being paid attention to. The issue states that "The endpoint should list all valid licenses and versions in the catalog".

A valid license is a license that has been properly attributed and does not have a legal term that legally prevents others from doing what the license permits.

I hope this helps @HAKSOAT

source

Gbahdeyboh commented on Sun Feb 02 2020:

https://creativecommons.org/use-remix/attribution/ source

Issue author HAKSOAT commented on Tue Feb 04 2020:

Thanks @Gbahdeyboh. @kgodey I think I understand it better now and can work on it. Is there any other clarification I need? source

kgodey commented on Fri Feb 28 2020:

@HAKSOAT are you still working on this? source

Issue author DantrazTrev commented on Sat Feb 29 2020:

Hey is this issue still open?

source

Gbahdeyboh commented on Sat Feb 29 2020:

It currently isn't open to be worked on; it has the "not ready for work" label. source

aldenstpage commented on Tue Apr 14 2020:

After talking with the stakeholder who requested this issue, I found that my initial suggestion to use our internal model for licenses in licenses.py wouldn't meet their requirements. We're going to have to take a different approach (description updated). I've overhauled this to be less ambiguous so an outside contributor can take a crack at it. source

Tanuj22 commented on Thu Apr 23 2020:

@aldenstpage For the "It should be parsed when the server is started up" part, is this the best approach: https://stackoverflow.com/a/6792076? Or is there a better way? source

Issue author ritesh-pandey commented on Mon Apr 27 2020:

Adding to @Tanuj22's idea, ready() looks like a good place for parsing and loading licenses.

According to the documentation, Django will call ready() only if we specify the dotted path to ApiConfig in INSTALLED_APPS.

Unfortunately, I am stuck at this error:

django.core.exceptions.ImproperlyConfigured: Cannot import 'api'. Check that 'cccatalog.api.apps.ApiConfig.name' is correct.

Something is missing. My guess is that we need to specify the path attribute of ApiConfig as well. source

Tanuj22 commented on Tue Apr 28 2020:

@aldenstpage I am having some trouble parsing the file. If I use this approach, I get this output:

"{\n    \"@context\": {\n        \"cc\": \"http://creativecommons.org/ns#\",\n        \"dc\": \"http://purl.org/dc/elements/1.1/\",\n        \"dcq\": \"http://purl.org/dc/terms/\",\n        \"foaf\": \"http://xmlns.com/foaf/0.1/\",\n        \"rdf\": \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\",\n        \"rdfs\": \"http://www.w3.org/2000/01/rdf-schema#\",\n        \"xsd\": \"http://www.w3.org/2001/XMLSchema#\"\n    },\n    \"@graph\": [\n        {\n            \"@id\": \"http://creativecommons.org/licenses/by-nc-nd/2.0/uk/\",\n            \"@type\": \"cc:License\",\n            \"cc:jurisdiction\": {\n                \"@id\": \"http://creativecommons.org/international/uk/\"\n            },\n            \"cc:legalcode\": {\n                \"@id\": \"http://creativecommons.org/licenses/by-nc-nd/2.0/uk/legalcode\"\n            },\n            \"cc:licenseClass\": {\n                \"@id\": \"http://creativecommons.org/license/\"\n            },\n            \"cc:permits\": [\n                {\n                    \"@id\": \"cc:Distribution\"\n                },\n                {\n                    \"@id\": \"cc:Reproduction\"\n                }\n            ],\n            \"cc:prohibits\": {\n                \"@id\": \"cc:CommercialUse\"\n            },\n            \"cc:requires\": [\n                {\n                    \"@id\": \"cc:Notice\"\n                },\n                {\n                    \"@id\": \"cc:Attribution\"\n                }\n            ],\n            \"dc:creator\": {\n                \"@id\": \"http://creativecommons.org/\"\n            },\n            \"dc:identifier\": \"by-nc-nd\",\n            \"dc:source\": {\n                \"@id\": \"http://creativecommons.org/licenses/by-nc-nd/2.0/\"\n            },\n            \"dc:title\": [\n                {\n                    \"@language\": \"oci-es\",\n                    \"@value\": \"Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales\"\n                },\n           
     {\n                    \"@language\": \"fr-ca\",\n                    \"@value\": \"Paternité - Pas d'Utilisation Commerciale - Pas de Modification 2.0 Royaume-Uni\"\n                },\n                {\n                    \"@language\": \"ka\",\n                    \"@value\": \"ავტორობის მითითებით–არაკომერციული გამოყენბისათვის–გადამუშავების გარეშე 2.0 გაერთიენაბული სამეფო: ინგლისი და უალესი\"\n                },\n                {\n                    \"@language\": \"en\",\n                    \"@value\": \"Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales\"\n                },\n                {\n                    \"@language\": \"es-es\",\n                    \"@value\": \"Reconocimiento-NoComercial-SinObraDerivada 2.0 Inglaterra y País de Gales\"\n                },\n                {\n                    \"@language\": \"lt\",\n                    \"@value\": \"Attribution-NonCommercial-NoDerivs 2.0 Jungtinė Karalystė: Anglija ir Velsas\"\n                },\n                {\n                    \"@language\": \"pl\",\n                    \"@value\": \"Uznanie autorstwa-Użycie niekomercyjne-Bez utworów zależnych 2.0 UK: Anglia i Walia\"\n                },\n                {\n                    \"@language\": \"kk\",\n                    \"@value\": \"Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales\"\n                },\n                {\n                    \"@language\": \"de-ch\",\n                    \"@value\": \"Namensnennung-NichtKommerziell-KeineBearbeitung 2.0 England & Wales\"\n                },\n                {\n                    \"@language\": \"en-sg\",\n                    \"@value\": \"Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales\"\n                },\n                {\n                    \"@language\": \"gl\",\n                    \"@value\": \"Recoñecemento-NonComercial-SenObraDerivada 2.0 Inglaterra e País de Gales\"\n

I can't figure out why it is filled with \n.
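The output begins with a `"` and contains escaped `\n` and `\"` sequences. That suggests the JSON-LD text was encoded as JSON a second time, turning the whole document into one JSON string literal. This is an assumption, since the code that produced the output isn't shown. It could happen, for example, if the string returned by rdflib's serializer were passed to `json.dumps()` or returned as the data of a JSON API response. Here is a minimal standard-library demonstration of that effect:

```python
import json

# A tiny stand-in for the JSON-LD document (the real one is much larger).
document = {"@context": {"cc": "http://creativecommons.org/ns#"}, "@graph": []}

serialized_once = json.dumps(document, indent=4)   # text with real newlines
serialized_twice = json.dumps(serialized_once)     # that text as a JSON string literal

# The doubly encoded form starts with a quote and contains backslash-n and
# backslash-quote escapes, like the output quoted above.
print(serialized_twice[:40])

# One json.loads() only undoes the outer layer; a second one recovers the dict.
assert json.loads(serialized_twice) == serialized_once
assert json.loads(json.loads(serialized_twice)) == document
```

If this is the cause, return the parsed structure (or records built from it) rather than an already-serialized string.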

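If I simply try this:

```python
import rdflib

g = rdflib.Graph()
# parse() is the documented way to read a file (works with rdflib 5.0.0);
# the older Graph.load() is deprecated.
g.parse('index.rdf', format='xml')

for s, p, o in g:
    print(s, p, o)
```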

I get output like this:

```text
http://creativecommons.org/licenses/by/2.0/it/ http://purl.org/dc/elements/1.1/title Reconocimiento 2.0 Italia
http://creativecommons.org/licenses/by-nc-nd/2.5/br/ http://purl.org/dc/elements/1.1/title Navngivelse-IkkeKommerciel-IngenBearbejdelse 2.5 Brasilien
http://creativecommons.org/licenses/by-sa/2.5/ch/ http://purl.org/dc/elements/1.1/title Imenovanje-Dijeli pod istim uvjetima 2.5 Švicarska
http://creativecommons.org/licenses/by-sa/2.5/au/ http://purl.org/dc/elements/1.1/title Namensnennung-Weitergabe unter gleichen Bedingungen 2.5 Australien
http://creativecommons.org/licenses/by-sa/3.0/ug/ http://purl.org/dc/elements/1.1/title Attribution-ShareAlike 3.0 Uganda
http://creativecommons.org/licenses/by-nc/2.5/it/ http://purl.org/dc/elements/1.1/title Հղում - Ոչ-առևտրային օգտագործում 2.5 Իտալիա
http://creativecommons.org/licenses/by-nc-sa/2.5/br/ http://purl.org/dc/elements/1.1/title Attribution-NonCommercial-ShareAlike 2.5 Brazil
http://creativecommons.org/licenses/by-nc/2.1/jp/ http://purl.org/dc/elements/1.1/title НаведиИзвор-Некомерцијално 2.1 Јапонија
http://creativecommons.org/licenses/by-nc-nd/3.0/gt/ http://purl.org/dc/elements/1.1/title نسب- غير تجاري - لا اشتقاق 3.0 كاثولونيا
http://creativecommons.org/licenses/by-sa/3.0/gt/ http://purl.org/dc/elements/1.1/title Erkenning-InsgelyksDeel 3.0 Guatemala
http://creativecommons.org/licenses/publicdomain/ http://creativecommons.org/ns#permits http://creativecommons.org/ns#Reproduction
http://creativecommons.org/licenses/by-nc-sa/3.0/au/ http://purl.org/dc/elements/1.1/title Recoñecemento-NonComercial-CompartirIgual 3.0 Australia
http://creativecommons.org/licenses/by-nc/3.0/at/ http://purl.org/dc/elements/1.1/title Namensnennung-NichtKommerziell 3.0 Österreich
http://creativecommons.org/licenses/by-sa/1.0/nl/ http://purl.org/dc/elements/1.1/title Attribution-ShareAlike 1.0 Netherlands
http://creativecommons.org/licenses/by-nc-nd/2.0/uk/ http://creativecommons.org/ns#requires http://creativecommons.org/ns#Attribution
http://creativecommons.org/licenses/nc/1.0/ http://purl.org/dc/elements/1.1/title Pas d'Utilisation Commerciale 1.0 Générique
http://creativecommons.org/licenses/by/2.5/se/ http://xmlns.com/foaf/0.1/logo https://i.creativecommons.org/l/by/2.5/se/88x31.png
http://creativecommons.org/licenses/by-nc-sa/2.0/au/ http://purl.org/dc/elements/1.1/title Nevezd meg! - Ne add el! - Így add tovább! 2.0 Ausztrália
http://creativecommons.org/licenses/by-nc-nd/2.0/tw/ http://purl.org/dc/elements/1.1/title Nimeä-Ei muutoksia-Epäkaupallinen 2.0 Taiwan
http://creativecommons.org/licenses/by-nd/2.0/kr/ http://creativecommons.org/ns#permits http://creativecommons.org/ns#Reproduction
```

I can't figure out how to get the required information from this.

source

Tanuj22 commented on Tue Apr 28 2020:

@ritesh-pandey Just change `name = 'api'` to `name = 'cccatalog.api'` in apps.py. source

Issue author ritesh-pandey commented on Wed Apr 29 2020:

@Tanuj22 This should help in understanding the RDF structure. source

Issue author ritesh-pandey commented on Thu Apr 30 2020:

https://github.com/ritesh-pandey/cccatalog-api/blob/add-license-end-point/cccatalog-api/cccatalog/api/licenses.py#L45

Read RDF file location from settings.py
Use the ready() hook of the api app to run a one-time process that parses and caches license versions
Use default caching to store all licenses in memory (@aldenstpage please comment if this is the recommended way)

Parsing the whole license RDF file in one go is causing problems. I think we are hitting some limit, probably a memory limit. I created some test RDF files from the original one, and I am unable to parse beyond 24 licenses (2,600 lines of the RDF file). source

aldenstpage commented on Thu Apr 30 2020:

You can cache it using the Django cache interface, but you'll need to set up a local cache backend (see settings.py and Django docs) instead of Redis. Redis is used to share cached state data between servers; in this case, every server is going to have its own copy of the file during installation, so there's no need to store it in Redis.
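Following that suggestion, here is a sketch of a second, process-local cache in `settings.py`, placed next to the existing Redis-backed `default` cache (not shown). The alias `"licenses"` and the `LOCATION` value are hypothetical names. `LocMemCache` is Django's built-in local-memory backend, and `"TIMEOUT": None` makes entries never expire.

```python
# settings.py (excerpt, sketch): merge this entry into the existing CACHES dict;
# the Redis-backed "default" alias stays as it is and is omitted here.
CACHES = {
    "licenses": {
        # Built-in per-process memory cache: every server/worker keeps its own copy.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "licenses",  # separates this store from other locmem caches
        "TIMEOUT": None,  # never expire; the RDF file only changes with a deploy
    },
}
```

The parsed records could then be written with `caches["licenses"].set("all_licenses", records)` and read with `caches["licenses"].get("all_licenses")`, where `caches` comes from `django.core.cache` and `"all_licenses"` is a hypothetical key. `LocMemCache` pickles values on every `set()` and `get()`. If that per-request copy matters, a plain module-level variable filled at startup would avoid it.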

The file is 5 MB, so it shouldn't use up all of your system's memory. It's possible there is a cycle or a problem with RDFLib; perhaps you can profile your code to get a better idea of the source of the problem. source
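The `memory_profiler` package used later in this thread is one way to profile. The standard library's `tracemalloc` is another: it can report the peak memory of a single call and which source lines still hold the most memory afterwards. `profile_memory` below is a minimal, hypothetical helper. To profile the license parsing, you would pass it something like `lambda: Graph().parse(path, format="xml")`.

```python
import tracemalloc


def profile_memory(func, *args, top=5, **kwargs):
    """Run func(*args, **kwargs) under tracemalloc.

    Returns (result, peak_bytes, top_stats). peak_bytes is the highest traced
    memory during the call; top_stats lists the source lines whose allocations
    are still alive right after it returns, largest first.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return result, peak, snapshot.statistics("lineno")[:top]


if __name__ == "__main__":
    # Stand-in workload; swap in the license parsing call to profile it.
    data, peak, top_stats = profile_memory(lambda n: [str(i) for i in range(n)], 100_000)
    print(f"peak traced memory: {peak / 1024:.0f} KiB")
    for stat in top_stats:
        print(stat)
```

Comparing the peak for the 24-license test file with the peak for a larger subset would show whether memory grows roughly linearly with file size or blows up at some point.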

Issue author ritesh-pandey commented on Fri May 01 2020:

Added a local-memory cache.

But I am still struggling with the original problem. When run against the original license file, the script terminates at random points with an rdflib-related error. When I run the same script with a smaller test file (a subset of the original file), it executes perfectly.

I created a gist to reproduce the problem.

Download/clone the gist.
Create a virtualenv with Python 3. For Linux users: `virtualenv env --python /usr/bin/python3`
Activate the virtual environment: `source env/bin/activate`

Create a `requirements.pip` file with:

```
isodate==0.6.0
memory-profiler==0.57.0
psutil==5.7.0
pyparsing==2.4.7
rdflib==5.0.0
six==1.14.0
```

Install the required modules: `pip install -r requirements.pip`

Create a `licenses.py` file:

```python
import os
import sys

from rdflib import Graph, Namespace
from rdflib.namespace import DC, DCTERMS, RDF
from rdflib.resource import Resource

CC = Namespace('http://creativecommons.org/ns#')
settings = {
    'LICENSE_RDF_PATH': os.path.realpath(sys.argv[1]),
}


# `profile` is not imported: memory_profiler injects it when the script is
# run with `python -m memory_profiler licenses.py <rdf file>`.
@profile
def parse_and_cache_licenses():
    licenses = []
    license_graph = Graph()
    # parse() is the documented way to read the RDF/XML file; load() is deprecated.
    license_graph.parse(settings['LICENSE_RDF_PATH'], format='xml')
    cc_license_resource = Resource(license_graph, CC.License)
    # Only nodes declared with rdf:type cc:License.
    for cc_license in cc_license_resource.subjects(RDF.type):
        license_url = cc_license.identifier
        version = cc_license.value(p=DCTERMS.hasVersion)
        # value() returns None when a license has no cc:jurisdiction triple,
        # so guard before reading .identifier.
        jurisdiction_node = cc_license.value(p=CC.jurisdiction)
        jurisdiction = jurisdiction_node.identifier if jurisdiction_node is not None else None
        # One dc:title literal per translation; its language tag is the code.
        for title in cc_license.objects(DC.title):
            licenses.append({
                'license_url': license_url,
                'license_version': version,
                'jurisdiction': jurisdiction,
                'language_code': title.language,
            })
    return licenses


parse_and_cache_licenses()
```

Download the license RDF file and place it in the same folder as `licenses.py`.
Run the program with the `memory_profiler` module; `licenses.py` takes the path of the RDF file as its first argument:
`python -m memory_profiler licenses.py licenses.rdf`

Create a subset of the original license RDF file with ~2,600 lines. Let us call it `licenses.test.rdf`.
Let us run our program with this file:
`python -m memory_profiler licenses.py licenses.test.rdf`
[source](https://github.com/creativecommons/cccatalog-api/issues/368#issuecomment-622347930)

*Tanuj22* commented on Thu May 07 2020:

>@ritesh-pandey I followed an approach similar to yours and was able to parse the file, but this logic puts a heavy load on my system and barely manages it. I guess there has to be a better way to parse the file.
[source](https://github.com/creativecommons/cccatalog-api/issues/368#issuecomment-625246089)

Issue author *ritesh-pandey* commented on Sun May 10 2020:

>Agreed. I think we are taking the wrong approach here. @aldenstpage, could we get some help here?
[source](https://github.com/creativecommons/cccatalog-api/issues/368#issuecomment-626352570)

*aldenstpage* commented on Thu May 14 2020:

>Hey @ritesh-pandey, I'll have some time to take a closer look at this issue in our next sprint (the two weeks beginning Monday). My development schedule is a bit overstuffed at the moment.
[source](https://github.com/creativecommons/cccatalog-api/issues/368#issuecomment-628876233)

Issue author *tushar912* commented on Fri Sep 25 2020:

>@aldenstpage I was working on the implementation, but I found that some licenses in the RDF file you mentioned do not have a language code. Should the language code be left empty for those?
[source](https://github.com/creativecommons/cccatalog-api/issues/368#issuecomment-699063684)

WordPress / openverse

Create a /license endpoint (original #368) #749
