Open obulat opened 3 years ago
@WordPress/openverse-catalog Because this issue suggests parsing the RDF license spec, I wonder if it should be part of the catalogue data rather than managed and built by the API. This is similar to my question on WordPress/openverse#753.
This issue has been migrated from the CC Search api repository
The endpoint should list every possible valid Creative Commons license. Each license in the response should have:
While we model licenses in a simple way in `licenses.py`, this is not sufficient for accurately listing all possible licenses. There are ~700 versions of licenses. The best way to get this information is to parse the license RDF spec here. It should be parsed when the server starts up and kept in memory. I don't expect this spec to change very often, so instead of fetching it remotely, we should just keep a copy of it in this git repository.
Original Comments:
Issue author ritesh-pandey commented on Sat Oct 12 2019:
@aldenstpage Did I understand the problem statement correctly? source
Issue author HAKSOAT commented on Wed Jan 29 2020:
kgodey commented on Thu Jan 30 2020:
Issue author HAKSOAT commented on Thu Jan 30 2020:
kgodey commented on Sat Feb 01 2020:
Issue author HAKSOAT commented on Sat Feb 01 2020:
Gbahdeyboh commented on Sun Feb 02 2020:
I hope this helps @HAKSOAT
source
Gbahdeyboh commented on Sun Feb 02 2020:
Issue author HAKSOAT commented on Tue Feb 04 2020:
kgodey commented on Fri Feb 28 2020:
Issue author DantrazTrev commented on Sat Feb 29 2020:
source
Gbahdeyboh commented on Sat Feb 29 2020:
aldenstpage commented on Tue Apr 14 2020:
Tanuj22 commented on Thu Apr 23 2020:
Issue author ritesh-pandey commented on Mon Apr 27 2020:
As per the documentation, Django will call `ready` only if we specify the dotted path to `ApiConfig` in `INSTALLED_APPS`.
Unfortunately I am stuck at
Something is missing. My guess is that we need to specify the `path` attribute of `ApiConfig` as well. source
Tanuj22 commented on Tue Apr 28 2020:
I can't figure out why it is filled with `\n`.
If I simply try to do this
I get output like this
I can't figure out how to get the required information from this.
source
Tanuj22 commented on Tue Apr 28 2020:
Issue author ritesh-pandey commented on Wed Apr 29 2020:
@Tanuj22 This should help in understanding the RDF structure. source
Issue author ritesh-pandey commented on Thu Apr 30 2020:
`settings.py`
`ready` of `api` app to run one-time process of parsing and caching license versions
Parsing the license RDF file in one go is causing a problem. I think we are hitting some limit, probably a memory limit. I created some test RDF files from the original one. I am unable to parse beyond 24 licenses (2600 lines in the RDF file). source
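The "parse once at startup, keep in memory" idea can be sketched without any framework. This is a minimal sketch, not the project's implementation: `parse_license_file`, `get_licenses`, and the `licenses.rdf` path are hypothetical names, and the parser is a stand-in that returns one hard-coded record. In Django, `AppConfig.ready()` would be the place to trigger the first call.

```python
from functools import lru_cache

# Counts how often the (expensive) parse actually runs.
PARSE_CALLS = 0


def parse_license_file(path):
    """Hypothetical stand-in for the real RDF parsing step."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return [
        {
            "license_url": "http://creativecommons.org/licenses/by/4.0/",
            "license_version": "4.0",
        }
    ]


@lru_cache(maxsize=1)
def get_licenses(path="licenses.rdf"):
    """Parse on the first call only; later calls reuse the cached tuple."""
    return tuple(parse_license_file(path))


# A startup hook would call this once to warm the cache.
get_licenses()
# Request handlers can then call it freely without re-parsing.
get_licenses()
```

Returning a tuple keeps the cached value from being mutated by accident between requests.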
aldenstpage commented on Thu Apr 30 2020:
The file is 5 MB, so it shouldn't suck up all of your system's memory. It's possible there is a cycle or a problem with RDFlib; perhaps you can profile your code to get a better idea of the source of the problem. source
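One way to act on the profiling suggestion with only the standard library is to combine `cProfile` (time per function) with `tracemalloc` (peak traced memory). This is a rough sketch: `build_records` is a hypothetical stand-in for the real parsing function, and `profile_call` is an illustrative helper, not something from this repository.

```python
import cProfile
import io
import pstats
import tracemalloc


def build_records(n):
    """Hypothetical stand-in for the license parsing function."""
    return [{"id": i, "title": "license %d" % i} for i in range(n)]


def profile_call(func, *args):
    """Run func once; return its result, peak traced bytes, and a CPU report."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    _current, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Render the five most expensive entries by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, peak_bytes, buffer.getvalue()


result, peak_bytes, report = profile_call(build_records, 1000)
print("peak traced memory: %d bytes" % peak_bytes)
print(report)
```

If peak memory stays small while the script still dies, that points away from a memory limit and toward the parsing logic itself.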
Issue author ritesh-pandey commented on Fri May 01 2020:
But I am still struggling with the original problem. When executed with the original license file, the script gets terminated at random with an error related to `rdflib`. When I run the same script with a smaller test file (a subset of the original file), it executes perfectly.
I created a gist for the same.
`virtualenv` with `Python3`
For Linux users:
virtualenv env --python /usr/bin/python3
source env/bin/activate
`requirements.pip` file with.
Install required modules.
pip install -r requirements.pip
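`licenses.py` file.

    import os
    import sys

    from rdflib import Graph, Namespace
    from rdflib.namespace import DCTERMS
    from rdflib.resource import Resource

    CC = Namespace('http://creativecommons.org/ns#')
    settings = {
        'LICENSE_RDF_PATH': os.path.realpath(sys.argv[1])
    }

    # `profile` is not defined in this snippet; it is presumably injected by
    # a profiler (for example memory_profiler or line_profiler).
    @profile
    def parse_and_cache_licenses():
        licenses = []
        license_graph = Graph()
        # Graph.load is the older spelling; newer rdflib releases use Graph.parse.
        license_graph.load(settings['LICENSE_RDF_PATH'])
        cc_license_resource = Resource(license_graph, CC.License)
        for cc_license in cc_license_resource.subjects():
            license_url = cc_license.identifier
            version = cc_license.value(p=DCTERMS.hasVersion)
            # value() returns None when there is no cc:jurisdiction triple,
            # so only read .identifier when a jurisdiction is present.
            jurisdiction = cc_license.value(p=CC.jurisdiction)
            if jurisdiction is not None:
                jurisdiction = jurisdiction.identifier
            for cc_license_predicate, cc_license_object in cc_license.predicate_objects():
                if cc_license_predicate.qname() == 'dc:title':
                    language = cc_license_object.language
                    licenses.append({
                        'license_url': license_url,
                        'license_version': version,
                        'jurisdiction': jurisdiction,
                        'language_code': language
                    })

    parse_and_cache_licenses()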
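As a cross-check that does not depend on `rdflib`, the same fields can be pulled out with the standard library's streaming XML parser. This is only a sketch under assumptions: `SAMPLE_RDF` is a simplified, hand-written stand-in for the real license file, `iter_licenses` is a hypothetical helper, and the approach only works when licenses are serialized as `cc:License` elements with child elements (RDF/XML allows other shapes that this would miss).

```python
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC_NS = "http://creativecommons.org/ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Simplified sample data, not the real license file.
SAMPLE_RDF = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://creativecommons.org/ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <cc:License rdf:about="http://creativecommons.org/licenses/by/4.0/">
    <dcterms:hasVersion>4.0</dcterms:hasVersion>
  </cc:License>
  <cc:License rdf:about="http://creativecommons.org/licenses/by/3.0/de/">
    <dcterms:hasVersion>3.0</dcterms:hasVersion>
    <cc:jurisdiction rdf:resource="http://creativecommons.org/international/de/"/>
  </cc:License>
</rdf:RDF>
"""


def iter_licenses(source):
    """Yield one dict per cc:License element, streaming through the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "{%s}License" % CC_NS:
            continue
        jurisdiction = elem.find("{%s}jurisdiction" % CC_NS)
        yield {
            "license_url": elem.get("{%s}about" % RDF_NS),
            "license_version": elem.findtext("{%s}hasVersion" % DCTERMS_NS),
            # Licenses without a jurisdiction element map to None.
            "jurisdiction": (
                jurisdiction.get("{%s}resource" % RDF_NS)
                if jurisdiction is not None
                else None
            ),
        }
        # Drop the element's children once it has been read.
        elem.clear()


licenses = list(iter_licenses(io.BytesIO(SAMPLE_RDF.encode("utf-8"))))
for entry in licenses:
    print(entry)
```

Comparing the number of records from a script like this against the `rdflib` run on the same file may help narrow down where parsing stops.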