Closed ghukill closed 7 years ago
Associated problem - default locations for objects and collections. The LDP implementation of PCDM document suggests there should be BasicContainers for objects and collections, e.g. /objects and /collections. It goes on to say these default locations can exist at any level of hierarchy, and might vary for different collections or object types.
This presents a problem with "short" vs. "long" URIs provided to pyfc4. The following call demonstrates it, because the provided URI is always prefixed with collections_path:
super().__init__(repo, uri="%s/%s" % (collections_path, uri), response=response)
When initializing a collection with a "short" URI like trees, this approach works:
In [6]: trees = pcdm.models.PCDMCollection(repo,'trees')
In [7]: trees.uri
Out[7]: rdflib.term.URIRef('http://localhost:8080/rest/collections/trees')
However, providing a full URI obviously breaks this pattern:
In [8]: trees = pcdm.models.PCDMCollection(repo,'http://localhost:8080/rest/collections/trees')
In [9]: trees.uri
Out[9]: rdflib.term.URIRef('http://localhost:8080/rest/collections/http://localhost:8080/rest/collections/trees')
The latter case also happens when retrieving a PCDMCollection:
In [13]: trees = repo.get_resource('collections/trees',resource_type=pcdm.models.PCDMCollection)
DEBUG:pyfc4.models:HEAD request for http://localhost:8080/rest/collections/trees, format None, headers None
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "HEAD /rest/collections/trees HTTP/1.1" 200 0
DEBUG:pyfc4.models:using resource type: <class 'pyfc4.plugins.pcdm.models.PCDMCollection'>
DEBUG:pyfc4.models:GET request for http://localhost:8080/rest/collections/trees/fcr:metadata, format text/turtle, headers {'Accept': 'text/turtle'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /rest/collections/trees/fcr:metadata HTTP/1.1" 200 1801
In [14]: trees.uri
Out[14]: rdflib.term.URIRef('http://localhost:8080/rest/collections/http://localhost:8080/rest/collections/trees')
Even if the pcdm plugin required "short" URIs whenever a URI is provided, the retrieval problem would still occur, since retrieval always hands the model a full URI.
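As a rough sketch of what a guard covering both cases could look like, the snippet below only prefixes URIs that have no scheme and host. The names REPO_ROOT, COLLECTIONS_PATH, is_absolute and resolve_collection_uri are hypothetical and not part of pyfc4; the values are assumed from the examples above.

```python
from urllib.parse import urlparse

# Assumed values for illustration; the real plugin may store these differently.
REPO_ROOT = "http://localhost:8080/rest"
COLLECTIONS_PATH = "collections"


def is_absolute(uri):
    """Return True when uri carries a scheme and a host, i.e. is a "long" URI."""
    parsed = urlparse(uri)
    return bool(parsed.scheme and parsed.netloc)


def resolve_collection_uri(uri):
    """Prefix "short" URIs with the collections root; pass "long" URIs through."""
    if is_absolute(uri):
        return uri
    return "%s/%s/%s" % (REPO_ROOT, COLLECTIONS_PATH, uri.strip("/"))


print(resolve_collection_uri("trees"))
print(resolve_collection_uri("http://localhost:8080/rest/collections/trees"))
```

This sketch does not check that a "long" URI actually sits under the collections root or matches the repository root, so it avoids the doubled URI but nothing more.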
One option, not that it's a particularly good one, would be using the known repository root + known collections root to derive the PCDM collection/object's "short" URI. Something like:
# if full URI provided, as is the case with retrieval, derive "short" URI
if repo.root in uri:
    uri = uri.split(collections_path)[-1].lstrip('/')
This is probably intrinsically a bad idea, as it would not support different forms of the repository root, e.g. localhost vs 127.0.0.1, or in-place reverse proxies. This kind of string comparison feels risky at best.
This example demonstrates that, if repo.root == 127.0.0.1:
In [3]: bees = pcdm.models.PCDMCollection(repo,'http://localhost:8080/rest/collections/bees')
In [4]: bees.uri
Out[4]: rdflib.term.URIRef('http://127.0.0.1:8080/rest/collections/http://localhost:8080/rest/collections/bees')
This also falls apart almost immediately with nested collections or objects, where splitting on collections_path or objects_path isn't sufficient.
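Both failure modes can be reproduced with plain strings, without a running repository. The sketch below mirrors the split-based idea; derive_short_uri and build_uri are hypothetical helpers, collections_path is assumed to be "collections", and the nested layout collections/trees/collections/oaks is an invented example.

```python
def derive_short_uri(repo_root, collections_path, uri):
    """Mirror the string-based idea: keep only what follows the last collections root."""
    if repo_root in uri:
        uri = uri.split(collections_path)[-1].lstrip("/")
    return uri


def build_uri(repo_root, collections_path, uri):
    """Prefix the (possibly derived) short URI with the repository and collections roots."""
    short = derive_short_uri(repo_root, collections_path, uri)
    return "%s/%s/%s" % (repo_root, collections_path, short)


root = "http://localhost:8080/rest"

# Matching root and a flat collection: the short URI is recovered correctly.
print(build_uri(root, "collections", root + "/collections/trees"))

# Same repository addressed as 127.0.0.1: the substring test fails and the URI doubles.
print(build_uri("http://127.0.0.1:8080/rest", "collections", root + "/collections/bees"))

# Hypothetical nested collection: the intermediate path segments are silently dropped.
print(build_uri(root, "collections", root + "/collections/trees/collections/oaks"))
```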
It should be mentioned: this problem is almost completely sidestepped by not allowing specified URIs. If all collection/object creation is via POST with repository-minted URIs, this becomes a non-issue. Though this is also considered best practice in many ways, it would be a shame to lose the ability to create semantically meaningful URIs if desired.
Closing this as well - see https://github.com/ghukill/pyfc4/issues/78#issuecomment-325822009.
Where to put configurations for PCDM plugin?
Use cases include: /objects/ and /collections/
Could this be a settings file in the module directory? Or passed as parameters, with defaults, to all PCDM models? The former requires modifying library code, which is not fun when the library is installed. The latter might get tiresome.
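To make the "parameters, with defaults" option concrete, here is a minimal sketch. PCDMConfig, DEFAULT_CONFIG and PCDMCollectionSketch are hypothetical names used only for illustration; they are not existing pyfc4 classes, and the real models take a repo argument that is omitted here.

```python
class PCDMConfig:
    """Hypothetical container for plugin-wide default locations (not part of pyfc4)."""

    def __init__(self, objects_path="objects", collections_path="collections"):
        # Normalize so that "/collections/" and "collections" behave the same.
        self.objects_path = objects_path.strip("/")
        self.collections_path = collections_path.strip("/")


# Shared defaults, so callers only pass a config when they need to deviate.
DEFAULT_CONFIG = PCDMConfig()


class PCDMCollectionSketch:
    """Stand-in for a PCDM model that accepts an optional config."""

    def __init__(self, uri, config=None):
        self.config = config or DEFAULT_CONFIG
        self.uri = "%s/%s" % (self.config.collections_path, uri)


print(PCDMCollectionSketch("trees").uri)

custom = PCDMConfig(collections_path="/archive/collections/")
print(PCDMCollectionSketch("trees", config=custom).uri)
```

Passing one config object rather than several separate path arguments keeps the per-model signature short, which may ease the "tiresome" part, though every model still has to accept and forward it.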
Perhaps the entire plugin needs to be instantiated? Something like...