RockefellerArchiveCenter / pyfc4

Python client for Fedora Commons 4
MIT License

pcdm plugin - configurations #76

Closed ghukill closed 7 years ago

ghukill commented 7 years ago

Where to put configurations for PCDM plugin?

Use cases include:

Could they live in a settings file in the module directory? Or be passed as parameters, with defaults, to all PCDM models? The former requires modifying library code, which is not fun once the library is installed. The latter might get tiresome.

Perhaps the entire plugin needs to be instantiated? Something like...

# import pcdm plugin
from pyfc4.plugins import pcdm

# instantiate plugin
pcdm_handle = pcdm.init(
    objects_path='/objects',
    collections_path='/collections'
)

# then use...
poe = pcdm_handle.PCDMCollection('poe')
poe.create(specify_uri=True)
# creates PCDMCollection at /collections/poe, based on init settings
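
To make the "instantiate the plugin" idea a bit more concrete, here is a minimal stand-alone sketch of a handle that carries the settings. Everything in it (PCDMHandle, collection_uri, object_uri, and this init function) is hypothetical and does not exist in pyfc4; it only illustrates keeping defaults in one place and overriding them per instance.

```python
# Hypothetical sketch: none of these names are part of pyfc4.
from dataclasses import dataclass


@dataclass(frozen=True)
class PCDMHandle:
    """Holds plugin-wide settings chosen at init time."""
    objects_path: str = '/objects'
    collections_path: str = '/collections'

    def collection_uri(self, short_uri):
        # join the configured collections root with a "short" URI
        return '%s/%s' % (self.collections_path.rstrip('/'), short_uri.lstrip('/'))

    def object_uri(self, short_uri):
        # join the configured objects root with a "short" URI
        return '%s/%s' % (self.objects_path.rstrip('/'), short_uri.lstrip('/'))


def init(**settings):
    """Return a configured handle; unspecified settings keep their defaults."""
    return PCDMHandle(**settings)


# override one setting, keep the default for the other
pcdm_handle = init(collections_path='/archive/collections')
print(pcdm_handle.collection_uri('poe'))
print(pcdm_handle.object_uri('raven'))
```

Because the handle is an ordinary object, two handles with different roots could coexist in one process, which a module-level settings file would not allow.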

ghukill commented 7 years ago

Associated problem - default locations for objects and collections. The LDP implementation of PCDM document suggests there should be BasicContainers for objects and collections, e.g. /objects and /collections. It goes on to say these default locations can exist at any level of hierarchy, and might vary for different collections or object types.

This presents a problem with "short" vs. "long" URIs provided to pyfc4. The following line, which prefixes collections_path to whatever URI the model receives, illustrates it:

super().__init__(repo, uri="%s/%s" % (collections_path, uri), response=response)

When initializing a collection with a "short" URI like trees, this approach works:

In [6]: trees = pcdm.models.PCDMCollection(repo,'trees')

In [7]: trees.uri
Out[7]: rdflib.term.URIRef('http://localhost:8080/rest/collections/trees')

However, providing a full URI obviously breaks this pattern:

In [8]: trees = pcdm.models.PCDMCollection(repo,'http://localhost:8080/rest/collections/trees')

In [9]: trees.uri
Out[9]: rdflib.term.URIRef('http://localhost:8080/rest/collections/http://localhost:8080/rest/collections/trees')

The latter case also happens when retrieving a PCDMCollection:

In [13]: trees = repo.get_resource('collections/trees',resource_type=pcdm.models.PCDMCollection)
DEBUG:pyfc4.models:HEAD request for http://localhost:8080/rest/collections/trees, format None, headers None
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "HEAD /rest/collections/trees HTTP/1.1" 200 0
DEBUG:pyfc4.models:using resource type: <class 'pyfc4.plugins.pcdm.models.PCDMCollection'>
DEBUG:pyfc4.models:GET request for http://localhost:8080/rest/collections/trees/fcr:metadata, format text/turtle, headers {'Accept': 'text/turtle'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:8080 "GET /rest/collections/trees/fcr:metadata HTTP/1.1" 200 1801

In [14]: trees.uri
Out[14]: rdflib.term.URIRef('http://localhost:8080/rest/collections/http://localhost:8080/rest/collections/trees')

Even if the pcdm plugin required "short" URIs whenever a URI is supplied at all, the problem would still occur on retrieval, where the full URI ends up being prefixed again.
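
The doubling can be reproduced without a repository. The sketch below mimics the prefixing pattern with plain strings and adds a hypothetical guard (guarded_uri, not something from pyfc4 or this thread) that skips the prefix when the input already carries a scheme and host. COLLECTIONS_PATH is an assumed stand-in for collections_path. Note the guard only avoids the doubled URI; it does not recover a "short" URI.

```python
from urllib.parse import urlsplit

COLLECTIONS_PATH = 'collections'  # assumed setting, stands in for collections_path


def naive_uri(uri):
    # mirrors the "%s/%s" % (collections_path, uri) pattern shown earlier
    return '%s/%s' % (COLLECTIONS_PATH, uri)


def guarded_uri(uri):
    # hypothetical guard: leave anything with a scheme and host untouched
    parts = urlsplit(uri)
    if parts.scheme and parts.netloc:
        return uri
    return naive_uri(uri)


full = 'http://localhost:8080/rest/collections/trees'
print(naive_uri('trees'))    # short URI gets the prefix, as intended
print(naive_uri(full))       # full URI gets the prefix too: the doubled form
print(guarded_uri(full))     # full URI passes through unchanged
```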

One option, not that it's a particularly good one, would be using the known repository root + known collections root to derive the PCDM collection/object's "short" URI. Something like:

# if full URI provided, as is the case with retrieval, derive "short" URI
if repo.root in uri:
    uri = uri.split(collections_path)[-1].lstrip('/')

ghukill commented 7 years ago

This is probably intrinsically a bad idea, as it would not support different forms of the repository root, e.g. localhost vs 127.0.0.1, or in-place reverse proxies. This kind of string comparison feels risky at best.

The following example shows the failure when repo.root uses 127.0.0.1 but the supplied URI uses localhost:

In [3]: bees = pcdm.models.PCDMCollection(repo,'http://localhost:8080/rest/collections/bees')

In [4]: bees.uri
Out[4]: rdflib.term.URIRef('http://127.0.0.1:8080/rest/collections/http://localhost:8080/rest/collections/bees')

This also falls apart almost immediately with nested collections or objects, where splitting on collections_path or objects_path isn't sufficient.
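
Both failure modes can be checked with plain strings. The sketch below wraps the split-based snippet from the earlier comment in a function (derive_short_uri is a name chosen here for illustration) and runs it on three inputs. The nested layout /collections/trees/collections/oaks is an assumed example of a collections root appearing below another collection.

```python
def derive_short_uri(uri, repo_root, collections_path):
    # same logic as the split-based snippet, wrapped in a function for testing
    if repo_root in uri:
        uri = uri.split(collections_path)[-1].lstrip('/')
    return uri


root = 'http://localhost:8080/rest'

# flat collection: works as hoped
flat = derive_short_uri(root + '/collections/trees', root, '/collections')

# nested collection (assumed layout): everything before the last
# '/collections' is discarded, so the parent collection is lost
nested = derive_short_uri(root + '/collections/trees/collections/oaks', root, '/collections')

# same repository addressed by a different host: the root check fails
# and the full URI is returned untouched
mismatch = derive_short_uri('http://127.0.0.1:8080/rest/collections/trees', root, '/collections')

print(flat, nested, mismatch, sep='\n')
```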

ghukill commented 7 years ago

It should be mentioned: this problem is almost completely sidestepped by not allowing specified URIs. If all collection/object creation is via POST with repository-minted URIs, this becomes a non-issue. Though this is also considered best practice in many ways, it would be a shame to lose the ability to create semantically meaningful URIs when desired.

ghukill commented 7 years ago

Closing this as well - see https://github.com/ghukill/pyfc4/issues/78#issuecomment-325822009.