Islandora / documentation

Contains islandora's documentation and main issue queue.

Conventions for predictable File URIs #248

Closed ruebot closed 8 years ago

ruebot commented 8 years ago

Issue by mjordan Thursday Feb 26, 2015 at 18:53 GMT Originally opened as https://github.com/islandora-interest-groups/Islandora-Fedora4-Interest-Group/issues/18


Title (Goal): Predict File URIs
Primary Actor: Developer
Scope: Code-level conventions
Level: High
Story: As a developer, I will need to be aware of conventions for identifying specific types of files that may be associated with a Fedora 4 object using an object's fcdm:hasFile property. These conventions are similar to using 'TN', 'OBJ', and other datastream IDs commonly used across solution packs in Islandora 7.x-1.x.

Examples:

Remarks:

ruebot commented 8 years ago

Comment by ksclarke Thursday Feb 26, 2015 at 19:25 GMT


I wonder if there is a tie-in here for persistent IDs (in whatever format you prefer: ARKs, DOIs, PURLs, etc.)? ARKs, for instance, have a way of specifying hierarchical files in the digital object represented by the ARK (using the ARK's Qualifier).

And I don't think UUIDs are required by Fedora 4; aren't they just what it uses out of the box? At one point there was a PID minter. The UUIDPathMinter was one option, but you could choose to use another, and it was configurable through something like:

https://github.com/fcrepo4/fcrepo4/blob/master/fcrepo-webapp/src/main/resources/spring/minter.xml

https://github.com/fcrepo4/fcrepo4/tree/master/fcrepo-mint/src/main/java/org/fcrepo/mint

But, I'm also not sure this wasn't removed in the code cleanup prior to the official release. I know there were issues with the path mapping of these IDs. It looks like it's still in master, but I'm not entirely sure of its status.

Persistent IDs have been on my Islandora wishlist for a while, so perhaps I'm lumping this in where it shouldn't be?

I always use UNT (not an Islandora site, but still) as the example of doing this right (in my opinion): http://digital.library.unt.edu/ark:/67531/metadc813/metadata/

ruebot commented 8 years ago

Comment by awoods Thursday Feb 26, 2015 at 20:39 GMT


@ksclarke, yes, the pluggable pid-minting is alive and well in the F4 codebase. The "out of the box" UUIDPathMinter is designed with performance in mind... but there are other options available, including a remote HttpPidMinter.
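To illustrate what "pluggable" minting means in practice, here is a minimal Python sketch. It is not Fedora's code: the function names, the segment sizes, and the path layout are all assumptions made for illustration. The point is only that the code creating a resource asks a minter for an identifier and does not care which minter it was given.

```python
import uuid
from typing import Callable

# A "minter" here is just a callable that returns a new identifier string.
Minter = Callable[[], str]


def uuid_path_minter(segment_len: int = 2, segments: int = 4) -> Minter:
    """Hypothetical path-style minter: short hex segments, then the full UUID."""
    def mint() -> str:
        u = uuid.uuid4()
        hex_digits = u.hex  # 32 hex characters, no hyphens
        parts = [
            hex_digits[i * segment_len:(i + 1) * segment_len]
            for i in range(segments)
        ]
        return "/".join(parts + [str(u)])
    return mint


def counter_minter(prefix: str) -> Minter:
    """Hypothetical alternative minter: a prefix plus a running number."""
    count = 0

    def mint() -> str:
        nonlocal count
        count += 1
        return f"{prefix}{count}"
    return mint


def new_resource_path(base: str, mint: Minter) -> str:
    """Build a path for a new resource using whichever minter was plugged in."""
    return base.rstrip("/") + "/" + mint()
```

Swapping `uuid_path_minter()` for `counter_minter("obj")` changes the identifiers produced without touching `new_resource_path`.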

ruebot commented 8 years ago

Comment by daniel-dgi Thursday Feb 26, 2015 at 21:24 GMT


Is using a predicate for this type of thing too naive of an approach? I'd rather not mess with something that's gonna severely hurt performance just because we want semantics in the path.

ruebot commented 8 years ago

Comment by DiegoPino Thursday Feb 26, 2015 at 21:39 GMT


@daniel-dgi, if I understand correctly, the path we give a resource is not directly tied to how F4 stores and fetches its resources internally (I remember reading about a hierarchy translator; it's in the code, but I'm not sure whether it is enabled). If so, performance should not be a problem. So predicates/properties could be a nice way to do this, all the more so if they are defined explicitly in an Islandora ontology (love this part!), so developers can use these definitions, classes (objects), and subclasses (associated resources, the old datastreams) to know where to look for a specific resource. 'ark:' is not possible, at least not out of the box: '/', ':', '[', ']', '|', and '*' can't be part of a local name (RDF). The documentation says a resource also has an identifier (in addition to the path). How is this identifier used externally, or is it not used at all?

ruebot commented 8 years ago

Comment by awoods Thursday Feb 26, 2015 at 21:57 GMT


I definitely prefer the property/predicate approach in conjunction with something along the lines of the FCDM: https://wiki.duraspace.org/display/FF/Fedora+Community+Data+Model as opposed to semantically meaningful URLs.

ruebot commented 8 years ago

Comment by mjordan Thursday Feb 26, 2015 at 22:21 GMT


@daniel-dgi and @awoods, what would a typical REST conversation look like if the use case was "give me a copy of the file that has been designated as the thumbnail image for the object?"

ruebot commented 8 years ago

Comment by awoods Thursday Feb 26, 2015 at 23:17 GMT


@mjordan, In conjunction with a structuring along the lines of FCDM:

Ideally, the triples of your repository are indexed in an external triplestore (Fuseki, Sesame, etc.). Then you simply make a SPARQL query such as:

select ?thumb where {
    <host/collections/{id}> fcdm:hasThumbnail ?thumb .
}
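As a sketch of how a client might send that query, the Python below fills in the object URI and builds a SPARQL Protocol GET URL, which carries the query text in the `query` parameter. The `fcdm` namespace URI is never given in this thread, so `FCDM_NS` is a placeholder, and the endpoint and object URLs in the usage note are hypothetical as well.

```python
from urllib.parse import urlencode

# Placeholder: the real fcdm namespace URI is not stated in this thread.
FCDM_NS = "http://example.org/fcdm#"


def thumbnail_query(object_uri: str) -> str:
    """Return a SPARQL query asking for the thumbnail of one object."""
    return (
        f"PREFIX fcdm: <{FCDM_NS}>\n"
        "SELECT ?thumb WHERE {\n"
        f"    <{object_uri}> fcdm:hasThumbnail ?thumb .\n"
        "}"
    )


def query_url(endpoint: str, object_uri: str) -> str:
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": thumbnail_query(object_uri)})
```

For example, `query_url("http://localhost:3030/ds/query", "http://localhost:8080/rest/collections/123")` yields a URL that could be fetched with any HTTP client; both addresses are made up for the example.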

If, however, a REST interaction is required, here are some possibilities for finding an object's (container's) thumbnail:

  1. GET /collections/{id}/
     Parse RDF looking for triple: </collections/{id}/> fcdm:hasThumbnail <URL-of-thumbnail>
  2. GET /URL-of-thumbnail

Alternatively, if more dynamic relationships are in play, the interaction may be more like:

  1. GET /collections/{id}/
     Parse RDF looking for triples: </collections/{id}/> fcdm:hasRelatedFile <URL-of-file>
  2. For each of the <URLs-of-files>, GET /URL-of-file
     Parse RDF looking for triple: <URL-of-file> a fcdm:Thumbnail
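Both REST interactions can be sketched in Python as follows. This is an illustration under several assumptions: the server is taken to return N-Triples, `fetch` is a hypothetical callable that maps a URL to the RDF text describing it, the tiny regex parser only handles triples whose three terms are all IRIs, and the `fcdm` namespace URI is a placeholder because the thread does not give one.

```python
import re
from typing import Callable, List, Optional

FCDM = "http://example.org/fcdm#"  # placeholder namespace, not from the thread
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def objects_of(ntriples: str, subject: str, predicate: str) -> List[str]:
    """Return the IRI objects of every (subject, predicate, ?o) triple."""
    found = []
    for line in ntriples.splitlines():
        match = _TRIPLE.match(line)
        if match and match.group(1) == subject and match.group(2) == predicate:
            found.append(match.group(3))
    return found


def thumbnail_direct(fetch: Callable[[str], str], container: str) -> Optional[str]:
    """First interaction: follow a single fcdm:hasThumbnail triple."""
    urls = objects_of(fetch(container), container, FCDM + "hasThumbnail")
    return urls[0] if urls else None


def thumbnail_via_related(fetch: Callable[[str], str], container: str) -> Optional[str]:
    """Second interaction: check the rdf:type of each related file."""
    related = objects_of(fetch(container), container, FCDM + "hasRelatedFile")
    for file_url in related:
        types = objects_of(fetch(file_url), file_url, RDF_TYPE)
        if FCDM + "Thumbnail" in types:
            return file_url
    return None
```

The second function makes one request per related file, which is the cost that makes the direct predicate, or a triplestore query, cheaper.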

But from a performance perspective, you probably want to hit Fedora as little as possible and instead take advantage of tooling that is optimized for this sort of thing, such as a proper triplestore.

ruebot commented 8 years ago

Comment by mjordan Friday Feb 27, 2015 at 15:07 GMT


@awoods Thanks, very helpful. I think the "documented agreement on which predicates/vocabularies are used in your model" is really the root of my original question though. Currently in Islandora there are several conventions (either implicit or explicit) that form this agreement - for example, I can't think of any content models that don't use the DSID 'TN' to identify a thumbnail, or any that don't use 'OCR' for the page-level text transcript of a paged document. If I could rephrase my user story, it would be "As a developer, I will need to be aware of an agreed-upon set of RDF predicates for specific types of files associated with an Object/Container that has a given content model."
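One way to picture such an agreement is a shared lookup table from the familiar 7.x DSIDs to predicates. The Python sketch below is purely illustrative: `fcdm:hasThumbnail` is the only predicate named in this thread, and `fcdm:hasOCR` is invented here to show the shape of the table, not proposed as a real term.

```python
from typing import Optional

# Hypothetical convention table. Only fcdm:hasThumbnail appears in this thread;
# fcdm:hasOCR is an invented placeholder.
DSID_TO_PREDICATE = {
    "TN": "fcdm:hasThumbnail",
    "OCR": "fcdm:hasOCR",
}


def predicate_for(dsid: str) -> Optional[str]:
    """Look up the agreed predicate for a DSID, or None if nothing is agreed."""
    return DSID_TO_PREDICATE.get(dsid.upper())
```

A developer who knows the content model could then call `predicate_for("TN")` instead of hard-coding a predicate in each solution pack.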

ruebot commented 8 years ago

Comment by daniel-dgi Thursday Apr 16, 2015 at 13:43 GMT


@mjordan See https://github.com/Islandora-Labs/islandora/blob/7.x-2.x/docs/technical-documentation/services.md. Let me know what you think. Still WIP (we have a LOT of different datastreams/derivative types that need to be accounted for), but it's a start at fleshing all this out.

dannylamb commented 8 years ago

Closing old use cases until after MVP doc is released.