IIIF Storage Provider - Githubissues

digirati-co-uk / iiif-manifest-editor

Create new IIIF Manifests. Modify existing manifests. Tell stories with IIIF.

https://manifest-editor.digirati.services/

MIT License


IIIF Storage Provider #184

Open tomcrane opened 2 years ago

tomcrane commented 2 years ago

We think that the Canadiana use cases imply the ability to load and save manifests from Canadiana storage itself.

Canadiana manifests would not be hosted on the Preview service.

So we need to implement enough of a plugin for Canadiana to develop the other end.

Possibly a client of https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol

tomcrane commented 2 years ago

Canadiana notes:

1. Storage API

We would like to work together with the IIIF API specification working group to propose adding PUT and POST requests into the IIIF Presentation API.

This way, the solution can be used by many organisations.

We should define what inputs will be needed, and what the response format will be.

People will be free to choose their own storage solution in their IIIF API implementation (NoSQL database, S3...)

2. Auto Save

Ephemeral caching so that work is not lost.

3. Save As

Allow users to save back to a presentation API (see 1) and local disk as JSON

stephenwf commented 2 years ago

There are 4 storage concepts in the Manifest editor:

Project storage - Stores details on the current project, which manifest is loaded, settings, preview states etc.
Manifest storage - A rapid storage for in-progress manifests (auto-save, no ID generation)
Previews - A temporary destination for IIIF, push-only and not read-write
Publications - A final destination for IIIF, likely an external system. This is intended to be push-only, not read-write.

Here is the local storage for a Project:

{
  "id": "manifest-editor://project/75993761-323a-4731-bb56-8966f5ffe03b",
  "filename": "blank-manifest",
  "name": "Blank Manifest",
  "metadata": {
    "created": 1653661984028,
    "modified": 1653662026232
  },
  "resource": {
    "id": "/config/manifest-templates/blank.json",
    "type": "Manifest"
  },
  "publications": [],
  "previews": [],
  "settings": {},
  "storage": {
    "type": "manifest-storage",
     ...
  }
}

This allows the manifest editor to potentially edit the same manifest in 2 configurations without conflicting. This might not be too useful at this exact moment, but when project storage moves to the "cloud" and is shared with a team of users it will be more useful.
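As a rough illustration, the project record above could be typed as follows. This is a sketch inferred only from the JSON example: the names EditorProject, ResourceRef and sameResource are hypothetical, and the element types of "publications", "previews" and "settings" are not shown in the example, so they are left open.

```typescript
// Shape inferred from the JSON example; not taken from the editor's source.
interface ResourceRef {
  id: string;
  type: string; // e.g. "Manifest"
}

interface EditorProject {
  id: string;
  filename: string;
  name: string;
  metadata: { created: number; modified: number }; // epoch milliseconds
  resource: ResourceRef;
  publications: unknown[]; // element shape not shown in the example
  previews: unknown[]; // element shape not shown in the example
  settings: { [key: string]: unknown };
  storage: { type: string; [key: string]: unknown };
}

// Hypothetical helper: true when two distinct projects edit the same resource.
// Each project keeps its own id and storage entry, so they need not conflict.
function sameResource(a: EditorProject, b: EditorProject): boolean {
  return a.id !== b.id && a.resource.id === b.resource.id;
}
```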

Although there are 4 layers of storage described here, they are split in a way that would allow you to use the built-in storages and then only swap the layer you need to.

For example, for the Desktop testing there was a File-system Manifest storage adapter created: https://github.com/digirati-co-uk/iiif-manifest-editor/blob/feature/merged/manifest-editor/src/shell/ProjectContext/storage/FileSystemLoader.ts

This allows projects to be loaded and the IIIF saved onto the desktop in a specific folder. It doesn't require that the project information is also stored in that way. There is a separate adapter for the Project storage: https://github.com/digirati-co-uk/iiif-manifest-editor/blob/feature/merged/manifest-editor/src/shell/ProjectContext/backend/FileSystemFolderBackend.ts

This uses the "Home" folder and then a sub folder. By default it's ~/ManifestEditor/ and project data will be saved there. So you could open a Manifest from your "Documents" folder, it would save a project file in ~/ManifestEditor/projects/... and then save the IIIF back to the manifest in your documents.

tomcrane commented 2 years ago

Digirati Notes on Canadiana notes:

Notes on 1. Storage API

There is no formal technical specification work on a IIIF REST protocol. It has been mooted several times but has always been lower priority than other tech specs.

That said, the proposal at https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol is probably a good starting point, which we can work with Canadiana on for 1).

What we need:

An implementation of the REST read/write API described in the Wiki, as a first iteration
A mechanism for access control and management for permissions for reads and writes
An implementation of the storage adaptor (AbstractVaultLoader?, see above) that can publish to the REST API
An implementation of the storage adaptor for browsing IIIF collections and loading from the REST API

Once we have the last two of these, people can make their own IIIF REST impls behind the published spec, and not have to implement their own adaptors to get a file and folder model over their repository/CMS/whatever. They can do that if they want, but they might just want to implement the REST layer as a facade and let the out-of-the-box understanding of this in the ME do the hard UI work for them.

Notes on 2. Auto save

Auto save is already supported, to the preview endpoint. This could carry on being supported, but the interface mentioned above could offer the option of accepting autosave pushes, so you could configure whether you want regular saves pushed to the preview service, the persistence service, or both. However, this is probably not useful, or even undesirable, for the REST back end.

Ephemeral caching so that work is not lost. Manifest Editor uses local storage for this already.

Notes on 3. Save as

For downloading to local machine there is already the export functionality.

Consideration: if you Publish, that service might assign new ids to the resources in your manifest. It might well assign an id to the manifest itself. It's up to the Publish implementation to do that kind of thing (see the wiki page).

You wouldn't get this with the export functionality, which has no strategy implementations for assigning/minting IDs (and probably should not).

"Save as" back to the Publish endpoint would require assigning a new id. "Save as" implies that the user is free to name the "file" - which I think equates to, in the REST impl:

navigating IIIF collections to find where you want to put (or PUT) the manifest
possibly creating a new collection ("new Folder")
typing the last path element of the id as the file "name"

RussellMcOrmond commented 1 year ago

CRKN: We have different staff doing different parts of the editing, so may not be able to rely entirely on local storage on a given staff person's computer for the entire workflow before publicly publishing.

We envision having 3 instances offering the REST API.

A read-only service where manifests are automatically generated from an import of images from our OAIS packaging system (soon to be Archivematica, historically custom). This is how ID's for canvases to add to manifests are found.
A read-write service that is staff-only, used as a common staging area for multiple staff members who are working on different aspects of managing manifests and collections
A read-write service that is writable only by staff, but readable by the public for patron access (regular Presentation API).

We currently make use of a domain cookie for authorisation (a JWT in auth_token). I think it would be simple for the API to return 401, and the client display the message to the user to indicate what they need to do to log in. As this is a staff tool, we don't need to do anything complex like https://iiif.io/api/auth/1.0/ does for public/patron services.
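A minimal sketch of the client side of that idea, assuming the editor only needs to map the HTTP status to a message. The names LoadOutcome and interpretStatus, the login URL parameter and the message wording are all hypothetical.

```typescript
type LoadOutcome =
  | { kind: "ok" }
  | { kind: "login-required"; message: string }
  | { kind: "error"; message: string };

// Hypothetical mapping from an HTTP status to what the editor shows the user.
// In a browser the request itself could be made with
// fetch(url, { credentials: "include" }) so that the domain cookie is sent.
function interpretStatus(status: number, loginUrl: string): LoadOutcome {
  if (status >= 200 && status < 300) {
    return { kind: "ok" };
  }
  if (status === 401) {
    return {
      kind: "login-required",
      message: `You are not logged in. Please log in at ${loginUrl} and try again.`,
    };
  }
  return { kind: "error", message: `Request failed with HTTP ${status}.` };
}
```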

We would like:

Mechanism to configure "sources" (read only) and "destinations" (read/write), so staff don't need to cut-and-paste URLs to connect to REST services. This could be one set of "servers", and staff would remind themselves which they can only read from in the names they give them.
Mechanism to indicate "save as new" which uses POST rather than PUT in order to mint a new ID.
Mechanism to be able to change to a different storage destination, but where the ID is maintained, so a PUT would use the same ID.
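To make the first point concrete, here is one possible shape for such a configuration. It is only a sketch: the RepositoryConfig type, the "writable" flag and the helper names are hypothetical and not part of the Manifest Editor.

```typescript
// One list of servers; a flag says whether staff can write to each.
interface RepositoryConfig {
  name: string; // label shown to staff
  url: string; // base URL of the REST service
  writable: boolean; // false = read-only source
}

// Every configured server can be read from.
function sources(all: RepositoryConfig[]): RepositoryConfig[] {
  return all.slice();
}

// Only writable servers are offered as save destinations.
function destinations(all: RepositoryConfig[]): RepositoryConfig[] {
  return all.filter((repo) => repo.writable);
}
```

With this shape, a "save to" menu would list destinations(config) while a canvas picker could browse sources(config).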

Maybe this is what you are envisioning with being able to substitute the "Previews" and "Publications", but we also have additional sources (where staff only has read access to manifests/collections, used to find canvases to add to manifests) and destinations (where staff have write access). Maybe not per "project", but being able to configure these URLs once in the app and easily reference them in a project would be ideal. Maybe the additional "source" servers are only needed for when merging in from other sources #133 ?

For the server, I suspect us helping work on the reference implementation (We could use S3, via our Swift servers) would make more sense than us writing something custom to our current environment. We would then adapt our other tools to use that API, to bring data into any of our current custom systems (including what we use to provide indexes of the data for our existing patron-access websites).

We would need to ensure the API had some call to list ID's that have changed since some specific time (or since some magic ID that was passed to the client via a previous poll). This could be a simplified API call based on concepts we could borrow from https://docs.couchdb.org/en/3.2.0/api/database/changes.html: normal polling only, only a "since" GET parameter, etc.

Is there value in taking any inspiration from https://github.com/textandbytes/iiif-manifest-store?

brittnylapierre commented 1 year ago

For Notes on 3. and adding on a UI perspective from @RussellMcOrmond's previous comment

I think we should rename this to ‘Save to’ (we won't be able to let people rename the manifest, since the name will be an ID.)
We will have 2 environments we will want to save to: Staging and Production. I imagine a drop-down letting users choose which environment to save to, 'staging' or 'production'. Environments could be configured in some sort of settings panel for the editor overall.
For each environment, support navigating IIIF collections to find where you want to put (or PUT) the manifest - looks good
Possibly creating a new collection ("new Folder") - looks good
For the name - show the manifest id if it already exists, or ‘Save your new manifest here’ if it is a new manifest and has no ID yet

Thank you!

tomcrane commented 1 year ago

There is a lot of work to do here, in specification, building the service, and in building the adaptors to allow the Manifest Editor to use the service.

It would need a formal agreement, reasonably up front, supported by experiments, prototypes and test environments. The wiki page, and the further discussion in this issue, are just starting points.

Some details

The protocol doesn’t require versioning in its simple form, but can have versioning layered on top where supported (e.g., S3, GitHub backends)
The simplest implementation runs on top of a file system. Next simplest maybe S3.
Atomic at the Manifest and Collection level. You can’t patch a canvas for example, other than by saving the manifest you've edited it in. This matches the units of "editability" of the ME.
Need to build support into the Manifest Editor
Protocol itself is not tied to a particular access control strategy, that's something else you layer on top. ME can send JWTs in our implementation.

Agree that an implementation of IIIF storage should optionally support endpoints to get lists of changes. The IIIF Change Discovery spec addresses this specific use case and already has several client implementation libraries (consider it as OAI-PMH for IIIF (ish)). https://iiif.io/api/discovery/1.0/ - so we should implement that.
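A sketch of what a polling client might do with one page of Change Discovery activities, assuming each activity carries "type", "object" and "endTime" as in the spec's Activity Streams model. The function name changedSince is hypothetical, and fetching and paging through the collection are left out.

```typescript
// Minimal subset of a Change Discovery activity used by this sketch.
interface Activity {
  type: string; // e.g. "Create", "Update", "Delete"
  object: { id: string; type: string };
  endTime: string; // ISO 8601 timestamp
}

// Return the ids of resources touched after `since`, without duplicates,
// in the order they first appear in the page.
function changedSince(items: Activity[], since: string): string[] {
  const cutoff = Date.parse(since);
  const ids: string[] = [];
  for (const item of items) {
    const changedAt = Date.parse(item.endTime);
    if (changedAt > cutoff && ids.indexOf(item.object.id) === -1) {
      ids.push(item.object.id);
    }
  }
  return ids;
}
```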

RussellMcOrmond commented 1 year ago

Some thinking since September.

CRKN/Canadiana doesn't actually need to have a custom server. What we need to ensure is we have a way to do automation against the server API, where automation can fill in fields that staff wouldn't be expected to know (Other relevant identifiers, additional content such as auto-generated multi-page PDFs, etc), and extract IIIF data to be used in other aspects of our infrastructure. The discovery API would be an important part of this, so automation would know when to check records to see if updates are required (Polling is fine -- we already have processing that does polling to determine when changes happened at an earlier stage).

For the "Staff only" vs "Public" above, we can run two instances, or staff can save .json files on a shared drive for sharing incomplete work. CRKN will be trying to reduce unnecessary complexity and customizations. Auto-save doesn't need to be to a server, and local storage serves the job well.

Thank you.

tomcrane commented 1 year ago

(NB Russell's comment above relates to #202 and probably simplifies that issue, as described in my comment there)

Review (and update) https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol

RussellMcOrmond commented 1 year ago

When we are researching what exists, or if we need to create the reference implementation, I want to suggest that a feature from CouchDB be incorporated.

https://docs.couchdb.org/en/stable/replication/conflicts.html#conflict-avoidance

Each document has an ID and a Revision. If two tools (two instances of the manifest editor or some automation) try to update a document, the "second" one is informed that the revision ID they have is not current. The client tools can then deterministically decide what to do next, rather than the last one to save "winning".

It is something I find missing from many REST protocols, and is something I hope will be part of the storage provider API.
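The behaviour can be sketched with a small in-memory store. This is an illustration only: CouchDB's real revisions are strings rather than the integer counter used here, and the RevisionStore class is hypothetical. The 409 status mirrors what CouchDB returns on an update conflict.

```typescript
interface StoredDoc {
  rev: number;
  body: string;
}

type PutResult =
  | { ok: true; rev: number }
  | { ok: false; status: 409; currentRev: number };

class RevisionStore {
  private docs: { [id: string]: StoredDoc } = {};

  // A write succeeds only if the caller knows the current revision.
  put(id: string, body: string, knownRev?: number): PutResult {
    const existing = Object.prototype.hasOwnProperty.call(this.docs, id)
      ? this.docs[id]
      : undefined;
    if (existing !== undefined && existing.rev !== knownRev) {
      // The caller's revision is stale: report a conflict, do not overwrite.
      return { ok: false, status: 409, currentRev: existing.rev };
    }
    const rev = existing !== undefined ? existing.rev + 1 : 1;
    this.docs[id] = { rev, body };
    return { ok: true, rev };
  }
}
```

If two clients both hold revision 1, the first write moves the document to revision 2 and the second is told its revision is not current, so it can decide what to do next.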

If we have to create our own storage provider, it may even be desirable to use CouchDB itself as part of the back-end to help manage that aspect. IIIF presentation documents could be stored as CouchDB attachments, or stored elsewhere, and have an associated CouchDB document for any control variables (such as this revision, dates of last update, etc -- for quick views to show most recently modified documents, etc).

https://hub.docker.com/_/couchdb

tomcrane commented 1 year ago

(from discussion 24 March) When the Manifest Editor loads a Manifest from the storage service, it knows it can save it back to the id. There isn't a separate storage URL distinct from the manifest's id.

How does it know that it can PUT back to that URL? The best mechanism would be HTTP OPTIONS, as it is idiomatic and also the client can present the same credentials. If for some reason that's not possible, it could be through config; e.g., if the id hostname is iiif-repo.crkn.ca. However the ME will still need to handle the HTTP interactions.
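A sketch of the client-side check, assuming the repository answers OPTIONS with a standard Allow header listing the permitted methods. The helper name allowsMethod is hypothetical.

```typescript
// Decide whether a method appears in an Allow header such as "GET, HEAD, PUT".
// A missing header is treated as "not allowed" (an assumption of this sketch).
function allowsMethod(allowHeader: string | null, method: string): boolean {
  if (!allowHeader) {
    return false;
  }
  const wanted = method.toUpperCase();
  const listed = allowHeader.split(",").map((m) => m.trim().toUpperCase());
  return listed.indexOf(wanted) !== -1;
}
```

In the browser the header could come from something like fetch(manifestId, { method: "OPTIONS", credentials: "include" }) followed by response.headers.get("Allow").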

This leads to a new issue.

tomcrane commented 1 year ago

Summary of detailed discussions from Slack

Use of Containers

CRKN's use cases don't really need a deeply hierarchical storage solution; the stored manifests and collections are JSON blobs at URLs, and the manifest id is not path-based. In fact it's likely to be a NOID. So most of the time, opening is just from a given URL and saving is back to that URL. CRKN users won't be browsing around "folders" to pick a place to put the manifest.

Having said that, some notion of containers, even if it's just a few top level containers, would be useful to CRKN, for partitioning (rather than having multiple repository instances).

I have also introduced the idea of using ad hoc containers (collections) to improve the workflow for #43, to hold the extracted manifests.

Identity

The Manifest id is the GET / PUT location of the Manifest in the repository - there isn't a separate URI for persistence. That doesn't mean it goes out for public view with that id - it still might be proxied, lightly transformed; the IIIF repository is not exposed to the internet. But such transforms are beyond the scope of the repository.

Conflict resolution

For MVP we can leave this up to the client to sort out. We use ETags to allow clients to keep track of what they are doing - they must have the ETag of the version they are attempting to overwrite. This is safe in that it prevents unwanted overwrites, but not necessarily helpful to the user, as they may be unable to resolve the situation easily. A post-MVP approach could adopt CouchDB's versioning model to allow reconciliation and choice. We should do that post-MVP I think. We should also give thought to how a GitHub back end might work - make sure we don't produce something at odds with that.
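One way the client side could look, assuming the ETag is sent back in a standard If-Match header and the repository answers 412 Precondition Failed when it no longer matches. Both are assumptions about the eventual protocol, and the type and function names are hypothetical.

```typescript
// What the editor keeps from the original GET of the manifest.
interface LoadedManifest {
  id: string; // also the PUT location
  etag: string; // ETag header value from the GET response
  json: string; // edited manifest, serialised
}

interface SaveRequest {
  url: string;
  method: "PUT";
  headers: { [name: string]: string };
  body: string;
}

// Build a conditional PUT that only succeeds if the stored version is unchanged.
function buildConditionalPut(loaded: LoadedManifest): SaveRequest {
  return {
    url: loaded.id,
    method: "PUT",
    headers: { "Content-Type": "application/json", "If-Match": loaded.etag },
    body: loaded.json,
  };
}

// Classify the response so the UI can tell the user what happened.
function classifySave(status: number): "saved" | "stale" | "failed" {
  if (status >= 200 && status < 300) return "saved";
  if (status === 412) return "stale"; // someone else saved first
  return "failed";
}
```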

POST / PUT examples

A manifest with no id

POST to https://iiif-repo.example.org/library/
repo mints an id path element aabbccdd and makes the manifest available (internally anyway) at https://iiif-repo.example.org/library/aabbccdd

A manifest with the id https://iiif-repo.example.org/library/my-super-manifest

POST to https://iiif-repo.example.org/library/
repo saves the manifest as is and makes it available at https://iiif-repo.example.org/library/my-super-manifest

A manifest with the id https://iiif-repo.example.org/library/my-other-manifest

PUT to https://iiif-repo.example.org/library/my-other-manifest
repo saves the manifest as is and makes it available at https://iiif-repo.example.org/library/my-other-manifest

A manifest with the id https://iiif-repo.example.org/library/my-unhappy-manifest

PUT to https://iiif-repo.example.org/library/my-disgruntled-manifest
repo says NO - 400 bad request (probably) with error message

A manifest with the id https://iiif-repo.example.org/archives/my-super-manifest

POST to https://iiif-repo.example.org/library/
repo says NO - 400 bad request (probably) with error message

For these last two bad requests, the Manifest Editor would not allow you to make that POST (but the repository still needs to enforce the rules). The user interface of the manifest editor should allow the user to both modify the manifest URL and choose where to put it - these two things feed each other in the UI.
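The five examples above can be condensed into a small decision function that both the editor and the repository could apply. This is a sketch under assumptions: a POSTed id must sit exactly one path element below the container, a PUT without a matching id is rejected, and the names Decision and decide are hypothetical.

```typescript
type Decision =
  | { accept: true; location: string }
  | { accept: false; status: 400; reason: string };

// target: the container URL for POST (ending in "/"), the manifest URL for PUT.
// manifestId: the id inside the submitted manifest, or undefined if it has none.
// mintId: supplies a new path element when the repository must mint one.
function decide(
  method: "POST" | "PUT",
  target: string,
  manifestId: string | undefined,
  mintId: () => string
): Decision {
  if (method === "PUT") {
    if (manifestId === target) {
      return { accept: true, location: target };
    }
    return { accept: false, status: 400, reason: "PUT target must equal the manifest id" };
  }
  if (manifestId === undefined) {
    // No id yet: the repository mints one inside the container.
    return { accept: true, location: target + mintId() };
  }
  const rest = manifestId.slice(target.length);
  const inContainer =
    manifestId.indexOf(target) === 0 && rest.length > 0 && rest.indexOf("/") === -1;
  if (inContainer) {
    return { accept: true, location: manifestId };
  }
  return { accept: false, status: 400, reason: "manifest id is not inside the target container" };
}
```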

stephenwf commented 1 year ago

Manifest editor supporting features:

Related to "loading" resource from an external repository

Link/icon to open a IIIF Explorer in a new tab
- Configuration to customise the URL
Ability to customise the "Vault" instance for the "preview" Vault #233
- Allow for authed or cookie requests by passing fetch to this vault
Ability to "save" extra information when first fetching manifest that could be used later when publishing
- ETag
- Modified-At

Related to "publishing" to an external repository

Configuration of publish configuration #236
Implementation of a publish configuration that saves using the REST protocol
UI to take the user through the publish step(s) [design]
Configuration to decide what to do after publishing (discard draft, re-import the published manifest, refetch original source, nothing)

stephenwf commented 1 year ago

The discussions so far have been configuring the Manifest Editor so that it can communicate with an external service (inside out). I've also experimented with wrapping the Manifest Editor and moving the storage, loading and publishing to chrome outside of the shell/ME. This has worked well and is another option we could explore.