digirati-co-uk / iiif-manifest-editor

Create new IIIF Manifests. Modify existing manifests. Tell stories with IIIF.
https://manifest-editor.digirati.services/
MIT License

IIIF Storage Provider #184

Open tomcrane opened 2 years ago

tomcrane commented 2 years ago

We think the Canadiana use cases imply the ability to load manifests from, and save them to, Canadiana's own storage.

Canadiana manifests would not be hosted on the Preview service.

So we need to implement enough of a plugin for Canadiana to be able to develop the other end.

This could be a client of the REST protocol proposed at https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol

tomcrane commented 2 years ago

Canadiana notes:

1. Storage API

We would like to work with the IIIF API specification working group to propose adding PUT and POST requests to the IIIF Presentation API.

This way, the solution can be used by many organisations.

We should define what inputs will be needed, and what the response format will be.

People will be free to choose their own storage solution in their IIIF API implementation (NoSQL database, S3...)

2. Auto Save

Ephemeral caching so that work is not lost.

3. Save As

Allow users to save back to a Presentation API endpoint (see 1) and to local disk as JSON.

stephenwf commented 2 years ago

There are 4 storage concepts in the Manifest editor:

Here is the local storage record for a Project:

{
  "id": "manifest-editor://project/75993761-323a-4731-bb56-8966f5ffe03b",
  "filename": "blank-manifest",
  "name": "Blank Manifest",
  "metadata": {
    "created": 1653661984028,
    "modified": 1653662026232
  },
  "resource": {
    "id": "/config/manifest-templates/blank.json",
    "type": "Manifest"
  },
  "publications": [],
  "previews": [],
  "settings": {},
  "storage": {
    "type": "manifest-storage",
     ...
  }
}

This allows the Manifest Editor to edit the same manifest in two configurations without conflict. This may not be very useful right now, but it will become more useful when project storage moves to the "cloud" and is shared with a team of users.
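For reference, the record above can be described as a TypeScript type. The field names come from the example. Because `storage` is elided there beyond `type`, it is typed loosely. `isProjectRecord` is a hypothetical helper for checking a record read back from storage and is not part of the Manifest Editor.

```typescript
// Shape of a locally stored project record, following the JSON example.
// Only storage.type is known from the example; other storage fields are open.
interface ProjectRecord {
  id: string; // e.g. "manifest-editor://project/<uuid>"
  filename: string;
  name: string;
  metadata: { created: number; modified: number }; // epoch milliseconds
  resource: { id: string; type: string };
  publications: unknown[];
  previews: unknown[];
  settings: Record<string, unknown>;
  storage: { type: string; [key: string]: unknown };
}

// Hypothetical runtime check before trusting a record loaded from storage.
function isProjectRecord(value: unknown): value is ProjectRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, any>;
  return (
    typeof v.id === "string" &&
    typeof v.filename === "string" &&
    typeof v.name === "string" &&
    typeof v.metadata?.created === "number" &&
    typeof v.metadata?.modified === "number" &&
    typeof v.resource?.id === "string" &&
    Array.isArray(v.publications) &&
    Array.isArray(v.previews) &&
    typeof v.settings === "object" &&
    v.settings !== null &&
    typeof v.storage?.type === "string"
  );
}
```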

Although four layers of storage are described here, they are split so that you can use the built-in storage layers and swap only the layer you need to.

For example, for Desktop testing a file-system Manifest storage adapter was created: https://github.com/digirati-co-uk/iiif-manifest-editor/blob/feature/merged/manifest-editor/src/shell/ProjectContext/storage/FileSystemLoader.ts

This allows projects to be loaded, and the IIIF saved, in a specific folder on the desktop. It doesn't require the project information to be stored the same way. There is a separate adapter for Project storage: https://github.com/digirati-co-uk/iiif-manifest-editor/blob/feature/merged/manifest-editor/src/shell/ProjectContext/backend/FileSystemFolderBackend.ts

This uses the user's home folder plus a subfolder. By default, project data is saved in ~/ManifestEditor/. So you could open a Manifest from your Documents folder, and the editor would save a project file in ~/ManifestEditor/projects/... and then save the IIIF back to the manifest file in Documents.
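The split between the two layers can be sketched in TypeScript. The interface and class names below (`ProjectBackend`, `ManifestStorage`, the in-memory classes and `saveAll`) are hypothetical and do not reflect the Manifest Editor's actual adapter API. The sketch only shows that the project record and the IIIF JSON go to different places, and that either layer can be swapped independently.

```typescript
// Hypothetical layer interfaces; the real adapters may differ.
interface ProjectBackend {
  saveProject(id: string, data: object): Promise<void>;
  loadProject(id: string): Promise<object | undefined>;
}

interface ManifestStorage {
  saveManifest(location: string, manifest: object): Promise<void>;
  loadManifest(location: string): Promise<object | undefined>;
}

// In-memory stand-ins; a file-system version would write to disk instead.
class MemoryProjectBackend implements ProjectBackend {
  readonly files = new Map<string, string>();
  private readonly root: string;
  constructor(root: string) {
    this.root = root; // e.g. the user's ~/ManifestEditor folder
  }
  private pathFor(id: string): string {
    return `${this.root}/projects/${id}.json`;
  }
  async saveProject(id: string, data: object): Promise<void> {
    this.files.set(this.pathFor(id), JSON.stringify(data));
  }
  async loadProject(id: string): Promise<object | undefined> {
    const raw = this.files.get(this.pathFor(id));
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

class MemoryManifestStorage implements ManifestStorage {
  readonly files = new Map<string, string>();
  async saveManifest(location: string, manifest: object): Promise<void> {
    this.files.set(location, JSON.stringify(manifest, null, 2));
  }
  async loadManifest(location: string): Promise<object | undefined> {
    const raw = this.files.get(location);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

// Saving touches both layers separately: the project record goes to the
// project backend, and the IIIF goes back to where it was opened from.
async function saveAll(
  projects: ProjectBackend,
  manifests: ManifestStorage,
  projectId: string,
  manifestLocation: string,
  manifest: object,
): Promise<void> {
  await projects.saveProject(projectId, { manifestLocation, modified: Date.now() });
  await manifests.saveManifest(manifestLocation, manifest);
}
```

Replacing `MemoryManifestStorage` with, for example, a REST-backed storage would leave project handling untouched.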

tomcrane commented 2 years ago

Digirati Notes on Canadiana notes:

Notes on 1. Storage API

There is no formal technical specification work on a IIIF REST protocol. It has been mooted several times but has always been a lower priority than other technical specifications.

That said, the proposal at https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol is probably a good starting point, and one we can develop with Canadiana for point 1.

What we need:

Once we have the last two of these, people can build their own IIIF REST implementations behind the published spec, without having to write their own adaptors to get a file-and-folder model over their repository, CMS or whatever. They can still do that if they want, but they might prefer to implement the REST layer as a facade and let the Manifest Editor's out-of-the-box support for it do the hard UI work for them.

Notes on 2. Auto save

Auto save is already supported, to the preview endpoint. This could continue to be supported, but the interface mentioned above could also offer the option of accepting autosave pushes, so you could configure whether regular saves are pushed to the preview service, the persistence service, or both. However, this is probably not useful, and may even be undesirable, for the REST back end.

Ephemeral caching so that work is not lost: the Manifest Editor already uses local storage for this.
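Here is a minimal sketch of local-storage autosave. It assumes a `Storage`-like object (`window.localStorage` in a browser) and a hypothetical key prefix, because the Manifest Editor's actual keys and timing are not described here. Debouncing ensures that a burst of edits produces a single write once the user pauses.

```typescript
// Any Storage-like object works: window.localStorage in a browser,
// or the in-memory stand-in below for testing.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const DRAFT_PREFIX = "me-draft:"; // hypothetical key prefix

function saveDraft(store: KeyValueStore, projectId: string, manifest: object): void {
  store.setItem(DRAFT_PREFIX + projectId, JSON.stringify({ savedAt: Date.now(), manifest }));
}

function restoreDraft(store: KeyValueStore, projectId: string): object | null {
  const raw = store.getItem(DRAFT_PREFIX + projectId);
  return raw === null ? null : JSON.parse(raw).manifest;
}

// Debounce: each call resets the timer, so only the latest state is written.
function debouncedAutosave(store: KeyValueStore, projectId: string, delayMs = 1000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (manifest: object): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => saveDraft(store, projectId, manifest), delayMs);
  };
}

// In-memory store that also counts writes.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  writes = 0;
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.writes++;
    this.data.set(key, value);
  }
}
```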

Notes on 3. Save as

For downloading to the local machine, the export functionality already exists.

Consideration: if you Publish, that service might assign new ids to the resources in your manifest. It might well assign an id to the manifest itself. It's up to the Publish implementation to do that kind of thing (see the wiki page).

You wouldn't get this with the export functionality, which has no strategy implementations for assigning/minting IDs (and probably should not).
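The following hypothetical sketch, which is not the Manifest Editor's code, illustrates the distinction. Export serialises the manifest exactly as it is, whereas a Publish implementation may run a minting strategy. Here `mintIds` fills in only missing ids, using a `mint` function that the publishing service would supply. Real services might instead replace existing ids.

```typescript
// Minimal resource shape for the illustration.
type Resource = { id?: string; type: string; items?: Resource[] };

// Export: no id strategy at all; the JSON is written unchanged.
function exportAsJson(manifest: Resource): string {
  return JSON.stringify(manifest, null, 2);
}

// Publish-side minting (one possible strategy): returns a copy in which
// every resource without an id gets one from `mint`. The input is not mutated.
function mintIds(resource: Resource, mint: (type: string) => string): Resource {
  const copy: Resource = { ...resource, id: resource.id ?? mint(resource.type) };
  if (resource.items) copy.items = resource.items.map((child) => mintIds(child, mint));
  return copy;
}
```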

"Save as" back to the Publish endpoint would require assigning a new id. "Save as" implies that the user is free to name the "file" - which I think equates to, in the REST impl:

RussellMcOrmond commented 1 year ago

CRKN: We have different staff doing different parts of the editing, so we may not be able to rely entirely on local storage on a given staff member's computer for the entire workflow before publishing publicly.

We envision having 3 instances offering the REST API.

  1. A read-only service where manifests are automatically generated from an import of images from our OAIS packaging system (soon to be Archivematica; historically custom). This is how IDs for canvases to add to manifests are found.
  2. A read-write service that is staff-only, used as a common staging area for multiple staff members working on different aspects of managing manifests and collections.
  3. A read-write service that is writable only by staff but readable by the public for patron access (regular Presentation API).

We currently use a domain cookie for authorisation (a JWT in auth_token). I think it would be simple for the API to return 401, and for the client to display a message telling the user what they need to do to log in. As this is a staff tool, we don't need anything as complex as what https://iiif.io/api/auth/1.0/ does for public/patron services.
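The 401 approach keeps the client simple. The following sketch assumes a minimal injected HTTP function so that the example has no dependencies. In a browser, `(url) => fetch(url, { credentials: "include" })` fits the `HttpGet` type and sends the domain cookie. `loadManifest`, `LoadResult` and the message text are hypothetical.

```typescript
// Minimal transport types so the sketch is self-contained.
type HttpResponse = { status: number; json(): Promise<unknown> };
type HttpGet = (url: string) => Promise<HttpResponse>;

type LoadResult =
  | { kind: "ok"; manifest: unknown }
  | { kind: "login-required"; message: string }
  | { kind: "error"; status: number };

async function loadManifest(url: string, get: HttpGet): Promise<LoadResult> {
  const res = await get(url);
  if (res.status === 401) {
    // Hypothetical wording; a deployment would configure where to log in.
    return { kind: "login-required", message: `You need to log in before opening ${url}.` };
  }
  if (res.status < 200 || res.status >= 300) return { kind: "error", status: res.status };
  return { kind: "ok", manifest: await res.json() };
}
```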

We would like:

Maybe this is what you envision with being able to substitute the "Previews" and "Publications", but we also have additional sources (where staff have only read access to manifests/collections, used to find canvases to add to manifests) and destinations (where staff have write access). Maybe not per "project", but being able to configure these URLs once in the app and easily reference them in a project would be ideal. Perhaps the additional "source" servers are only needed when merging in from other sources (#133)?
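One way to picture "configure once, reference per project" is an app-level registry of repositories keyed by name, with projects storing only the keys. Everything here (`RepositoryConfig`, the keys and the example.org URLs) is a hypothetical illustration based on the three instances listed above. It is not an existing Manifest Editor setting.

```typescript
// Hypothetical app-level registry, configured once.
interface RepositoryConfig {
  label: string;
  baseUrl: string;
  access: "read" | "read-write";
}

const repositories: Record<string, RepositoryConfig> = {
  // Placeholder URLs, not real CRKN endpoints.
  generated: { label: "Generated from OAIS import", baseUrl: "https://iiif-generated.example.org/", access: "read" },
  staging: { label: "Staff staging", baseUrl: "https://iiif-staging.example.org/", access: "read-write" },
  public: { label: "Public", baseUrl: "https://iiif.example.org/", access: "read-write" },
};

// A project refers to repositories by key rather than by URL.
interface ProjectRepositories {
  sources: string[];
  destination: string;
}

function resolveDestination(project: ProjectRepositories): RepositoryConfig {
  const repo = repositories[project.destination];
  if (!repo) throw new Error(`Unknown repository "${project.destination}"`);
  if (repo.access !== "read-write") throw new Error(`"${repo.label}" is read-only`);
  return repo;
}

// Which configured repository (if any) a resource URL belongs to.
function repositoryFor(url: string): string | undefined {
  return Object.keys(repositories).find((key) => url.startsWith(repositories[key].baseUrl));
}
```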

For the server, I suspect us helping work on the reference implementation (We could use S3, via our Swift servers) would make more sense than us writing something custom to our current environment. We would then adapt our other tools to use that API, to bring data into any of our current custom systems (including what we use to provide indexes of the data for our existing patron-access websites).

Is there value in taking any inspiration from https://github.com/textandbytes/iiif-manifest-store

brittnylapierre commented 1 year ago

For Notes on 3. and adding on a UI perspective from @RussellMcOrmond's previous comment

Thank you!

tomcrane commented 1 year ago

There is a lot of work to do here, in specification, building the service, and in building the adaptors to allow the Manifest Editor to use the service.

It would need a formal, reasonably up-front agreement, supported by experiments, prototypes and test environments. The wiki page and the further discussion in this issue are just starting points.

Some details

Agreed that an implementation of IIIF storage should optionally support endpoints for getting lists of changes. The IIIF Change Discovery spec addresses this specific use case and already has several client implementation libraries (think of it as roughly OAI-PMH for IIIF): https://iiif.io/api/discovery/1.0/ - so we should implement that.
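Here is a sketch of how a consumer, such as CRKN's automation, might read a Change Discovery stream. It models only a small subset of the spec: activities with `type`, `object` and `endTime`, and pages with `orderedItems` and `prev`. As we understand the spec's suggested processing, the consumer starts at the collection's last page and walks backwards until it reaches activities older than the previous crawl. `changesSince` and the injected `fetchPage` function are hypothetical.

```typescript
// Small subset of the Change Discovery 1.0 model.
interface Activity {
  type: string; // e.g. "Create", "Update", "Delete"
  object: { id: string; type: string };
  endTime: string; // ISO 8601 timestamp
}
interface ActivityPage {
  orderedItems: Activity[];
  prev?: { id: string };
}

// Walk pages newest-first, starting at the last page and following `prev`.
// Stop at the first activity at or before `since`. Return the most recent
// activity type for each changed object id.
async function changesSince(
  lastPage: ActivityPage,
  fetchPage: (id: string) => Promise<ActivityPage>,
  since: Date,
): Promise<Map<string, string>> {
  const changes = new Map<string, string>();
  let page: ActivityPage | undefined = lastPage;
  while (page) {
    // Items within a page run oldest to newest, so read them in reverse.
    for (const activity of [...page.orderedItems].reverse()) {
      if (Date.parse(activity.endTime) <= since.getTime()) return changes;
      if (!changes.has(activity.object.id)) changes.set(activity.object.id, activity.type);
    }
    page = page.prev ? await fetchPage(page.prev.id) : undefined;
  }
  return changes;
}
```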

RussellMcOrmond commented 1 year ago

Some thinking since September.

CRKN/Canadiana doesn't actually need to have a custom server. What we need to ensure is that we can run automation against the server API. Automation could fill in fields that staff wouldn't be expected to know (other relevant identifiers, additional content such as auto-generated multi-page PDFs, etc.) and extract IIIF data for use in other parts of our infrastructure. The discovery API would be an important part of this, so that automation knows when to check records to see whether updates are required. (Polling is fine; we already have processing that polls to determine when changes happened at an earlier stage.)
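As an example of the kind of automation described here, the following sketch attaches an auto-generated multi-page PDF to a manifest using the Presentation API 3.0 `rendering` property. The function name, label text and URLs are hypothetical. Because polling may revisit the same record, the step is idempotent.

```typescript
// Presentation 3 `rendering` entry: a non-IIIF alternative representation.
interface Rendering {
  id: string;
  type: string;
  label: Record<string, string[]>;
  format: string;
}
interface Manifest {
  id: string;
  type: "Manifest";
  rendering?: Rendering[];
  [key: string]: unknown;
}

// Returns the same object if the PDF is already listed. Otherwise it
// returns a copy with the PDF appended, and the input is left unchanged.
function withPdfRendering(manifest: Manifest, pdfUrl: string): Manifest {
  const existing = manifest.rendering ?? [];
  if (existing.some((r) => r.id === pdfUrl)) return manifest;
  return {
    ...manifest,
    rendering: [
      ...existing,
      { id: pdfUrl, type: "Text", label: { en: ["Download PDF"] }, format: "application/pdf" },
    ],
  };
}
```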

For the "Staff only" vs "Public" above, we can run two instances, or staff can save .json files on a shared drive for sharing incomplete work. CRKN will be trying to reduce unnecessary complexity and customizations. Auto-save doesn't need to be to a server, and local storage serves the job well.

Thank you.

tomcrane commented 1 year ago

(NB Russell's comment above relates to #202 and probably simplifies that issue, as described in my comment there)

Review (and update) https://github.com/digirati-co-uk/iiif-manifest-editor/wiki/REST-Protocol

RussellMcOrmond commented 1 year ago

When we research what already exists, or if we need to create the reference implementation, I suggest incorporating a feature from CouchDB.

https://docs.couchdb.org/en/stable/replication/conflicts.html#conflict-avoidance

Each document has an ID and a Revision. If two tools (two instances of the manifest editor or some automation) try to update a document, the "second" one is informed that the revision ID they have is not current. The client tools can then deterministically decide what to do next, rather than the last one to save "winning".

It is something I find missing from many REST protocols, and is something I hope will be part of the storage provider API.

If we have to create our own storage provider, it may even be desirable to use CouchDB itself as part of the back end to help manage that aspect. IIIF Presentation documents could be stored as CouchDB attachments or elsewhere. Each would have an associated CouchDB document for any control variables, such as the revision and date of last update, which would allow quick views showing the most recently modified documents.

https://hub.docker.com/_/couchdb
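To make the revision idea concrete, here is an in-memory sketch of CouchDB-style conflict avoidance in TypeScript. It does not use CouchDB's API. `RevisionedStore` is hypothetical, and its revision strings only imitate CouchDB's `<generation>-<suffix>` shape (real CouchDB suffixes are hashes). The key behaviour is that a write based on a stale revision is rejected and reported, rather than the last save winning.

```typescript
type Stored = { rev: string; body: unknown };
type WriteResult =
  | { ok: true; rev: string }
  | { ok: false; reason: "conflict"; currentRev: string };

class RevisionedStore {
  private docs = new Map<string, Stored>();
  private counter = 0; // stand-in for CouchDB's content hash

  get(id: string): Stored | undefined {
    return this.docs.get(id);
  }

  // Every update must name the revision it was based on.
  put(id: string, body: unknown, baseRev?: string): WriteResult {
    const current = this.docs.get(id);
    if (current && current.rev !== baseRev) {
      // The caller edited an older revision: report it instead of overwriting.
      return { ok: false, reason: "conflict", currentRev: current.rev };
    }
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const rev = `${generation}-${++this.counter}`;
    this.docs.set(id, { rev, body });
    return { ok: true, rev };
  }
}
```

The caller that gets a conflict can then fetch the current revision and decide deterministically whether to merge, retry or ask the user.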

tomcrane commented 1 year ago

(from discussion 24 March) When the Manifest Editor loads a Manifest from the storage service, it knows it can save it back to the id. There isn't a separate storage URL distinct from the manifest's id.

How does it know that it can PUT back to that URL? The best mechanism would be HTTP OPTIONS, as it is idiomatic and the client can present the same credentials. If for some reason that's not possible, it could be done through config, e.g., if the id hostname is iiif-repo.crkn.ca. However, the ME will still need to handle the HTTP interactions.

This leads to a new issue.
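Here is a sketch of the save-back check. It assumes the editor can read the `Allow` header from the OPTIONS response; for cross-origin requests, the server would also need to expose that header via CORS. The config fallback uses the hostname example above. `canSaveBack` and the injected `options` function are hypothetical. In a browser, `(url) => fetch(url, { method: "OPTIONS", credentials: "include" })` would fit the `OptionsFn` shape and send the same cookie credentials.

```typescript
type OptionsFn = (url: string) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

// True if an Allow header such as "GET, HEAD, PUT" lists PUT.
function allowsPut(allowHeader: string | null): boolean {
  if (allowHeader === null) return false;
  return allowHeader
    .split(",")
    .map((m) => m.trim().toUpperCase())
    .includes("PUT");
}

// Fallback: the id's hostname is in a configured list of writable hosts.
function isWritableByConfig(manifestId: string, writableHosts: string[]): boolean {
  const host = manifestId.replace(/^https?:\/\//, "").split(/[\/:?#]/)[0].toLowerCase();
  return writableHosts.includes(host);
}

async function canSaveBack(manifestId: string, options: OptionsFn, writableHosts: string[]): Promise<boolean> {
  try {
    const res = await options(manifestId);
    if (res.status >= 200 && res.status < 300) return allowsPut(res.headers.get("Allow"));
  } catch {
    // Network or CORS failure: fall through to configuration.
  }
  return isWritableByConfig(manifestId, writableHosts);
}
```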

tomcrane commented 1 year ago

Summary of detailed discussions from Slack

Use of Containers

CRKN's use cases don't really need a deeply hierarchical storage solution. The stored manifests and collections are JSON blobs at URLs, and the manifest id is not path-based (in fact, it's likely to be a NOID). So most of the time, opening is just from a given URL and saving is back to that URL. CRKN users won't be browsing around "folders" to pick a place to put the manifest.

Having said that, some notion of containers, even if it's just a few top level containers, would be useful to CRKN, for partitioning (rather than having multiple repository instances).

I have also introduced the idea of using ad hoc containers (collections) to improve the workflow for #43, to hold the extracted manifests.

Identity

The Manifest id is the GET / PUT location of the Manifest in the repository; there isn't a separate URI for persistence. That doesn't mean it goes out for public view with that id. It still might be proxied or lightly transformed, as the IIIF repository is not exposed to the internet. But such transforms are beyond the scope of the repository.

Conflict resolution

For MVP we can leave this up to the client to sort out. We use ETags to allow clients to keep track of what they are doing - they must have the ETag of the version they are attempting to overwrite. This is safe in that it prevents unwanted overwrites, but not necessarily helpful to the user, as they may be unable to resolve the situation easily. A post-MVP approach could adopt CouchDB's versioning model to allow reconciliation and choice. We should do that post-MVP I think. We should also give thought to how a GitHub back end might work - make sure we don't produce something at odds with that.
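Here is a client-side sketch of the ETag rule, using standard HTTP conditional requests. The editor keeps the ETag from its last GET or PUT and sends it in `If-Match`. A `412 Precondition Failed` reply means the stored copy has changed since then. The `Send` transport type and `saveWithETag` are hypothetical, though `fetch` could be adapted to them. Whether the repository returns a fresh `ETag` on a successful PUT is an assumption.

```typescript
type HttpRequest = { method: string; headers: Record<string, string>; body?: string };
type HttpReply = { status: number; headers: { get(name: string): string | null } };
type Send = (url: string, req: HttpRequest) => Promise<HttpReply>;

type SaveOutcome =
  | { kind: "saved"; etag: string | null }
  | { kind: "conflict" }
  | { kind: "failed"; status: number };

// PUT back to the manifest id, but only if the server still holds `etag`.
async function saveWithETag(send: Send, id: string, manifest: object, etag: string): Promise<SaveOutcome> {
  const res = await send(id, {
    method: "PUT",
    // A repository might prefer the IIIF application/ld+json media type.
    headers: { "Content-Type": "application/json", "If-Match": etag },
    body: JSON.stringify(manifest),
  });
  if (res.status === 412) return { kind: "conflict" }; // someone saved first
  if (res.status >= 200 && res.status < 300) return { kind: "saved", etag: res.headers.get("ETag") };
  return { kind: "failed", status: res.status };
}
```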

POST / PUT examples

A manifest with no id

A manifest with the id https://iiif-repo.example.org/library/my-super-manifest

A manifest with the id https://iiif-repo.example.org/library/my-other-manifest

A manifest with the id https://iiif-repo.example.org/library/my-unhappy-manifest

A manifest with the id https://iiif-repo.example.org/archives/my-super-manifest

For these last two bad requests, the Manifest Editor would not allow you to make that POST (but the repository still needs to enforce the rules). The user interface of the manifest editor should allow the user to both modify the manifest URL and choose where to put it - these two things feed each other in the UI.
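The exact requests and responses for the examples above are not preserved here. The following is therefore only an illustrative sketch of rules a repository might enforce when a manifest is POSTed to a container: mint an id if none is given, reject ids outside the container and refuse ids that already exist. `handlePost`, the chosen status codes and the injected `exists`/`mintSlug` functions are assumptions and are not part of the wiki protocol.

```typescript
type PostResult =
  | { status: 201; id: string }
  | { status: 400; reason: string }
  | { status: 409; reason: string };

function handlePost(
  containerUrl: string, // e.g. "https://iiif-repo.example.org/library/"
  manifest: { id?: string },
  exists: (id: string) => boolean,
  mintSlug: () => string, // e.g. a NOID minter
): PostResult {
  // No id supplied: the repository mints one inside the container.
  const id = manifest.id ?? containerUrl + mintSlug();
  if (!id.startsWith(containerUrl)) return { status: 400, reason: `id is not inside ${containerUrl}` };
  const slug = id.slice(containerUrl.length);
  if (slug === "" || slug.includes("/")) return { status: 400, reason: "id must be a direct child of the container" };
  if (exists(id)) return { status: 409, reason: "a resource with this id already exists" };
  return { status: 201, id };
}
```

In the UI, the same split (container URL plus slug) lets the id field and the container picker update each other.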

stephenwf commented 1 year ago

Manifest editor supporting features:

Related to "loading" resource from an external repository

Related to "publishing" to an external repository

stephenwf commented 1 year ago

The discussions so far have been configuring the Manifest Editor so that it can communicate with an external service (inside out). I've also experimented with wrapping the Manifest Editor and moving the storage, loading and publishing to chrome outside of the shell/ME. This has worked well and is another option we could explore.