esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

Need documentation of OCFL implementation/subset #104

Closed dmer closed 4 weeks ago

dmer commented 4 years ago

What is the current setup/configuration for OCFL?

What are the plans for OCFL?

DiegoPino commented 4 years ago

@dmer thanks. Guess who is getting this assigned? ME! Will work on this during the weekend 😄

DiegoPino commented 3 years ago

@dmer I transferred the issue to https://github.com/esmero/strawberryfield because that is the module where this would be implemented.

A few points to discuss here (or more like questions)

1.- How much does your team know about, or how much has it evaluated or taken part in, the OCFL specs?
2.- How do you envision using this, and when?

I spent days and weeks working with the original 0.1-0.4 specs when Archipelago started (as an idea, meaning even 6 months before the first code was pushed) and even considered OCFL a valuable choice before going for S3. As you know, Archipelago is Fedora 4, 5 and 6 free, and I have voiced my roadmap concerns about the need for a Fedora in our environment many times (or better said, the lack of need for one).

The specs, in their 1.0 draft version, clash a little bit with two important architectural pillars of Archipelago:

@giancarlobi @alliomeria in case you wonder what this is, please give the specs a quick look. If in a rush, this is what the V1 of a given Object would look like. It is pretty similar to our attempts on https://github.com/frictionlessdata/datapackage-php so that could be a good way of reusing this work too.

[object root]
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    └── v1
        ├── inventory.json
        ├── inventory.json.sha512
        └── content
            └── file.txt
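
To make the layout above more concrete, here is a minimal sketch of writing such a single-version object to local disk. It is an illustration only, not Archipelago code: the function name `write_ocfl_v1` and the object id are hypothetical, and the inventory fields follow my reading of the OCFL 1.0 spec (sha512 digests, a `manifest` keyed by digest, one `v1` entry under `versions`), so check them against the spec before relying on this.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha512_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-512 digest of some bytes."""
    return hashlib.sha512(data).hexdigest()


def write_ocfl_v1(object_root: Path, object_id: str, files: dict[str, bytes]) -> dict:
    """Write a single-version object (hypothetical helper) and return its inventory."""
    (object_root / "v1").mkdir(parents=True, exist_ok=True)
    manifest: dict[str, list[str]] = {}  # digest -> paths relative to the object root
    state: dict[str, list[str]] = {}     # digest -> logical paths inside version v1

    for logical_path, data in sorted(files.items()):
        target = object_root / "v1" / "content" / logical_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        digest = sha512_hex(data)
        manifest.setdefault(digest, []).append(f"v1/content/{logical_path}")
        state.setdefault(digest, []).append(logical_path)

    inventory = {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "state": state,
            }
        },
    }
    inventory_bytes = json.dumps(inventory, indent=2).encode("utf-8")
    # Sidecar file: digest of inventory.json, a space, then the file name.
    sidecar = f"{sha512_hex(inventory_bytes)} inventory.json\n"

    # Object conformance declaration (the "0=ocfl_object_1.0" file in the tree).
    (object_root / "0=ocfl_object_1.0").write_text("ocfl_object_1.0\n")
    # With a single version, the root inventory and the v1 inventory are identical.
    for directory in (object_root, object_root / "v1"):
        (directory / "inventory.json").write_bytes(inventory_bytes)
        (directory / "inventory.json.sha512").write_text(sidecar)
    return inventory
```

Calling `write_ocfl_v1(Path("obj"), "urn:example:1", {"file.txt": b"hello"})` would produce the tree shown above under `obj/`.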

But for what it's worth, the implementation is simple!

Solutions:

There is a second option, which is just an external listener connected to our S3 (webhook or SNS) that takes everything deposited on our own storage and puts it into an OCFL structure. But this all happens offline and does not interfere with the normal, daily needs of other users. What I mean is that the effort of duplicating storage and structure is removed from Drupal.
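
As a rough sketch of what the first step of such a listener could look like, the snippet below pulls the newly deposited bucket/key pairs out of an S3 event notification, whether it arrives directly or wrapped in an SNS message. The function names and the key-to-path mapping are hypothetical, nothing here exists in Archipelago, and the event shape is assumed to be the standard AWS notification JSON (`Records[].s3.bucket.name`, `Records[].s3.object.key`).

```python
import json
from urllib.parse import unquote_plus


def deposited_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for objects created according to the event."""
    found: list[tuple[str, str]] = []
    for record in event.get("Records", []):
        if "Sns" in record:
            # SNS delivery: the S3 notification is a JSON string in "Message".
            inner = json.loads(record["Sns"]["Message"])
            found.extend(deposited_objects(inner))
        elif record.get("eventName", "").startswith("ObjectCreated:"):
            s3 = record["s3"]
            # Object keys arrive URL-encoded in S3 notifications.
            found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found


def ocfl_content_path(key: str, version: str = "v1") -> str:
    """Hypothetical mapping from an S3 key to a content path inside an OCFL object."""
    return f"{version}/content/{key}"
```

A real listener would then copy each object into the OCFL structure and update the inventory; that part is left out because it depends on choices not settled in this thread.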

These are my initial thoughts, but I will explore the latest draft a bit more and come back with more concrete actions.

dmer commented 3 years ago

@DiegoPino Thanks for the detailed response! My opinions are based on a fairly naive understanding of the wants and needs of the digital preservation folks and a lot of anecdotal evidence from different repository owners.

Many repository owners have a need for some form of "preservation" repository as part of their digital repository. If Archipelago allowed the easy addition of an OCFL repository, that would definitely be a plus for these folks.

That being said, I wouldn't put this at the top of my priority list for the roadmap - the basic installation and operation functions seem more important right now.

I'm interested in the second option you describe. This was sort of what I was imagining: that one could "turn on" the OCFL repo and it would be created as a secondary/backup storage. This has the effect of providing a backup in a transferable and platform-independent format. I can only imagine that this would also be less of an overhead/resource burden on the system.

Looking forward to discussing this further.

DiegoPino commented 3 years ago

@dmer thanks. I'm going back into OCFL now to see whether any of the latest spec elements add complexity or, on the contrary, allow a simpler approach than the initial explorations and coding we did when this all started. Will keep you informed in this thread, but I do not see this as a complex feature to implement, just one that needs the right planning so it does not become a bottleneck. Thanks!

DiegoPino commented 4 weeks ago

Out of scope for our already working storage system. Also, we have a hook that allows custom implementations if needed. Closing as not part of our roadmap.