OCFL / spec

The Oxford Common File Layout (OCFL) specifications
https://ocfl.io

Support File Renaming in inventory.jsonld #16

Closed ahankinson closed 5 years ago

ahankinson commented 6 years ago

Currently the inventory.jsonld file does not support file renaming. We need to resolve this.

Relates to ocfl/Use-Cases#26

julianmorley commented 6 years ago

OK, how about something like this - syntax simplified w/ extraneous values omitted and checksums abbreviated for legibility.

{
  "type": "Object",
  "head": "#v6",

  // For validation & object reconstitution, a scan of all version directories
  // MUST find at least 1 file that matches every checksum here,
  // but we don't actually care what the filename is - just that the content
  // is present.

  "checksums": [
    "a83e3633",
    "bb123efc",
    "f4abe741",
    "ee983ac4"
  ],

  // Here we use forward diffs to construct the object history through
  // various versions. ADD, COPY, RENAME and DELETE actions are demonstrated.
  // The intention is that the most recent inventory file should be capable
  // of re-constituting the object to any prior version level.

  // It presumes that a scan of all the version directories has taken place,
  // and that at least one file that matches every checksum referenced above has been found.

  "versions": [
    {
      "type": "Version",
      "id": "#v1", // v1 initial add of 3 files
      "a83e3633": ["/file1"],
      "bb123efc": ["/file2"],
      "f4abe741": ["/file3"]
    },

    {
      "type": "Version",
      "id": "#v2", // v2 copy file2 to file4
      "bb123efc": ["/file2", "/file4"]
    },

    {
      "type": "Version",
      "id": "#v3", // v3 rename file1 to file5
      "a83e3633": ["/file5"]
    },

    {
      "type": "Version",
      "id": "#v4", // v4 add file6
      "ee983ac4": ["/file6"]
    },

    {
      "type": "Version",
      "id": "#v5", // v5 delete file3
      "f4abe741": [""]
    },

    {
      "type": "Version",
      "id": "#v6", // v6 delete file4, rename file2 to file7
      "bb123efc": ["/file7"]
    }
  ]
}
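
One possible reading of the forward diffs above, sketched in Python: each delta entry replaces the full path list for its checksum, and an entry of `[""]` means no path carries that content any more. That replacement rule is an assumption inferred from the v2, v5 and v6 examples, not something the proposal states, and `VERSIONS` and `state_at` are hypothetical names used only for illustration.

```python
# Forward diffs from the proposal, transcribed as Python data.
VERSIONS = [
    {"type": "Version", "id": "#v1",
     "a83e3633": ["/file1"], "bb123efc": ["/file2"], "f4abe741": ["/file3"]},
    {"type": "Version", "id": "#v2", "bb123efc": ["/file2", "/file4"]},
    {"type": "Version", "id": "#v3", "a83e3633": ["/file5"]},
    {"type": "Version", "id": "#v4", "ee983ac4": ["/file6"]},
    {"type": "Version", "id": "#v5", "f4abe741": [""]},
    {"type": "Version", "id": "#v6", "bb123efc": ["/file7"]},
]


def state_at(versions, target_id):
    """Replay deltas in order and return checksum -> paths at target_id.

    Assumption: a delta entry replaces the whole path list for its
    checksum; an empty list or [""] removes the checksum from the state.
    """
    state = {}
    for version in versions:
        for key, paths in version.items():
            if key in ("type", "id"):
                continue  # metadata, not a checksum entry
            live = [p for p in paths if p]
            if live:
                state[key] = live
            else:
                state.pop(key, None)
        if version["id"] == target_id:
            return state
    raise KeyError(target_id)


if __name__ == "__main__":
    for vid in ("#v3", "#v6"):
        print(vid, state_at(VERSIONS, vid))
```

Under that reading, a rename is just a delta that lists the new path for an existing checksum, so no content needs to move between version directories.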

awoods commented 6 years ago

Thanks, @julianmorley. This seems like a constructive path forward. :+1:

julianmorley commented 6 years ago

I've refined this a touch and created a gist: https://gist.github.com/julianmorley/9bc5d2ff525fbfc39d80e1fa3e2641a8

Main change is that I've renamed the version objects as deltas, and expressed version as a key/value attribute instead. This is to allow OCFL to support underlying preservation objects (or files) that don't have such a rigid notion of versioning (e.g. BagIt) but that might still be revised over time.

bcail commented 6 years ago

@julianmorley in your #v2, where file2 is copied to file4, would this support storing the duplicate content only once on the filesystem? Or would you have to have the duplicate content stored twice under the different names?

julianmorley commented 6 years ago

@bcail It supports that, yes, but it doesn't enforce it - that depends on the underlying object ontology. Moab, for example, natively de-dupes files across versions. But if the underlying object was BagIt, for example, in two different version directories, all this would do is note that the exact same file shows up twice in the object, in two different locations.

If the tool used to create a new version of a BagIt object that conforms to OCFL is smart, then it could make the contents of v2 be only the changes made, relying on the contents of the BagIt manifest and the OCFL inventory to correctly re-hydrate the object. But that might be a bit of an edge case.
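
As a rough illustration of the "smart tool" idea, here is a minimal Python sketch that decides which content actually has to be written for a new version. Files are keyed by digest, and bytes are stored only when the digest has not been seen in an earlier version. The function name `plan_version`, the in-memory `bytes` inputs and the choice of SHA-256 are all assumptions made for the example; nothing in the thread prescribes them.

```python
import hashlib


def plan_version(known_digests, new_files):
    """Plan a new version without duplicating content already stored.

    known_digests: set of digests present in earlier versions.
    new_files: mapping of logical path -> file content (bytes).
    Returns (to_store, state): to_store maps digest -> bytes that must be
    written; state maps digest -> sorted logical paths in the new version.
    """
    to_store = {}
    state = {}
    for path, data in sorted(new_files.items()):
        digest = hashlib.sha256(data).hexdigest()
        state.setdefault(digest, []).append(path)
        # Write content once, even if several paths share it.
        if digest not in known_digests and digest not in to_store:
            to_store[digest] = data
    return to_store, state


if __name__ == "__main__":
    known = {hashlib.sha256(b"two").hexdigest()}
    files = {"/file2": b"two", "/file4": b"two", "/file6": b"six"}
    to_store, state = plan_version(known, files)
    print(len(to_store), "new content item(s) to write")
    print(state)
```

A tool that skips this check would still produce a usable object; the inventory would simply record the same digest at more than one stored location.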

ahankinson commented 6 years ago

For reference, original gist:

https://gist.github.com/ahankinson/00796be6d2088fd6ace4ec5930692c6e

ahankinson commented 6 years ago

Propose: Add deltas to a Version and follow @julianmorley's proposal. This will help with deduplication and also help track the changes that are made within a version.

Structure of deltas is TBD.

Also propose reversing manifest and members to use paths as keys.

ahankinson commented 6 years ago

@julianmorley will propose some wording to help make the deltas clearer.

rosy1280 commented 6 years ago

updated lazy gist https://gist.github.com/rosy1280/b6ebabdeb779a186d913a9ac1db886d5

zimeon commented 5 years ago

F2F decision: adopting a combination of the manifest, which maps digests to files in the OCFL Object, and the state for each version, which maps these digests to the logical file paths comprising the complete logical state of the version, supports arbitrary renaming. Closing.
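
A minimal Python sketch of why the manifest plus per-version state combination handles renames: the manifest ties a digest to where the bytes are stored, each version's state ties the same digest to logical paths, and a rename changes only the state. The stored paths, the dictionary layout and the `resolve` helper are illustrative assumptions, not the wording of the specification.

```python
# Digest -> stored file locations inside the object (illustrative paths).
MANIFEST = {
    "a83e3633": ["v1/content/file1"],
    "bb123efc": ["v1/content/file2"],
}

# Per-version state: digest -> logical paths in that version.
VERSIONS = {
    "v1": {"state": {"a83e3633": ["file1"], "bb123efc": ["file2"]}},
    # v2 renames file1 to file5 and copies file2 to file4; no new bytes.
    "v2": {"state": {"a83e3633": ["file5"], "bb123efc": ["file2", "file4"]}},
}


def resolve(manifest, state, logical_path):
    """Return one stored location holding the content of logical_path."""
    for digest, paths in state.items():
        if logical_path in paths:
            return manifest[digest][0]
    raise KeyError(logical_path)


if __name__ == "__main__":
    v2 = VERSIONS["v2"]["state"]
    print(resolve(MANIFEST, v2, "file5"))
    print(resolve(MANIFEST, v2, "file4"))
```

Because every version carries its complete logical state, reading any version is a direct lookup rather than a replay of deltas.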