OCFL / spec

The Oxford Common File Layout (OCFL) specifications
https://ocfl.io

More compact inventory.json format #642

Open srerickson opened 11 months ago

srerickson commented 11 months ago

One issue with OCFL v1.x is that inventory.json can get quite large, especially when an object has many versions, many files per version, and you add fixity to the mix. A more compact format is possible by removing the duplication of digests and file paths. This is illustrated in the structure below, which is based on the spec-ex-full fixture. The original inventory is 3773 bytes; the alternative structure is 2217 bytes, about 41% smaller despite carrying the same information.

{
    "type": "https://ocfl.io/2.0-draft/spec/#inventory",
    "id": "ark:/12345/bcd987",
    "digestAlgorithm": "sha512",
    "head": "v3",
    "manifest": {
        "4d27c86b026ff709b02b05d126cfef7ec3aed5f83f5e98df7d7592f7a44bd1dc7f29509cff06b884158baa36a2bbeda11ab8a64b56585a70f5ce1fa96e26eb53": {
            "content": ["v2/content/foo/bar.xml"],
            "v1": [],
            "v2": ["foo/bar.xml"],
            "v3": ["foo/bar.xml"],
            "fixity": {
                "md5": "2673a7b11a70bc7ff960ad8127b4adeb",
                "sha1": "a6357c99ecc5752931e133227581e914968f3b9c"
            }
        },
        "7dcc352f96c56dc5b094b2492c2866afeb12136a78f0143431ae247d02f02497bbd733e0536d34ec9703eba14c6017ea9f5738322c1d43169f8c77785947ac31": {
            "content": ["v1/content/foo/bar.xml"],
            "v1": ["foo/bar.xml"],
            "v2": [],
            "v3": [],
            "fixity": {
                "md5": "184f84e28cbe75e050e9c25ea7f2e939",
                "sha1": "66709b068a2faead97113559db78ccd44712cbf2"
            }
        },
        "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e": {
            "content": ["v1/content/empty.txt"],
            "v1": ["empty.txt"],
            "v2": ["empty.txt","empty2.txt"],
            "v3": ["empty2.txt"],
            "fixity": {
                "md5": "d41d8cd98f00b204e9800998ecf8427e",
                "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
            }
        },
        "ffccf6baa21809716f31563fafb9f333c09c336bb7400088f17e4ff307f98fc9b14a577f92f3285913b7f53a6d5cf004503cf839aada1c885ac69336cbfb862e": {
            "content": ["v1/content/image.tiff"],
            "v1": ["image.tiff"],
            "v2": [],
            "v3": ["image.tiff"],
            "fixity": {
                "md5": "c289c8ccd4bab6e385f5afdd89b5bda2",
                "sha1": "b9c7ccc6154974288132b63c15db8d2750716b49"
            }
        }
    },
    "versions": {
        "v1": {
            "created": "2018-01-01T01:01:01Z",
            "message": "Initial import",
            "user": {
                "address": "mailto:alice@example.com",
                "name": "Alice"
            }
        },
        "v2": {
            "created": "2018-02-02T02:02:02Z",
            "message": "Fix bar.xml, remove image.tiff, add empty2.txt",
            "user": {
                "address": "mailto:bob@example.com",
                "name": "Bob"
            }
        },
        "v3": {
            "created": "2018-03-03T03:03:03Z",
            "message": "Reinstate image.tiff, delete empty.txt",
            "user": {
                "address": "mailto:cecilia@example.com",
                "name": "Cecilia"
            }
        }
    }
}
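
To make the mapping from a v1.x inventory concrete, here is a minimal Python sketch of the conversion. It is only an illustration under some assumptions, not part of any OCFL spec or library: the function name `compact_inventory` is made up, digests are assumed to use the same letter case everywhere in the input, and the `type` value is copied through unchanged rather than rewritten.

```python
def compact_inventory(inv: dict) -> dict:
    """Fold a v1.x inventory's version states and fixity block into
    per-digest manifest entries (hypothetical helper, not an OCFL API)."""
    # Order version names numerically so "v10" sorts after "v9".
    names = sorted(inv["versions"], key=lambda v: int(v[1:]))

    manifest = {}
    for digest, content_paths in inv["manifest"].items():
        entry = {"content": list(content_paths)}
        for name in names:
            # An empty list means the digest is absent from that version.
            state = inv["versions"][name].get("state", {})
            entry[name] = list(state.get(digest, []))
        manifest[digest] = entry

    # The v1.x fixity block maps algorithm -> alternate digest -> content paths.
    # Attach each alternate digest to the manifest entry that owns the path.
    owner = {p: d for d, paths in inv["manifest"].items() for p in paths}
    for alg, digests in inv.get("fixity", {}).items():
        for alt_digest, paths in digests.items():
            for path in paths:
                if path in owner:  # skip paths the manifest does not list
                    manifest[owner[path]].setdefault("fixity", {})[alg] = alt_digest

    # Copy everything else through, dropping the now-redundant blocks.
    out = {k: v for k, v in inv.items()
           if k not in ("manifest", "versions", "fixity")}
    out["manifest"] = manifest
    out["versions"] = {
        name: {k: v for k, v in ver.items() if k != "state"}
        for name, ver in inv["versions"].items()
    }
    return out
```

Loading an existing inventory.json with `json.load` and passing the result to `compact_inventory` should produce a structure shaped like the one above; the exact byte savings will depend on how the output is serialized.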

Edit:

Perhaps a cleaner structure for each manifest entry would be:

{
  "content": ["v2/content/foo/bar.xml"],
  "state": {
    "v1": [],
    "v2": ["foo/bar.xml"],
    "v3": ["foo/bar.xml"]
  },
  "fixity": {
    "md5": "2673a7b11a70bc7ff960ad8127b4adeb",
    "sha1": "a6357c99ecc5752931e133227581e914968f3b9c"
  }
}
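
One consequence of keying everything by digest is that reading a single version's state means scanning every manifest entry. A rough Python sketch of that lookup, assuming the nested `state` variant just shown (the function name `version_state` is hypothetical):

```python
def version_state(manifest: dict, version: str) -> dict:
    """Return {logical_path: digest} for one version of a compact manifest
    that uses the nested "state" layout (hypothetical helper)."""
    state = {}
    for digest, entry in manifest.items():
        # A missing or empty list means the digest is not in this version.
        for path in entry.get("state", {}).get(version, []):
            if path in state:
                # A logical path can only point at one digest per version.
                raise ValueError(f"{path!r} appears twice in {version}")
            state[path] = digest
    return state
```

For example, `version_state(inventory["manifest"], "v2")` would yield the logical paths in v2 mapped to their content digests, which is the same information a v1.x `versions.v2.state` block carries, just inverted.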