Large Inventory.json files

OCFL / Use-Cases

A repository to help capture, track, and discuss use cases for OCFL. Issues-only, please.


Large Inventory.json files #48

Open awoods opened 10 months ago

awoods commented 10 months ago

As noted in https://github.com/OCFL/spec/issues/642, 'inventory.json' files can become large when an OCFL object has many versions, many files, or both, and this can degrade performance. The impact can be acute if the managing application retrieves the inventory.json over the network (e.g. OCFL in S3). Parsing a large inventory.json may also become a bottleneck.
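To make the growth concrete, here is a rough sketch that builds a synthetic inventory and measures its serialized size. The workload is an assumption, not taken from the issue: every version keeps all files and changes exactly one of them. The helper names (`synthetic_inventory`, `serialized_size`) and the object id are hypothetical. Because each version's `state` block lists every file in that version, the size grows roughly with files × versions.

```python
import hashlib
import json


def synthetic_inventory(num_files: int, num_versions: int) -> dict:
    """Build a toy OCFL-style inventory (assumes num_files >= 1).

    Assumed workload: v1 adds every file, each later version changes one file.
    """
    manifest = {}   # digest -> [content paths]
    versions = {}   # version name -> version block
    current = {}    # logical path -> digest, carried forward between versions
    for v in range(1, num_versions + 1):
        vname = f"v{v}"
        changed = range(num_files) if v == 1 else [(v - 2) % num_files]
        for i in changed:
            path = f"file-{i:06d}.dat"
            # Fake digest derived from the version and path, unique per change.
            digest = hashlib.sha512(f"{vname}/{path}".encode()).hexdigest()
            manifest[digest] = [f"{vname}/content/{path}"]
            current[path] = digest
        versions[vname] = {
            "created": "2024-01-01T00:00:00Z",
            # Every version repeats the full digest -> logical path mapping.
            "state": {digest: [path] for path, digest in current.items()},
        }
    return {
        "id": "urn:example:object-1",  # hypothetical identifier
        "type": "https://ocfl.io/1.1/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": f"v{num_versions}",
        "manifest": manifest,
        "versions": versions,
    }


def serialized_size(inventory: dict) -> int:
    """Size in bytes of the inventory when serialized as compact JSON."""
    return len(json.dumps(inventory).encode("utf-8"))


if __name__ == "__main__":
    for files, vers in [(100, 1), (100, 50), (1000, 50)]:
        size = serialized_size(synthetic_inventory(files, vers))
        print(f"{files} files, {vers} versions: {size} bytes")
```

Real objects will differ (deduplicated content, renames, metadata fields), so treat the numbers as an illustration of the trend rather than a benchmark.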

Potential solutions to the issue of large inventory.json files are described in:

https://github.com/OCFL/spec/issues/642 (prospective) and
https://github.com/OCFL/Use-Cases/issues/46 (retrospective)

rosy1280 commented 7 months ago

Feedback on Use Cases

In advance of version 2 of the OCFL, we are soliciting feedback on use cases. Please feel free to add your thoughts on this use case via the comments.

Polling on Use Cases

In addition to reviewing comments, we are doing an informal poll for each use case that has been tagged as Proposed: In Scope for version 2. You can contribute to the poll for this use case by reacting to this comment. The following reactions are supported:

| In favor of the use case | Against the use case | Neutral on the use case |
| --- | --- | --- |
| 👍🏼 | 👎🏼 | 👀 |

The poll will remain open through the end of February 2024.

julianmorley commented 7 months ago

My initial thought is that we shouldn't try to optimize the main inventory, but should encourage institutions that encounter a specific problem to implement an extension that pulls out the desired keys to a smaller sidecar file, or implement a queryable JSON datastore that their repository apps could use for reads.
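A minimal sketch of the sidecar idea, assuming the "desired keys" are just the head version's logical paths mapped to their content paths. The sidecar layout and the function name `head_sidecar` are hypothetical and not part of any registered OCFL extension; the sample inventory is made up for illustration.

```python
import json


def head_sidecar(inventory: dict) -> dict:
    """Extract a small lookup table for the head version only.

    Resolves each logical path in the head state to one content path via
    the manifest, so a reader need not load the full inventory.
    """
    head = inventory["head"]
    manifest = inventory["manifest"]
    files = {}
    for digest, logical_paths in inventory["versions"][head]["state"].items():
        # A digest may have several content paths; any one of them will do.
        content_path = manifest[digest][0]
        for logical_path in logical_paths:
            files[logical_path] = content_path
    return {"id": inventory["id"], "head": head, "files": files}


# Tiny made-up inventory: v2 replaces a.txt and keeps b.txt unchanged.
sample_inventory = {
    "id": "urn:example:object-1",
    "type": "https://ocfl.io/1.1/spec/#inventory",
    "digestAlgorithm": "sha512",
    "head": "v2",
    "manifest": {
        "aaa": ["v1/content/a.txt"],
        "bbb": ["v1/content/b.txt"],
        "ccc": ["v2/content/a.txt"],
    },
    "versions": {
        "v1": {"created": "2024-01-01T00:00:00Z",
               "state": {"aaa": ["a.txt"], "bbb": ["b.txt"]}},
        "v2": {"created": "2024-02-01T00:00:00Z",
               "state": {"ccc": ["a.txt"], "bbb": ["b.txt"]}},
    },
}

if __name__ == "__main__":
    print(json.dumps(head_sidecar(sample_inventory), indent=2))
```

The sidecar would have to be regenerated whenever the inventory changes, and the full inventory.json remains the authoritative record; the sidecar only serves reads.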

neilsjefferies commented 6 months ago

Following the editors' discussion on 29 Feb 2024, this is out of scope for V2. The number of votes at the time of this comment is 0.

If the problem is version proliferation, the mutable head extension already exists. Collapsing versions is a possible solution covered by a separate use case, as noted above. Compacting the inventory does not reduce its size enough to significantly address the issue, and it comes at the expense of significantly increased parsing and updating complexity.