OCFL / Use-Cases

A repository to help capture, track, and discuss use cases for OCFL. Issues-only, please.

Support segmented file storage #40

Closed: zimeon closed this issue 11 months ago

zimeon commented 3 years ago

Few filesystems and object stores work well with very large files (e.g., multi-terabyte files), and the usual approach is to split such files into segments for easier storage, transfer, and fixity checking. Although one can store a set of segments in an OCFL v1 object, OCFL v1 provides no way to record that those segments combine to form one logical file.
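A minimal sketch of the segmentation described above, in Python. It splits a file into fixed-size segments, records a SHA-512 digest for each segment and for the whole logical file, and reassembles the file while checking both. The function names `segment_file` and `reassemble`, the `.segNNNNN` naming scheme, and the shape of the returned description are all hypothetical. OCFL v1 defines no such structure, which is the gap this issue raises.

```python
import hashlib
from pathlib import Path

# Hypothetical default segment size; OCFL v1 defines no such parameter.
SEGMENT_SIZE = 1024 * 1024 * 1024  # 1 GiB


def segment_file(src: Path, out_dir: Path, segment_size: int = SEGMENT_SIZE) -> dict:
    """Split src into numbered segments in out_dir and return a hypothetical
    description of how those segments combine into one logical file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    whole = hashlib.sha512()  # digest of the complete logical file
    segments = []
    with src.open("rb") as f:
        index = 0
        while True:
            chunk = f.read(segment_size)  # each segment is held in memory
            if not chunk:
                break
            whole.update(chunk)
            name = f"{src.name}.seg{index:05d}"
            (out_dir / name).write_bytes(chunk)
            segments.append({
                "path": name,
                "size": len(chunk),
                "sha512": hashlib.sha512(chunk).hexdigest(),
            })
            index += 1
    return {"logical_path": src.name, "sha512": whole.hexdigest(), "segments": segments}


def reassemble(desc: dict, seg_dir: Path, dest: Path) -> None:
    """Concatenate segments in order, verifying per-segment and whole-file fixity."""
    whole = hashlib.sha512()
    with dest.open("wb") as out:
        for seg in desc["segments"]:
            data = (seg_dir / seg["path"]).read_bytes()
            if hashlib.sha512(data).hexdigest() != seg["sha512"]:
                raise ValueError(f"fixity failure in segment {seg['path']}")
            whole.update(data)
            out.write(data)
    if whole.hexdigest() != desc["sha512"]:
        raise ValueError("fixity failure for reassembled file")
```

Each segment can be fixity-checked independently, while the whole-file digest confirms that the ordered concatenation reproduces the original. Because each segment is read fully into memory, very large segment sizes need correspondingly more RAM.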

zimeon commented 3 years ago

It would be interesting if any solution that handles the opposite case, packaging many small files together (#33), also supported this case of segmenting large files.

julianmorley commented 3 years ago

This is getting dangerously close to adjusting the spec to guard against specific filesystem behaviors. We should strive to be filesystem-agnostic and note potential issues in the implementation notes (e.g., "if you're using S3, don't have any single file larger than 5 TB"), rather than having OCFL mandate splitting of large files.

I can easily see someone using OCFL to track large disk images. If they were to push those objects to S3, they might wish to use the zip-per-version storage model (TBD), which would happily split large files into smaller zip segments, rather than being told never to store large files in OCFL.

That is to say, I think the OCFL fix for this should be "if your target file system doesn't support large files, use the zip-per-version storage model to segment them, or don't ingest large files".
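In the spirit of "note potential issues" rather than changing the spec, a tool could warn before ingest when a file exceeds the target store's maximum object size. The sketch below is a minimal, hypothetical pre-ingest check; `oversized_files` and `MAX_OBJECT_BYTES` are illustrative names, and the default limit is the S3 figure cited in this discussion, so check your provider's current documentation.

```python
from pathlib import Path

# Assumption: the 5 TB S3 limit cited in this thread, expressed as 5 TiB.
# Check your storage provider's current documentation before relying on it.
MAX_OBJECT_BYTES = 5 * 1024**4


def oversized_files(root: Path, max_bytes: int = MAX_OBJECT_BYTES) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root
    that is larger than max_bytes, in sorted path order."""
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                found.append((path.relative_to(root).as_posix(), size))
    return found
```

A tool could then refuse the ingest, or route flagged files to a segmenting or zip-per-version packaging step, without OCFL itself imposing a size limit.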

julianmorley commented 11 months ago

The editors discussed this, and we decided that OCFL should remain completely format- and content-agnostic; if higher-level tooling needs this kind of functionality, it can be provided by an extension.