OCFL / Use-Cases

A repository to help capture, track, and discuss use cases for OCFL. Issues-only, please.

Support notion of logical file that is stored in multiple parts in order to handle very large files #23

Closed zimeon closed 6 years ago

zimeon commented 6 years ago

Some institutions may have very large files that are inconvenient or impossible to store as single files within an OCFL digital object. It would always be possible to split files into multiple parts so that each part is treated as a first-class file by OCFL, but that pushes the modeling/support burden onto the application. However, an OCFL model/convention for multipart files would allow the development of shared tooling to handle large files.
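To make the application-level approach concrete, here is a minimal sketch of splitting a file into parts and reassembling it with digest checks. The part naming scheme (`.partNNNNN`), the manifest layout, and the function names `split_file` and `join_parts` are all hypothetical illustrations; none of them is defined by OCFL.

```python
import hashlib
import json
from pathlib import Path


def split_file(src: Path, out_dir: Path, part_size: int) -> dict:
    """Split src into numbered part files and return a manifest.

    The manifest layout is a hypothetical convention, not part of OCFL.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    whole = hashlib.sha512()  # digest of the complete logical file
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Each part is held in memory once; pick part_size accordingly.
            chunk = f.read(part_size)
            if not chunk:
                break
            whole.update(chunk)
            name = f"{src.name}.part{index:05d}"
            (out_dir / name).write_bytes(chunk)
            parts.append(
                {"name": name, "sha512": hashlib.sha512(chunk).hexdigest()}
            )
            index += 1
    return {
        "logical_name": src.name,
        "sha512": whole.hexdigest(),
        "parts": parts,  # list order is the reassembly order
    }


def join_parts(manifest: dict, parts_dir: Path, dest: Path) -> None:
    """Reassemble the logical file, verifying each part and the whole."""
    whole = hashlib.sha512()
    with dest.open("wb") as out:
        for part in manifest["parts"]:
            data = (parts_dir / part["name"]).read_bytes()
            if hashlib.sha512(data).hexdigest() != part["sha512"]:
                raise ValueError(f"digest mismatch in {part['name']}")
            whole.update(data)
            out.write(data)
    if whole.hexdigest() != manifest["sha512"]:
        raise ValueError("digest mismatch for reassembled file")
```

In this sketch each part would be stored as an ordinary file in the object, and the manifest (serializable with `json.dumps`) would live alongside the parts as application metadata. Every tool that reads the object needs to understand that manifest, which is the burden a shared convention would remove.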

zimeon commented 6 years ago

At Cornell we are doing video digitization work that has so far created files up to 650GB, and we anticipate the possibility of larger files.

On current Unix filesystems, multi-TB files are supported, although they are somewhat unwieldy. AWS S3 supports individual objects up to 5TB, though uploading them requires multipart upload in chunks of at most 5GB (my experience from a few years ago suggests that chunks somewhat smaller than 5GB would likely work better for internet-scale transfers).
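As a rough illustration of the arithmetic, the sketch below counts the parts a multipart upload would need for a given chunk size. The constants reflect the S3 multipart limits as generally documented (parts between 5 MiB and 5 GiB, at most 10,000 parts per upload); the function name `parts_needed` is hypothetical, and reading "650GB" as decimal gigabytes is an assumption.

```python
S3_MIN_PART = 5 * 2**20   # 5 MiB minimum part size (the last part may be smaller)
S3_MAX_PART = 5 * 2**30   # 5 GiB maximum part size
S3_MAX_PARTS = 10_000     # maximum number of parts in one multipart upload


def parts_needed(file_size: int, part_size: int) -> int:
    """Return how many parts a multipart upload of file_size bytes needs."""
    if not S3_MIN_PART <= part_size <= S3_MAX_PART:
        raise ValueError("part size outside the allowed 5 MiB to 5 GiB range")
    count = -(-file_size // part_size)  # ceiling division on integers
    if count > S3_MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return count


# A 650 GB file (assumed decimal gigabytes) uploaded in 1 GiB parts.
print(parts_needed(650 * 10**9, 2**30))  # → 606
```

With parts this size the 10,000-part ceiling is far away, so the choice of chunk size is driven by transfer reliability, not by the service limits.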

ahankinson commented 6 years ago

Is the issue here per-file filesystem limits?

In general I think multipart files might be in scope, but I would like to see an additional use case or details for them.

  1. An OCFL object can store any number of files within itself. If you have 10 5TB files (50TB) in a single object, that should be fine.
  2. If your filesystem is the limiting factor, perhaps the advice should be to use a different filesystem more suited to large file storage?
  3. Modelling of objects is out of scope. If you have four individual video files making up a full video, is this any different from 200 images that make up a book? In both cases the 'object' modelling is found in the object metadata, not in the OCFL administrative metadata.

ahankinson commented 6 years ago

F2F 2018.09.05: Chunking is always possible, but OCFL will not specify any way of chunking files.