clamsproject / aapb-brandeis-datahousing

Apache License 2.0

Prototype for storage API #7

Open · jyoune opened 3 months ago

jyoune commented 3 months ago

Basic skeleton for a prototype storage API for MMIF files. It stores each file under nested subdirectories derived from the views present in the file and their corresponding metadata.
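A minimal sketch of how such a view-based layout might be derived, using only the standard library on the raw MMIF JSON. The function names `view_dirname` and `storage_path` are hypothetical, and so is the naming scheme: it assumes each view's `metadata.app` is a URL ending in `<app-name>/<version>` and that the runtime parameters live under `metadata.parameters`, hashed so that different configurations of the same app get different directories. The PR's actual layout may differ.

```python
import hashlib
import json
from pathlib import Path


def view_dirname(view: dict) -> str:
    """Build one directory level from a view's app URL and parameters.

    Hypothetical scheme: "<app-name>_<version>-<short hash of parameters>".
    Assumes the app URL looks like "http://apps.clams.ai/swt-detection/v4.0".
    """
    meta = view.get("metadata", {})
    app_parts = meta.get("app", "unknown-app").rstrip("/").split("/")
    app_name_version = "_".join(app_parts[-2:])
    params = meta.get("parameters", {})
    # Serialize with sorted keys so identical parameters always hash the same.
    digest = hashlib.sha1(
        json.dumps(params, sort_keys=True).encode("utf-8")
    ).hexdigest()[:8]
    return f"{app_name_version}-{digest}"


def storage_path(root: Path, mmif: dict, guid: str) -> Path:
    """Nest one subdirectory per view, in view order, then place the file."""
    path = root
    for view in mmif.get("views", []):
        path = path / view_dirname(view)
    return path / f"{guid}.mmif"
```

With this scheme, two files that went through the same apps with the same parameters in the same order share a directory, and each additional view adds one level of nesting, so a file's directory also records its pipeline history.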

keighrim commented 3 months ago

For reference, this PR addresses the "storage" side of https://github.com/clamsproject/aapb-evaluations/issues/50 .

keighrim commented 2 months ago

A few additional suggestions after using it to upload files over the past few days.

  1. zero-guid scenario and "rewind" feature: because of the rewind feature, I planned a regular "garbage collection" process to remove MMIF files from non-terminal directories. However, this needs more thought: once garbage collection is in place, a zero-guid query can return an empty directory.
  2. overwrite: what should happen when the payload of an upload request conflicts with an existing file?
    1. Can we just blindly overwrite?
    2. Can we just blindly reject the upload?
    3. Should we run some sort of "deep diff" between the two MMIF files and decide based on the result?
  3. document locations: we need to decide whether to allow users (uploaders) to use the file:// scheme for document locations (which is not persistent and may only resolve on the uploader's own machine), or to allow only baapb:// locations for consistency and reproducibility.
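The overwrite options and the location question could both be expressed as upload-time checks. The sketch below is hypothetical throughout: `ConflictPolicy`, `should_write`, `disallowed_locations`, and the choice of `timestamp` as the only field the "deep diff" ignores are all illustrative assumptions, not part of this PR. It assumes document locations sit under each document's `properties.location`, as in MMIF JSON.

```python
import json
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class ConflictPolicy(Enum):
    OVERWRITE = "overwrite"  # option 1: replace the stored file
    REJECT = "reject"        # option 2: refuse any conflicting upload
    DIFF = "diff"            # option 3: accept only if contents are equivalent


# Assumption: keys that legitimately differ between two runs of the same
# pipeline; they are dropped at any nesting depth before comparing.
VOLATILE_KEYS = {"timestamp"}


def _strip_volatile(obj):
    if isinstance(obj, dict):
        return {k: _strip_volatile(v) for k, v in obj.items() if k not in VOLATILE_KEYS}
    if isinstance(obj, list):
        return [_strip_volatile(v) for v in obj]
    return obj


def should_write(existing: Optional[str], incoming: str, policy: ConflictPolicy) -> bool:
    """Decide whether an upload may be stored, given the existing file (if any)."""
    if existing is None:
        return True
    if policy is ConflictPolicy.OVERWRITE:
        return True
    if policy is ConflictPolicy.REJECT:
        return False
    # DIFF: parsed dict equality ignores key order, but list order still matters.
    return _strip_volatile(json.loads(existing)) == _strip_volatile(json.loads(incoming))


def disallowed_locations(mmif: dict, allowed=frozenset({"baapb"})) -> list:
    """Return document locations whose URI scheme is not in `allowed`."""
    bad = []
    for doc in mmif.get("documents", []):
        location = doc.get("properties", {}).get("location")
        if location and urlparse(location).scheme not in allowed:
            bad.append(location)
    return bad
```

Under the DIFF policy, an equivalent re-upload is accepted as idempotent and anything else is treated as a conflict to resolve by hand. Passing `allowed=frozenset({"baapb", "file"})` would correspond to the more permissive choice for document locations.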