RepoCamp / asc2018

Advanced Samvera Camp code samples and wiki
Apache License 2.0

Create actor and job(s) to split a multi-page upload, create child work per-page #31

Open seanupton opened 6 years ago

seanupton commented 6 years ago

Use case: a user uploads a 5-page PDF to a master work and wants each page to be findable and searchable, and therefore wants a child work per page (each a member of the master/parent work). The goal is an actor that intervenes after the master work is created and uses the multi-page PDF to create those child works.

Assumptions

  1. Derivative creation and full-text extraction can be expensive, so as much of this processing as possible should be queued and run as job(s).
  2. There are no special work types, just a single work type that can handle either single-page or multi-page uploads.
  3. Works may be scanned or digitally produced. The only assumption is that some uploads are multi-page and some are single-page. If an upload is a single-page file (e.g. a TIFF, JP2, or a single-page PDF), child works should not be created.
  4. The actor stack is the appropriate place to hook into the work creation process for the multi-page work, but actors may need to queue jobs to do most of the child work creation. This may be complicated by the fact that Hyrax also creates File Sets asynchronously, so working from the uploaded file directly, rather than from an upload that may not yet be stored in a file set, seems reasonably safe and possibly necessary.
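
To make the assumptions above concrete, here is a minimal sketch of the control flow in plain Ruby. It does not use Hyrax itself: `SplitPagesJob`, `Env`, `TerminatorActor`, `MultiPageActor`, and the `page_counter` callable are all hypothetical stand-ins, and a real implementation would presumably subclass a Hyrax actor and enqueue a real background job. The sketch only shows the shape of the idea: let the rest of the stack create the parent work, skip single-page uploads, and hand the uploaded file path (not a file set) to a queued job.

```ruby
# Hypothetical stand-in for a background job class; it only records what was queued.
class SplitPagesJob
  QUEUE = []

  def self.perform_later(work_id, path)
    QUEUE << { work_id: work_id, path: path }
  end
end

# Hypothetical, simplified environment: the parent work id plus upload paths.
Env = Struct.new(:work_id, :upload_paths)

# End of the chain: stands in for the actor that actually persists the work.
class TerminatorActor
  def create(_env)
    true
  end
end

class MultiPageActor
  attr_reader :next_actor

  # page_counter: any callable mapping a file path to a page count (assumed helper).
  def initialize(next_actor, page_counter)
    @next_actor = next_actor
    @page_counter = page_counter
  end

  def create(env)
    # Let the rest of the stack create the parent work first.
    return false unless next_actor.create(env)

    env.upload_paths.each do |path|
      # Single-page uploads get no child works (assumption 3).
      next unless @page_counter.call(path) > 1

      # Pass the upload path itself, not a file set, to the job (assumption 4).
      SplitPagesJob.perform_later(env.work_id, path)
    end
    true
  end
end

# Usage: page counts are faked with a hash instead of inspecting real files.
pages = { "report.pdf" => 5, "cover.tif" => 1 }
actor = MultiPageActor.new(TerminatorActor.new, ->(path) { pages.fetch(path) })
actor.create(Env.new("work-1", pages.keys))
p SplitPagesJob::QUEUE
```

Injecting the page counter keeps the actor cheap: the expensive work (derivatives, text extraction, child work creation) stays in the job, per assumption 1, and the actor only decides whether a job is needed.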