UX design for new texts ingestion pages

This is an opportunity to design a proper UX for the whole ingestion process, i.e. the way new materials enter the database.

Status quo:

a single DOCX file is uploaded, with (too) basic metadata entered: a single(!) author, a single optional translator, a genre, a title, and published edition details (publisher, city, year)
the DOCX is automatically converted to MultiMarkdown text
the editor fixes up the markdown, tidying headings and sub-headings, prose/poetry text formatting, etc.
in the case of multiple works in a single DOCX file (e.g. multiple separate poems, or articles, that were typed in sequence in a single task, resulting in a single DOCX file), the editor can place special custom PBY markup (&&&) to mark a separation and a specific title for that work
when the editor is happy with the markdown as shown in the preview, they press the "create entities" button, and the system creates catalog entities (Work, Expression, Manifestation), linking them to the specified author and translator.
Nothing is added or changed in the respective Hebrew author's/translator's own pages (their "TOC"s). That needs to be done manually by the editor. (A non-Hebrew author has an auto-generated TOC, which will include the newly added works the next time it is generated, but in no guaranteed order, so this is not good enough.)
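
To make the multi-work step concrete, here is a minimal sketch of how text containing &&& separators could be split into separate works. It is not the project's actual code: the exact marker syntax (a line starting with &&& followed by the title), the `WorkText` class, and the `split_works` function are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Assumption: a separator is a line that starts with "&&&",
# and the rest of that line is the title of the work that follows.
SEPARATOR = re.compile(r"^&&&\s*(.*)$")


@dataclass
class WorkText:
    """Hypothetical container for one work cut out of the uploaded file."""
    title: str
    body: str


def split_works(markdown: str, default_title: str) -> list[WorkText]:
    """Split converted markdown into works at each &&& separator line."""
    works = []
    title = default_title
    lines = []

    def flush():
        # Store the text collected so far, skipping empty chunks.
        body = "\n".join(lines).strip()
        if body:
            works.append(WorkText(title, body))

    for line in markdown.splitlines():
        match = SEPARATOR.match(line.strip())
        if match:
            flush()
            title = match.group(1).strip() or default_title
            lines = []
        else:
            lines.append(line)
    flush()
    return works
```

With no separators present, the whole text comes back as a single work under `default_title`, which matches the single-work upload case.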

This is what the upload form looks like:

Brave new future

the upload form would allow specifying any number of authors/translators
the new texts would necessarily be made part of a Collection:
- they can be added to an existing Collection (either one of the ones associated with one of the specified connected people, or any arbitrary existing collection from the database, that the editor can search for and select), including the author's default ("TOC") collection.
- they can be created as their own Collection (e.g. when uploading a volume -- a whole book of poetry, or a single novel), but that Collection still needs to be made part of a broader Collection, potentially the author's default ("TOC") collection.
- in the case of multiple individual works, each work can specify (using more custom markup? dynamic fields?) authors, genre, etc., and can belong to the overall Collection selected or to a new sub-Collection (e.g. a sonnet cycle within a book of poetry), with some means of defining which of the multiple works being uploaded belongs to which collection.
- after the "create entities" button is pressed, and the database entities for the works and collections are updated, there needs to be an opportunity to adjust the placement of the new entities within their containing Collections. For example, if I've uploaded a new volume of poetry, I should be able to adjust the placement of the whole volume within the main ("TOC") Collection of that author's works. If I've added a few uncollected articles by a certain author to their main ("TOC") Collection (i.e. there's no 'volume' to group them under), I should be able to adjust the placement of those new articles within the TOC (e.g. by date, or alphabetically, or under some sub-heading (=sub-Collection)). By default, all new Collection items are added at the end of the Collection's sequence.
- In the case of an awkward mistake (e.g. wrong metadata now present in dozens of new DB entities), I should be able to undo entity creation (destroying the DB entities and removing them from the Collections they were added to), change the status of the original ingestion entity (for historical reasons called HtmlFile) back to "in progress", fix the error in the ingestion pipeline, then re-create the entities and re-add them to Collections as we would in a new ingestion.
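
The placement and undo requirements above can be sketched roughly as follows. This is only an in-memory illustration, not a proposal for the actual schema: the `Collection` and `Ingestion` classes, their methods, and the "entities_created" status name are all hypothetical. Only the "in progress" status and the append-at-end default come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare collections by identity, not by content
class Collection:
    """Hypothetical ordered container of works and nested sub-Collections."""
    title: str
    items: list = field(default_factory=list)

    def append(self, item):
        # Default behavior: new items go to the end of the sequence.
        self.items.append(item)

    def move(self, item, new_index):
        # Let the editor adjust the placement after entity creation.
        self.items.remove(item)
        self.items.insert(new_index, item)

    def remove(self, item):
        self.items.remove(item)


@dataclass
class Ingestion:
    """Hypothetical stand-in for the ingestion entity (HtmlFile)."""
    status: str = "in progress"
    placements: list = field(default_factory=list)  # (collection, item) pairs

    def create_entities(self, items, collection):
        # Remember every placement so that it can be undone later.
        for item in items:
            collection.append(item)
            self.placements.append((collection, item))
        self.status = "entities_created"  # hypothetical status name

    def undo(self):
        # Remove the created items from their Collections, newest first,
        # and reopen the ingestion for fixes.
        for collection, item in reversed(self.placements):
            collection.remove(item)
        self.placements.clear()
        self.status = "in progress"


# Usage: add a new volume to an author's TOC, move it to the top, then undo.
toc = Collection("TOC", ["Old essay"])
volume = Collection("New volume of poetry")
ingestion = Ingestion()
ingestion.create_entities([volume], toc)
toc.move(volume, 0)
ingestion.undo()
```

Recording each placement at creation time is what makes the undo cheap here; a real implementation would also need to destroy the created DB entities themselves.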

Related work

see UX for batch editing

abartov / bybeconv

UX design for new texts ingestion pages #268
