Right now, a Preservation action workflow can complete successfully even though some of its Archivematica tasks have failed - for example, parsing a premis.xml file included in a PIP and writing it to the AIP METS file.
Because Archivematica outputs so many tasks, a single error can easily be overlooked in the long list (generally over 100) of tasks in an AIP creation workflow. There is also no indication at the Preservation action header level that any issues or failures can be found in the tasks below, so archivists can easily miss critical information while evaluating the success of an AIP creation workflow.
This is compounded by the fact that currently the Package Statuses legend visible as an expandable on the Packages browse page lists the definition of the "Done" status as:
The current workflow or task has completed without errors.
(emphasis on "without errors" added).
Combined, this means that potentially business-critical preservation tasks might fail without the user ever noticing, and the user may continue to preserve content with undesirable errors in the AIP creation process.
We should clarify the wording of the Done status, and better highlight when there are non-critical errors in the tasks of a given workflow, so archivists can decide for themselves whether this invalidates the AIP or can be safely ignored.
To reproduce
Use the enduro main branch, no later than commit 5763d35ebf1df28a5a827411509e39c6549aa5b1
Run a Vecteur SIP or AIP through
Inspect the results - check that the status is "Done"
Expand the preservation actions, scroll down to around task 33, and notice that the "Load PREMIS events from metadata/premis.xml" task has failed
Return to the Packages browse page, expand the Statuses legend, and read the definition of the "Done" status
Resulting error
There is a task error buried in the many tasks of the AIP creation workflow
It is easy to miss amid all the other tasks
There is no indication of this failure elsewhere on the Package details page
The Statuses legend defines the Done status as a workflow that completes "without errors", yet it has been applied to a workflow that included an error
Expected behavior
The "Done" status definition should be updated to remove the "without errors" condition
A task error should be easier to find amid many other successful tasks
An indication that one or more errors have occurred in a workflow that otherwise completed successfully should be visible on the Package details page, without needing to expand the relevant workflow tasks and find the specific error
Additional context and proposed resolution
I propose the following changes:
Revise the Done status definition
Add a new :warning: status that can be combined with other statuses (e.g. Done, In Progress, Pending) to indicate that a task has errored but the workflow has continued
When a task errors, highlight the entire task row in the preservation actions in red, similar to how Archivematica microservices appear when they error
Use the new :warning: status icon wherever relevant, i.e. wherever the Done status would normally be shown
On hover, show a count of task errors in the relevant workflow
Include a count of task errors in the summary information included below a Preservation action's header
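To make the proposal concrete, here is a rough sketch of how a combined status and an error count could be derived from a workflow's tasks. It is illustrative only: the `Task`, `Workflow`, and `TaskStatus` types and the `count_task_errors` and `status_label` helpers are hypothetical names invented for this example and do not reflect Enduro's actual data model or dashboard code.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical status values, loosely mirroring the legend entries.
    DONE = "done"
    IN_PROGRESS = "in progress"
    PENDING = "pending"
    ERROR = "error"


@dataclass
class Task:
    name: str
    status: TaskStatus


@dataclass
class Workflow:
    name: str
    status: TaskStatus
    tasks: list[Task] = field(default_factory=list)


def count_task_errors(workflow: Workflow) -> int:
    """Count the tasks in a workflow that ended in an error."""
    return sum(1 for task in workflow.tasks if task.status is TaskStatus.ERROR)


def status_label(workflow: Workflow) -> str:
    """Build the label shown for a workflow.

    A workflow that did not itself fail but contains task errors gets the
    :warning: marker and an error count appended to its normal status.
    """
    label = workflow.status.value.title()
    errors = count_task_errors(workflow)
    if errors and workflow.status is not TaskStatus.ERROR:
        noun = "task error" if errors == 1 else "task errors"
        return f"{label} :warning: ({errors} {noun})"
    return label


create_aip = Workflow(
    name="Create AIP",
    status=TaskStatus.DONE,
    tasks=[
        Task("Example successful task", TaskStatus.DONE),
        Task("Load PREMIS events from metadata/premis.xml", TaskStatus.ERROR),
    ],
)
print(status_label(create_aip))
```

The same count could feed both the hover text on the :warning: icon and the summary below the Preservation action's header, so the two always agree.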
Here are the proposed changes to the Status legend and the definitions. You can also see an example of the new :warning: status being included on an otherwise successful status in the table below the legend:
Here is a package details page, showing a Create AIP preservation action that includes an error:
Here is the same page, when hovering over the :warning: icon: