archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Terminus links are not marked as required in workflow-schema.json #1207

Open ross-spencer opened 4 years ago

ross-spencer commented 4 years ago

Please describe the problem you'd like to be solved

Developers writing custom workflows need tooling that helps them consider the impact of the changes they are making to Archivematica. When adding microservices to workflow.json, or forward-porting changes from previous workflows, it must be clear what is and isn't changing, mandatory, or required.

As Jorik points out in 1113 the terminus links (added here) are important. They free up the job queue to enable new work to happen. While the terminus "end" field was added, it was never marked as mandatory, i.e. not every microservice job has to reference it, and only those which use it are marked as true.

Describe the solution you'd like to see implemented

Mark the field as required in workflow-schema, and mark the non-terminal links as false. In future, developers will have to make that decision consciously, because a workflow that omits the field will fail schema validation.
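As a rough illustration of the check this would enforce, here is a minimal standard-library sketch. It assumes that workflow.json keeps its links in a top-level `links` object keyed by link ID and that the terminus field is a boolean named `end`. The helper names `links_missing_end` and `fill_missing_end` and the sample link IDs are hypothetical and are not part of Archivematica.

```python
import json


def links_missing_end(workflow):
    """Return the IDs of links that lack an explicit boolean "end" field."""
    return sorted(
        link_id
        for link_id, link in workflow.get("links", {}).items()
        if not isinstance(link.get("end"), bool)
    )


def fill_missing_end(workflow):
    """Set "end" to False on links that omit the key; return the changed IDs."""
    changed = []
    for link_id, link in workflow.get("links", {}).items():
        if "end" not in link:
            link["end"] = False
            changed.append(link_id)
    return sorted(changed)


# Tiny hypothetical workflow; a real workflow.json has many more fields per link.
workflow = json.loads("""
{
  "links": {
    "link-a": {"description": "Non-terminal link that omits the field"},
    "link-b": {"description": "Terminal link", "end": true}
  }
}
""")

print(links_missing_end(workflow))  # ['link-a']
print(fill_missing_end(workflow))   # ['link-a']
print(links_missing_end(workflow))  # []
```

Once the schema marks the field as required, the first function becomes redundant because schema validation reports the same links. The second is only a one-off migration aid for existing custom workflows, and each link it sets to false should still be reviewed by hand.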

Describe alternatives you've considered

Additional context

Noticed at a client site (EPP) while porting custom microservices and using a different merge strategy.


jorikvankemenade commented 4 years ago

Good that you have created a separate issue for this. I am very much in favour of making the end field mandatory. The reason I didn't do this in my original PR was that I wasn't sure how much we wanted to change in the workflow definition. But given the work that has already gone into visualizing the terminal nodes, how much the workflow relies on them, and the need to prevent future pain, I would say they need to be made an integral part of the workflow language.

How often are end links added?

I agree with this; it was exactly my consideration in the original PR. But given that the scope of a workflow is quite relevant for the scheduling of the packages, I think this should be a very conscious decision.

We could continue to develop the documentation around workflow.json so that these links are described and then knowledge built that way.

Why not both? I think proper documentation of the workflow and its definition is a very valuable addition. However, many developers tend not to read it (at first). So from that perspective I think the code/workflow.json should be opinionated to the point where the documentation is a nice-to-have rather than something you need in order to avoid breaking things.