Open peterVG opened 4 years ago
@peterVG this is starting to take good shape.
As you noted in Slack, using the PR functionality (you can create a WIP/draft PR now in GitHub) will be good for more detailed revision.
Some thoughts that I hope will help until our meeting Monday:
Examples
In the context and problem statement, I wonder if you can break the examples of reporting out into categories with fewer specific examples, e.g. repository maintenance reporting (might describe how many packages, how many deletions, etc.) and file format reporting (might describe numbers of formats, numbers of significant properties).
I think that would then feed into the considered options as to what data is in, and which data is out, of scope in this ADR. (My feeling is that we won't be able to tackle it all).
Exhausting our data sources
It might start to look overwhelming, but I think we can add to the data sources in Archivematica. I'd like to exhaust them here, at least for discussion.
I was thinking we'd at least need to add:
I was trying to think of others. I flip-flop on the API a lot. There is definitely information to be extracted from there, and it can be a different rendering from what the database holds. It also might not be the same API as one we might create for this work?
I like that you've noted we might need to generate information. It's a good question, if we generate it in Archivematica where do we keep it? (Enhance the METS? Other DB tables?) Is Archivematica already saturated with regards to new information?
I wonder then if the Technical forces section could then start to be split into:
There may be other sections after we chat. I think this will then help draw out more decisions we want to make.
Emphasizing the use of this data
One last thing: it would be good to keep in sight where this data ends up. I think there may be plenty of places (management reports, etc.), but for Archivematica it will be good to keep in mind in this ADR that it might then consume its own reports somehow to drive re-ingest or PAR-like actions.
One potential impact (hypothetically) is that we might write something that extracts the data, provides nice reports and, on top of that, nice visualizations. But we should also keep in mind that whatever we write might also expose an API, so that its output can be worked back into Archivematica (or indeed into visualization tools). Certainly, we'll write some form of interface that we can cleanly work with and abstract from.
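To make the "interface we can abstract from" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for discussion only: `FileRecord`, `ReportSource`, `InMemorySource` and `format_report` are names invented here, not existing Archivematica code, and the in-memory source just stands in for whatever we end up reading from (database, METS, API). The point is only that the report logic talks to a small interface rather than to a specific data source.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class FileRecord:
    """One file known to a data source (hypothetical shape, not an Archivematica model)."""
    package_id: str
    file_format: str


class ReportSource(Protocol):
    """Anything that can yield file records, e.g. a database, METS files, or an API."""

    def file_records(self) -> Iterable[FileRecord]:
        ...


class InMemorySource:
    """Stand-in source for illustration; a real one would query a pipeline."""

    def __init__(self, records: Iterable[FileRecord]) -> None:
        self._records = list(records)

    def file_records(self) -> Iterable[FileRecord]:
        return iter(self._records)


def format_report(sources: Iterable[ReportSource]) -> Dict[str, int]:
    """Count files per format across all of the given sources."""
    counts: Counter = Counter()
    for source in sources:
        for record in source.file_records():
            counts[record.file_format] += 1
    return dict(counts)


if __name__ == "__main__":
    pipeline_a = InMemorySource([FileRecord("p1", "PDF"), FileRecord("p1", "TIFF")])
    pipeline_b = InMemorySource([FileRecord("p2", "PDF")])
    print(format_report([pipeline_a, pipeline_b]))
```

Because the report function only depends on `ReportSource`, the same output could feed a management report, a visualization tool, or Archivematica itself, and swapping the underlying data source would not change the reporting code.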
There is pent-up demand from Archivematica users for reporting functionality in Archivematica. They want statistics about what their Archivematica deployments are doing and when, as well as detailed breakdowns of the content in their Archivematica systems. While the existing Archival Storage search and hit display does provide some useful information, it does not aggregate this information or present it in management-style reports. Work on a comprehensive reporting feature has been delayed because it hasn't been clear where the canonical source of Archivematica statistical and content information is stored, or which of the available sources is the most convenient, comprehensive, and performant for building reports. There is also some confusion between logging and reporting functionality. All of this is complicated by the fact that production Archivematica deployments are often split over multiple processing pipelines. This ADR should address these problems and provide options for moving forward with one or more solutions.