Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: to break the export into its components and store them in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form for analysis by researchers and archivists in digital preservation departments at memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.
Classification is now being exported by Siegfried, and soon by DROID. We need to capture this in the report here.
SQL:
Example result (OPF format corpus):
This might be a good opportunity to rework the queries in AnalysisQueriesClass.py again and create some helpers to return them. I think this change will bump the version whenever it is implemented.
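As a rough illustration of what such a helper might look like, here is a minimal sketch. It assumes a table called IDDATA with a CLASSIFICATION column; both names, the function names, and the demo rows are hypothetical and only there to make the example runnable. The real schema and the existing queries in AnalysisQueriesClass.py may well differ.

```python
import sqlite3


def classification_count_query(table="IDDATA", column="CLASSIFICATION"):
    """Return SQL that counts files per classification.

    The table and column names are assumptions made for this sketch,
    not the tool's confirmed schema.
    """
    return (
        f"SELECT {column}, COUNT(*) AS total "
        f"FROM {table} "
        f"GROUP BY {column} "
        f"ORDER BY total DESC, {column} ASC"
    )


def run_query(connection, query):
    """Execute a query string and return all rows as a list of tuples."""
    return connection.execute(query).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up rows for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IDDATA (FILE_NAME TEXT, CLASSIFICATION TEXT)")
    conn.executemany(
        "INSERT INTO IDDATA VALUES (?, ?)",
        [
            ("a.pdf", "Page Description"),
            ("b.pdf", "Page Description"),
            ("c.png", "Image (Raster)"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for name, total in run_query(demo, classification_count_query()):
        print(f"{name}: {total}")
```

Keeping each query behind a small function that returns a string would let the report code ask for a query by name, and would make the queries easy to test against an in-memory database like the one above.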