Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: to break the export into its components and store them in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form for researchers and archivists in digital preservation departments at memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.
Siegfried doesn't include absolute paths in its standard output. This means that one of Demystify's features either doesn't work or is simply misleading. The "files in containers" statistic currently relies on identifying files inside objects prefixed with a zip-like file URI, e.g. gz://. We can't actually create these URIs reliably, even though I have tried up until now.
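To make the dependency concrete, here is a minimal sketch of a scheme-prefix check of the kind described above. It is not Demystify's actual code: the function name `in_container` and the set of schemes are assumptions for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical set of container schemes; the list Demystify really uses may differ.
CONTAINER_SCHEMES = {"zip", "gz", "tar"}


def in_container(uri: str) -> bool:
    """Return True if the URI scheme marks the file as living inside a container."""
    return urlparse(uri).scheme in CONTAINER_SCHEMES


# A URI with a zip-like scheme is counted; a plain relative path has no scheme
# at all, so it can never be counted this way.
print(in_container("gz:///tmp/archive.tar.gz/file.txt"))
print(in_container("relative/path/file.txt"))
```

The second call shows the problem: without an absolute path there is no reliable URI to build, so a check like this has nothing to match on.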
What to do?
Create a flag in the database which includes this information.
Later on, see if Richard is game to add a flag so that the absolute path can be returned, either in the filename field or in a separate SF field.
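A rough sketch of the first option, assuming the flag is a simple integer column set when the export is loaded. The table name `filedata`, the column name `in_container`, and the sample paths are all hypothetical and not taken from Demystify's schema.

```python
import sqlite3


def count_files_in_containers(rows):
    """Load (path, flag) rows into an in-memory database and count flagged files."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: in_container is 1 when the file sits inside a container.
    conn.execute(
        "CREATE TABLE filedata ("
        "file_path TEXT NOT NULL, "
        "in_container INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO filedata VALUES (?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM filedata WHERE in_container = 1"
    ).fetchone()
    conn.close()
    return count


# Illustrative rows only; how the flag is derived from the export is left open.
sample = [("data/report.pdf", 0), ("data/archive.zip/doc.txt", 1)]
print(count_files_in_containers(sample))
```

With a flag like this, the "files in containers" statistic becomes a plain query on the column and no longer depends on reconstructing a URI from the path.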
More information
URI handling is done here. It needs to be refactored. It has worked fairly reliably though, so might be a useful reference point.